Previous articles:
1. Implementing KNN from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/117911220
2. Implementing decision trees from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118389088
3. Implementing Naive Bayes from scratch in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118411638
4. Linear regression in Python: https://blog.csdn.net/cccccyyyyy12345678/article/details/118486796
Preface
Although logistic regression has "regression" in its name, it is a classification model. The key to turning regression into classification is the sigmoid function.
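As a minimal sketch of that idea (illustrative, not code from this post): the sigmoid squashes any real-valued linear-model output into the open interval (0, 1), which can then be read as a class probability.

```python
import numpy as np

def sigmoid(z):
    # Map any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Large negative scores map near 0, large positive scores near 1,
# and 0 maps to exactly 0.5 -- the usual decision threshold.
print(sigmoid(np.array([-5.0, 0.0, 5.0])))
```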
1. Importing the data

import numpy as np
import pandas as pd

def read_xlsx(path):
    data = pd.read_excel(path)
    print(data)
    return data
2. Normalization
Because logistic regression also relies on gradient descent, the features should be rescaled to a common range (removing differences in scale) so that gradient descent converges faster.

def MinMaxScaler(data):
    col = data.shape[1]
    # Scale every feature column; the last column is the label
    for i in range(0, col-1):
        arr = np.array(data.iloc[:, i])
        arr_min = np.min(arr)  # avoid shadowing the built-ins min/max
        arr_max = np.max(arr)
        data.iloc[:, i] = (arr - arr_min) / (arr_max - arr_min)
    return data
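The same rescaling can be checked on toy numbers (a self-contained sketch, not the post's Excel data): each column is mapped to [0, 1] by subtracting its minimum and dividing by its range.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Column-wise min-max scaling: (x - min) / (max - min)
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)  # each column now spans [0, 1]
```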
3. Splitting into training and test sets

def train_test_split(data, test_size=0.2, random_state=None):
    col = data.shape[1]
    x = np.array(data.iloc[:, 0:col-1])
    y = np.array(data.iloc[:, -1])
    # When a random seed is given, fix the random number stream
    # (test against None so that a seed of 0 also works)
    if random_state is not None:
        np.random.seed(random_state)
    # Shuffle the sample indices:
    # permutation returns a random ordering of 0..len(x)-1
    shuffle_indexs = np.random.permutation(len(x))
    # Number of samples that go into the test set (20% by default)
    test_size = int(len(x) * test_size)
    # The first 20% of the shuffled indices become the test set
    test_indexs = shuffle_indexs[:test_size]
    # The remaining 80% become the training set
    train_indexs = shuffle_indexs[test_size:]
    # Select the rows by index
    x_train = x[train_indexs]
    y_train = y[train_indexs]
    x_test = x[test_indexs]
    y_test = y[test_indexs]
    # Return the split data sets
    return x_train, x_test, y_train, y_test
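The index-shuffling trick can be demonstrated on its own (toy arrays, illustrative only): shuffle all row indices once, then slice that one permutation into a test part and a training part so no sample lands in both.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the shuffle is reproducible
x = np.arange(10).reshape(10, 1)
y = np.arange(10)

shuffle_indexs = np.random.permutation(len(x))
test_size = int(len(x) * 0.2)            # 20% of 10 samples -> 2
test_indexs = shuffle_indexs[:test_size]
train_indexs = shuffle_indexs[test_size:]

x_train, x_test = x[train_indexs], x[test_indexs]
y_train, y_test = y[train_indexs], y[test_indexs]
print(len(x_train), len(x_test))  # 8 2
```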
4. Defining the sigmoid function

def sigmoid(x, theta):
    # The linear-regression model serves as the intermediate step;
    # np.dot is the vector dot product
    z = np.dot(x, theta)
    h = 1/(1 + np.exp(-z))
    return h
5. Defining the cost function
This step cleverly exploits the properties of the sigmoid function to define the cost (the cross-entropy loss).

def costFunction(h, y):
    m = len(h)
    J = -1/m * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
    return J
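A quick sanity check of this loss on hand-picked numbers (illustrative, not from the post): confident correct predictions give a small cost, confident wrong ones a large cost.

```python
import numpy as np

def cost(h, y):
    # Cross-entropy loss averaged over m samples
    m = len(h)
    return -1/m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1.0, 0.0])
good = np.array([0.9, 0.1])   # confident and correct
bad  = np.array([0.1, 0.9])   # confident and wrong
print(cost(good, y), cost(bad, y))  # small loss vs large loss
```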
6. Gradient descent

def gradeDesc(x, y, alpha=0.01, iter_num=2000):
    m = x.shape[0]
    n = x.shape[1]
    xMatrix = np.mat(x)
    yMatrix = np.mat(y).transpose()
    # Initialize J_history: np.zeros gives an all-zero array of length iter_num
    J_history = np.zeros(iter_num)
    # Initialize theta: np.ones gives an n-by-1 matrix of ones
    theta = np.ones((n, 1))
    # Run gradient descent
    for i in range(iter_num):
        h = sigmoid(xMatrix, theta)  # sigmoid function
        J_history[i] = costFunction(h, y)
        # Update theta along the gradient
        theta = theta + alpha * xMatrix.transpose() * (yMatrix - h)
    return J_history, theta
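Putting the pieces together on a tiny synthetic problem (a sketch with my own toy data, not the post's Excel file): the recorded loss should fall over the iterations, and the learned theta should separate the two classes at the 0.5 threshold.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data with a bias column: class 1 when the feature is large
x = np.array([[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

theta = np.ones((2, 1))
alpha, iter_num = 0.5, 2000
J_history = np.zeros(iter_num)
for i in range(iter_num):
    h = sigmoid(x @ theta)
    J_history[i] = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    theta += alpha * x.T @ (y - h)   # same update rule as above

print(J_history[0], J_history[-1])           # loss shrinks
print((sigmoid(x @ theta) >= 0.5).ravel())   # expect [False False True True]
```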
7. Computing the accuracy

def score(h, y):
    m = len(h)
    # Counter for correct predictions
    count = 0
    for i in range(m):
        # Threshold at 0.5: probabilities >= 0.5 predict class 1
        if np.where(h[i] >= 0.5, 1, 0) == y[i]:
            count += 1
    accuracy = count/m
    print("Accuracy:", accuracy)
    return accuracy
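A toy check of the thresholding logic (illustrative numbers): probabilities at or above 0.5 count as class 1, and accuracy is the fraction of matches with the true labels.

```python
import numpy as np

h = np.array([0.2, 0.6, 0.9, 0.4])  # predicted probabilities
y = np.array([0,   1,   1,   1])    # true labels

preds = np.where(h >= 0.5, 1, 0)    # [0, 1, 1, 0]
accuracy = np.mean(preds == y)
print(accuracy)  # 3 of 4 match -> 0.75
```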
Summary
The output of linear regression serves as the input to the sigmoid function in logistic regression.