Machine Learning Homework: Logistic Regression
By 阿新 · Published 2018-12-10
This is the second assignment for the ML course; the requirements for logistic regression are as follows:
The dataset link is as follows:
The key to logistic regression is the sigmoid function, which has the very nice property that its derivative is easy to compute:

\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
[Figure: graph of the sigmoid function]
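To make the "easy derivative" claim concrete, here is a minimal sketch (mine, not part of the original post) that checks the identity against a finite-difference estimate:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

x = 0.7
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))               # sigma'(x) = sigma(x) * (1 - sigma(x))
print(numeric, analytic)  # the two values agree closely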
The sigmoid maps its input into the interval (0, 1). Treating the output as a probability gives the two-class model of logistic regression:

P(y=1 \mid x;\theta) = \sigma(\theta^T x), \qquad P(y=0 \mid x;\theta) = 1 - \sigma(\theta^T x)
This is easy to interpret: \sigma(\theta^T x) is the probability that x belongs to the positive class, so the probability that x belongs to the negative class is naturally 1 - \sigma(\theta^T x). Combining the two cases into a single expression gives:

P(y \mid x;\theta) = \sigma(\theta^T x)^{y}\,\bigl(1 - \sigma(\theta^T x)\bigr)^{1-y}
The expression above is just a probability mass function; we estimate \theta by maximum likelihood, choosing \theta to maximize the likelihood of the observed labels.
Using gradient descent (the code below actually performs gradient ascent on the log-likelihood, which is the same thing as descent on its negative), the log-likelihood, its gradient, and the update rule are:

\ell(\theta) = \sum_{i}\Bigl[y_i \log \sigma(\theta^T x_i) + (1-y_i)\log\bigl(1-\sigma(\theta^T x_i)\bigr)\Bigr]

\nabla_\theta \ell = \sum_i \bigl(y_i - \sigma(\theta^T x_i)\bigr)\,x_i, \qquad \theta \leftarrow \theta + \alpha \sum_i \bigl(y_i - \sigma(\theta^T x_i)\bigr)\,x_i
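To make the update concrete, here is a minimal vectorized sketch of one ascent step (the names X, y, theta, and grad_ascent_step are mine, not from the assignment code):

import numpy as np

def grad_ascent_step(X, y, theta, learn_rate=0.1):
    # X: (n, d) design matrix with a bias column, y: (n,) labels in {0, 1}
    p = 1 / (1 + np.exp(-X @ theta))   # predicted probabilities
    grad = X.T @ (y - p)               # gradient of the log-likelihood
    return theta + learn_rate * grad   # ascent step: maximize the likelihood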
The assignment asks for GD, SGD, and Newton's method. So far I have implemented GD and SGD; Newton's method will be added in a later update. One thing worth pointing out: the data must be normalized. Otherwise, because of floating-point precision, the sigmoid saturates to exactly 1 (or 0) and the log term in the loss becomes log(0), which raises an error. The TensorFlow version at the end, optimized with Adam, converges even without normalization.
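The failure is easy to reproduce: with exam scores in the tens, \theta^T x can reach a few tens within a couple of updates, and in float64 the sigmoid already rounds to exactly 1.0 around x ≈ 37. A small demonstration (my own, for illustration):

import math

z = 40.0                     # a modest pre-activation for unnormalized score data
s = 1 / (1 + math.exp(-z))   # e^-40 is below machine epsilon, so this rounds to 1.0
print(s == 1.0)              # True
# math.log(1 - s)            # would raise ValueError: math domain error (log of 0)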
First, the gradient descent code.
nolinear.py wraps the sigmoid function:
import math

def sigmods(x):
    # logistic sigmoid: 1 / (1 + e^(-x))
    return 1 / (math.exp(-x) + 1)
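A side note: math.exp(-x) itself overflows once -x exceeds roughly 709, so a more defensive wrapper could branch on the sign. This is optional hardening of my own, not part of the assignment code:

import math

def sigmoid_stable(x):
    # branch so the exp argument is always <= 0, avoiding OverflowError
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)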
Then the main file:
import nolinear as nl
import numpy as np
import matplotlib.pyplot as plt
import math

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")

# scatter plot of the two exam scores, marker chosen by class
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')

# normalize features (note: 'variance' actually holds the standard deviation)
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x - mean) / variance
data_y = data_y.reshape(-1, 1)

# append a column of ones for the bias term
temp = np.ones(data_y.size)
data_x = np.c_[data_x, temp]

learn_rate = 0.1
theda = np.zeros([3])
loss = 0
old_loss = 0
for i in range(data_y.size):
    if data_y[i] == 1:
        loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
    else:
        loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))

while abs(loss - old_loss) > 0.001:
    temp = np.matmul(data_x, theda)
    dew = np.zeros([3])
    for i in range(data_y.size):
        dew += (data_y[i] - nl.sigmods(temp[i])) * data_x[i]
    theda = theda + learn_rate * dew
    old_loss = loss
    loss = 0
    for i in range(data_y.size):
        if data_y[i] == 1:
            loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
        else:
            loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))
    print(-old_loss)

# draw the decision boundary back in the original (un-normalized) coordinates
plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    plot_y[i - 16] = -(theda[2] + theda[0] * ((i - mean[0]) / variance[0])) / theda[1]
    plot_y[i - 16] = plot_y[i - 16] * variance[1] + mean[1]
plt.plot(plot_x, plot_y)
plt.show()
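The two lines that build plot_y undo the normalization: the decision boundary is the set where the logit vanishes, and solving for the second feature in original coordinates (with s denoting the per-feature standard deviation stored in the variable variance) gives

\theta_0 \frac{x_1 - \mu_1}{s_1} + \theta_1 \frac{x_2 - \mu_2}{s_2} + \theta_2 = 0
\;\Rightarrow\;
x_2 = s_2 \cdot \frac{-\bigl(\theta_2 + \theta_0 (x_1 - \mu_1)/s_1\bigr)}{\theta_1} + \mu_2

which is exactly what the loop computes.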
And the final result:
From the loss values you can see it converges in just a few steps, which is faster than my classmates' runs; I have not yet figured out why.
Next, the SGD code. On each step, SGD randomly samples two examples and takes a gradient step on them.
import nolinear as nl
import numpy as np
import matplotlib.pyplot as plt
import math
import random

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")

plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')

# normalize features and append the bias column, as in the GD version
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x - mean) / variance
data_y = data_y.reshape(-1, 1)
temp = np.ones(data_y.size)
data_x = np.c_[data_x, temp]

learn_rate = 0.1
theda = np.zeros([3])
loss = 0
old_loss = 0
for i in range(data_y.size):
    if data_y[i] == 1:
        loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
    else:
        loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))

while abs(loss - old_loss) > 0.001:
    temp = np.matmul(data_x, theda)
    dew = np.zeros([3])
    # draw two distinct samples and take a gradient step on them
    j = random.randint(0, data_y.size - 1)
    dew += (data_y[j] - nl.sigmods(temp[j])) * data_x[j]
    z = random.randint(0, data_y.size - 1)
    while j == z:
        z = random.randint(0, data_y.size - 1)
    dew += (data_y[z] - nl.sigmods(temp[z])) * data_x[z]
    theda = theda + learn_rate * dew
    old_loss = loss
    loss = 0
    for i in range(data_y.size):
        if data_y[i] == 1:
            loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
        else:
            loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))
    print(-old_loss)

plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    plot_y[i - 16] = -(theda[2] + theda[0] * ((i - mean[0]) / variance[0])) / theda[1]
    plot_y[i - 16] = plot_y[i - 16] * variance[1] + mean[1]
plt.plot(plot_x, plot_y)
plt.show()
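As an aside, the rejection loop that redraws z until it differs from j can be replaced by random.sample, which picks distinct indices in one call. A minimal sketch (my variant, not the original code):

import random

def pick_two_distinct(n):
    # two distinct indices in [0, n), no rejection loop needed
    return random.sample(range(n), 2)

j, z = pick_two_distinct(80)
print(j, z)

Inside the training loop this would replace the two randint calls and the inner while.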
Each SGD run gives a somewhat different result, but the loss always converges to around 14. A sample run:
A second run:
The loss over iterations:
The TensorFlow implementation:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')
data_y = data_y.reshape(-1, 1)
x = tf.placeholder("float", [None, 2])
y = tf.placeholder("float", [None, 1])
w = tf.Variable(tf.zeros([2, 1]))
bias = tf.Variable(tf.zeros([1, 1]))
z = tf.matmul(x, w)+bias
xita = w   # alias: theta, the weight vector
b = bias
loss = tf.reduce_sum(y*tf.log(tf.sigmoid(z))+(1-y)*tf.log(1-tf.sigmoid(z)))  # log-likelihood
tf.summary.scalar("loss_function", -loss)
train_opt = tf.train.AdamOptimizer(0.1).minimize(-loss)  # minimize the negative log-likelihood
merge = tf.summary.merge_all()
init = tf.global_variables_initializer()
summary_writer = tf.summary.FileWriter("log", tf.get_default_graph())
sess = tf.Session()
sess.run(init)
for i in range(1000):
    train, loss_value, w_value, b_value, summary = sess.run([train_opt, loss, xita, b, merge], feed_dict={x: data_x, y: data_y})
    summary_writer.add_summary(summary, i)
    print(loss_value)
w_value = np.array(w_value)
w_value = w_value.reshape(-1)
b_value = np.array(b_value)
plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for j in range(16, 65):
    plot_y[j - 16] = -(b_value[0] + w_value[0] * j) / w_value[1]
plt.plot(plot_x, plot_y)
plt.show()
summary_writer.close()
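A caveat on the TensorFlow listing: it is written against the TF 1.x API (tf.placeholder, tf.Session, tf.train.AdamOptimizer), so under TensorFlow 2 it only runs through the compat layer, and tf.log(tf.sigmoid(z)) carries the same log(0) risk discussed earlier. A hedged sketch of both fixes, assuming the rest of the graph and training loop stay as above:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # run the TF 1.x graph-mode code under TensorFlow 2

x = tf.placeholder("float", [None, 2])
y = tf.placeholder("float", [None, 1])
w = tf.Variable(tf.zeros([2, 1]))
bias = tf.Variable(tf.zeros([1, 1]))
z = tf.matmul(x, w) + bias

# fused op: numerically stable, never evaluates log(0)
loss = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=z))
train_opt = tf.train.AdamOptimizer(0.1).minimize(loss)  # already a loss: no sign flip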
The resulting figure:
Newton's method will be added in a later update...