
Machine Learning Assignment: Logistic Regression

This is the second assignment of the ML course; the logistic regression requirements are as follows:

The dataset link is as follows:

The key to logistic regression is the sigmoid function. A particularly nice property of the sigmoid is that its derivative is easy to compute:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

(Figure: graph of the sigmoid function.)
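As a quick sanity check (not part of the assignment code), the derivative identity can be verified numerically with a central finite difference:

import math

def sigmoid(x):
    # Same logistic sigmoid that nolinear.py wraps below.
    return 1 / (math.exp(-x) + 1)

# Central finite difference vs. the closed form sigma'(x) = sigma(x)*(1 - sigma(x)).
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(abs(numeric - analytic))   # prints a value near zero: the identity holds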

The sigmoid maps its input into the interval (0, 1). Treating its output as a probability gives the two-class logistic regression model:

$$P(y = 1 \mid x; \theta) = h_\theta(x) = \sigma(\theta^T x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

These two expressions are easy to interpret: $\sigma(\theta^T x)$ is the probability that $x$ belongs to the positive class, so the probability that it belongs to the negative class is naturally $1 - \sigma(\theta^T x)$. Combining them into a single formula gives:

$$p(y \mid x; \theta) = h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1 - y}$$

This is just the probability mass function of a single label, so we can do maximum likelihood estimation on $\theta$: choose the $\theta$ that maximizes the likelihood of the whole training set.
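Writing out the log-likelihood and differentiating (the derivative property of the sigmoid above is what makes the gradient come out so cleanly):

$$\ell(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]$$

$$\frac{\partial \ell}{\partial \theta} = \sum_{i=1}^{m} \bigl( y^{(i)} - h_\theta(x^{(i)}) \bigr)\, x^{(i)}$$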

Maximizing $\ell$ by gradient ascent (equivalently, gradient descent on $-\ell$), the update rule is:

$$\theta \leftarrow \theta + \alpha \sum_{i=1}^{m} \bigl( y^{(i)} - h_\theta(x^{(i)}) \bigr)\, x^{(i)}$$
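A minimal vectorized sketch of one such update (the names X, y, and theta are chosen here for illustration; X is an m-by-n design matrix that already includes a bias column, y a 0/1 label vector):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def ascent_step(theta, X, y, alpha=0.1):
    # One batch gradient-ascent step on the log-likelihood:
    # grad = sum_i (y_i - h(x_i)) * x_i, computed as a single matrix product.
    grad = X.T @ (y - sigmoid(X @ theta))
    return theta + alpha * grad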

The assignment asks for GD, SGD, and Newton's method. GD and SGD are implemented below; Newton's method will be added in a later update. One thing worth pointing out: the data must be normalized first. Otherwise, because of finite floating-point precision, the sigmoid can saturate to exactly 1, and log(1 - sigmoid) then fails with a log(0) error. The TensorFlow version at the end, trained with the Adam optimizer, manages to converge even without normalization.
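A minimal demonstration of that failure mode (the value of z below is illustrative, not taken from the assignment data):

import math

z = 40.0                  # unnormalized feature dot products easily get this large
s = 1 / (math.exp(-z) + 1)
print(s)                  # prints 1.0: exp(-40) vanishes below float64 resolution
# math.log(1 - s)         # would raise ValueError: math domain error, i.e. log(0)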

First, the gradient descent code.

nolinear.py wraps the sigmoid function:

import math

def sigmods(x):
    # Logistic sigmoid: maps any real x into (0, 1).
    return 1/(math.exp(-x)+1)
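One caveat with this implementation: math.exp(-x) overflows for large negative x (below roughly -709 in float64). A numerically stable variant, shown here as optional hardening rather than as part of the assignment code:

import math

def sigmoid_stable(x):
    # Only ever call exp() on a non-positive argument, so it cannot overflow.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)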

Then the main file:

import nolinear as nl
import numpy as np
import matplotlib.pyplot as plt
import math

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")

# Scatter plot of the raw data: '+' for positive samples, 'o' for negative.
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')

# Z-score normalization; keep mean/std so the decision boundary can be
# mapped back to the original score scale later.
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x-mean)/variance

# Append a constant-1 column so the bias term is folded into theta.
data_y = data_y.reshape(-1, 1)
temp = np.ones(data_y.size)
data_x = np.c_[data_x, temp]

learn_rate = 0.1
theda = np.zeros([3])

loss = 0
old_loss = 0

# Initial log-likelihood (base-10 logs) at theta = 0.
for i in range(data_y.size):
    if data_y[i] == 1:
        loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
    else:
        loss += math.log10(1-nl.sigmods(np.matmul(data_x[i], theda)))

# Batch gradient ascent: stop once the log-likelihood changes by less than 0.001.
while abs(loss-old_loss) > 0.001:
    temp = np.matmul(data_x, theda)
    dew = np.zeros([3])
    for i in range(data_y.size):
        dew += (data_y[i]-nl.sigmods(temp[i]))*data_x[i]
    theda = theda+learn_rate*dew
    old_loss = loss
    loss = 0
    for i in range(data_y.size):
        if data_y[i] == 1:
            loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
        else:
            loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))
    print(-old_loss)

# Decision boundary theta . x = 0, solved for the second coordinate in
# normalized space and mapped back to the original score scale.
plot_y = np.zeros(65-16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    plot_y[i-16] = -(theda[2]+theda[0]*((i-mean[0])/variance[0]))/theda[1]
    plot_y[i - 16] = plot_y[i-16]*variance[1]+mean[1]
plt.plot(plot_x, plot_y)
plt.show()

The final result:

Looking at the printed loss values, the method converges within just a few steps, noticeably faster than my classmates' runs; I haven't figured out why yet. (One possible factor, offered as a guess: the gradient here is a sum over all samples rather than a mean, so learn_rate = 0.1 amounts to a much larger effective step than in an averaged formulation.)

Next, the SGD code. In this version, SGD randomly draws two samples for each gradient step:

import nolinear as nl
import numpy as np
import matplotlib.pyplot as plt
import math
import random

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")

plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')

# Same preprocessing as the GD version: z-score normalization, then a
# constant-1 bias column appended to the feature matrix.
mean = data_x.mean(axis=0)
variance = data_x.std(axis=0)
data_x = (data_x-mean)/variance

data_y = data_y.reshape(-1, 1)
temp = np.ones(data_y.size)
data_x = np.c_[data_x, temp]

learn_rate = 0.1
theda = np.zeros([3])

loss = 0
old_loss = 0

for i in range(data_y.size):
    if data_y[i] == 1:
        loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
    else:
        loss += math.log10(1-nl.sigmods(np.matmul(data_x[i], theda)))

while abs(loss-old_loss) > 0.001:
    temp = np.matmul(data_x, theda)
    dew = np.zeros([3])
    # Draw two distinct random sample indices and accumulate their gradients.
    j = random.randint(0, data_y.size-1)
    dew += (data_y[j]-nl.sigmods(temp[j]))*data_x[j]
    z = random.randint(0, data_y.size - 1)
    while j == z:
        z = random.randint(0, data_y.size - 1)
    dew += (data_y[z] - nl.sigmods(temp[z])) * data_x[z]
    theda = theda+learn_rate*dew
    old_loss = loss
    loss = 0
    for i in range(data_y.size):
        if data_y[i] == 1:
            loss += math.log10(nl.sigmods(np.matmul(data_x[i], theda)))
        else:
            loss += math.log10(1 - nl.sigmods(np.matmul(data_x[i], theda)))
    print(-old_loss)
plot_y = np.zeros(65-16)
plot_x = np.arange(16, 65)
for i in range(16, 65):
    plot_y[i-16] = -(theda[2]+theda[0]*((i-mean[0])/variance[0]))/theda[1]
    plot_y[i - 16] = plot_y[i-16]*variance[1]+mean[1]
plt.plot(plot_x, plot_y)
plt.show()
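A small stylistic note on the sampling: random.sample can draw the two distinct indices in one call, avoiding the rejection loop above. A drop-in sketch using the same variables as the script:

j, z = random.sample(range(data_y.size), 2)   # two distinct indices at once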

Each run of SGD can give a slightly different result, but the loss always converges to around 14. Sample run:

Second run:

How the loss evolves:

The TensorFlow version:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

data_x = np.loadtxt("ex4Data/ex4x.dat")
data_y = np.loadtxt("ex4Data/ex4y.dat")

# Scatter plot of the raw data (no normalization in this version).
plt.axis([15, 65, 40, 90])
plt.xlabel("exam 1 score")
plt.ylabel("exam 2 score")
for i in range(data_y.size):
    if data_y[i] == 1:
        plt.plot(data_x[i][0], data_x[i][1], 'b+')
    else:
        plt.plot(data_x[i][0], data_x[i][1], 'bo')
data_y = data_y.reshape(-1, 1)

# TensorFlow 1.x graph: linear model z = xw + b fed through a sigmoid.
x = tf.placeholder("float", [None, 2])
y = tf.placeholder("float", [None, 1])

w = tf.Variable(tf.zeros([2, 1]))
bias = tf.Variable(tf.zeros([1, 1]))
z = tf.matmul(x, w)+bias
xita = w
b = bias

# Log-likelihood of the data; Adam minimizes its negative.
loss = tf.reduce_sum(y*tf.log(tf.sigmoid(z))+(1-y)*tf.log(1-tf.sigmoid(z)))
tf.summary.scalar("loss_function", -loss)

train_opt = tf.train.AdamOptimizer(0.1).minimize(-loss)

merge = tf.summary.merge_all()

init = tf.global_variables_initializer()
summary_writer = tf.summary.FileWriter("log", tf.get_default_graph())

sess = tf.Session()
sess.run(init)

for i in range(1000):
    train, loss_value, w_value, b_value, summary = sess.run([train_opt, loss, xita, b, merge], feed_dict={x: data_x, y: data_y})
    summary_writer.add_summary(summary, i)
    print(loss_value)

# Decision boundary w1*x1 + w2*x2 + b = 0, drawn directly on the original
# score scale since the inputs were never normalized.
w_value = np.array(w_value)
w_value = w_value.reshape(-1)
b_value = np.array(b_value)
plot_y = np.zeros(65 - 16)
plot_x = np.arange(16, 65)
for j in range(16, 65):
    plot_y[j - 16] = -(b_value[0] + w_value[0] * j) / w_value[1]
plt.plot(plot_x, plot_y)
plt.show()

summary_writer.close()
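Two side notes on this version, both about the environment rather than the assignment itself. First, the code targets the TensorFlow 1.x API (placeholders and sessions); on TensorFlow 2.x it needs the usual compatibility shim. Second, the handwritten y*log(sigmoid(z)) loss can be replaced by TensorFlow's fused cross-entropy op, which is numerically stable and sidesteps the log(0) issue discussed earlier. Both are sketched below; the loss line is a drop-in replacement that keeps the minimize(-loss) call working unchanged:

# Run the TF1-style graph code on TensorFlow 2.x:
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Numerically stable replacement for the handwritten log-likelihood:
loss = -tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=z))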

Result plot:

Newton's method will follow in a later update…