Logistic 迴歸—LogisticRegressionCV實現引數優化
阿新 • 發佈:2018-12-16
1、準備
# Load the Otto dataset and take a first look at it.
# First, import the required modules.
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
# The competition's evaluation metric is logloss.
from sklearn.metrics import log_loss
from matplotlib import pyplot
import seaborn as sns
# NOTE(review): the original had '%matplotlib inline' here — that is an
# IPython magic command, not valid Python, and breaks this file when run
# as a plain script. Removed; it is only needed inside a notebook.

data = pd.read_csv('Otto_train.csv')
# Bare expressions like `data.head()` only display output in a notebook;
# print explicitly so the script shows them too.
print(data.head())
data.info()
print(data.describe())
print(data.shape)

# Limited by machine resources: keep only the first 20000 rows.
data = data[:20000]

# Target distribution — check whether the classes are balanced.
# Pass the series as a keyword: positional Series input to countplot was
# deprecated and then removed in seaborn 0.12+.
sns.countplot(x=data.target)
pyplot.xlabel('target')
pyplot.ylabel('Number of occurrences')
2、資料的標準化
# Convert the class label strings to integers and standardize features,
# then compute a 5-fold cross-validation logloss baseline.

# Turn label strings into numbers: "Class_7" -> "7" -> 6 (zero-based).
y_train = data.target
y_train = y_train.map(lambda s: s[6:])
y_train = y_train.map(lambda s: int(s) - 1)
data = data.drop(['target', 'id'], axis=1)
X_train = np.array(data)

# Standardize the features (zero mean, unit variance).
from sklearn.preprocessing import StandardScaler
# Initialize the feature scaler.
ss_X = StandardScaler()
# Fit on the training data and transform it.
X_train = ss_X.fit_transform(X_train)

from sklearn.linear_model import LogisticRegression
# BUG FIX: `sklearn.cross_validation` was removed in scikit-learn 0.20;
# `cross_val_score` now lives in `sklearn.model_selection`.
from sklearn.model_selection import cross_val_score

lr = LogisticRegression()
# Cross-validation is used to estimate model performance and tune
# hyperparameters (model selection). For classification tasks,
# cross_val_score uses StratifiedKFold by default.
loss = cross_val_score(lr, X_train, y_train, cv=5, scoring='neg_log_loss')
print('logloss of each fold is: ', -loss)
print('cv logloss is:', -loss.mean())
3、用LogisticRegressionCV的L1正則
# Tune C with LogisticRegressionCV using the L1 penalty.
from sklearn.linear_model import LogisticRegressionCV

Cs = [1, 10, 100, 1000]
# Many samples (60k+), 93 features, L1 penalty --> the saga solver (new
# in 0.19) is an option; liblinear is used here.
# LogisticRegressionCV is faster than GridSearchCV because it reuses the
# regularization path across the Cs grid.
lrcv_L1 = LogisticRegressionCV(Cs=Cs, cv=5, scoring='neg_log_loss',
                               penalty='l1', solver='liblinear',
                               multi_class='ovr')
lrcv_L1.fit(X_train, y_train)
# NOTE(review): the original pasted the notebook's repr output here as a
# bare `LogisticRegressionCV(...)` expression — it built and discarded a
# second, unfitted estimator. Removed.

# scores_: dict with classes as the keys, and the values as the grid of
# scores obtained during cross-validating each fold.
# Each dict value has shape (n_folds, len(Cs)).
n_Cs = len(Cs)
# BUG FIX: the Otto dataset has 9 target classes, not 3 — the hard-coded
# value silently dropped 6 of the 9 per-class score grids. Derive the
# class count from the fitted model instead.
n_classes = len(lrcv_L1.scores_)
scores = np.zeros((n_classes, n_Cs))
for j in range(n_classes):
    # Average over folds for each class.
    scores[j][:] = np.mean(lrcv_L1.scores_[j], axis=0)
# Negate: scoring is neg_log_loss, so flip sign to plot the logloss.
mse_mean = -np.mean(scores, axis=0)

pyplot.plot(np.log10(Cs), mse_mean.reshape(n_Cs, 1))
pyplot.xlabel('log(C)')
pyplot.ylabel('neg-logloss')
pyplot.show()
# print('C is:', lrcv_L1.C_)  # for multiclass, there is one C per class
lrcv_L1.coef_
4、用LogisticRegressionCV的L2正則
# Tune C with LogisticRegressionCV using the L2 penalty (liblinear).
from sklearn.linear_model import LogisticRegressionCV

Cs = [1, 10, 100, 1000]
# Many samples (60k+), 93 features, L2 penalty --> lbfgs is the default
# solver, but use liblinear here for a fair comparison with GridSearchCV.
lr_cv_L2 = LogisticRegressionCV(Cs=Cs, cv=5, scoring='neg_log_loss',
                                penalty='l2', solver='liblinear',
                                multi_class='ovr')
lr_cv_L2.fit(X_train, y_train)
# scores_: dict with classes as the keys, and the values as the grid of
# scores obtained during cross-validating each fold.
# Each dict value has shape (n_folds, len(Cs)).
n_Cs = len(Cs)
# BUG FIX: the Otto dataset has 9 target classes, not 3 as originally
# hard-coded — derive the class count from the fitted model.
n_classes = len(lr_cv_L2.scores_)
scores = np.zeros((n_classes, n_Cs))
for j in range(n_classes):
    # Average over folds for each class.
    scores[j][:] = np.mean(lr_cv_L2.scores_[j], axis=0)
# Negate: scoring is neg_log_loss, so flip sign to plot the logloss.
mse_mean = -np.mean(scores, axis=0)
pyplot.plot(np.log10(Cs), mse_mean.reshape(n_Cs, 1))
pyplot.xlabel('log(C)')
pyplot.ylabel('neg-logloss')
pyplot.show()
# print('C is:', lr_cv_L2.C_)  # for multiclass, there is one C per class
# Tune C with LogisticRegressionCV using the L2 penalty (default solver).
from sklearn.linear_model import LogisticRegressionCV

Cs = [1, 10, 100, 1000]
# Many samples (60k+), 93 features, L2 penalty --> use the default lbfgs
# solver. LogisticRegressionCV is faster than GridSearchCV because it
# reuses the regularization path across the Cs grid.
lrcv_L2 = LogisticRegressionCV(Cs=Cs, cv=5, scoring='neg_log_loss',
                               penalty='l2', multi_class='ovr')
lrcv_L2.fit(X_train, y_train)
# scores_: dict with classes as the keys, and the values as the grid of
# scores obtained during cross-validating each fold.
# Each dict value has shape (n_folds, len(Cs)).
n_Cs = len(Cs)
# BUG FIX: the Otto dataset has 9 target classes, not 3 as originally
# hard-coded — derive the class count from the fitted model.
n_classes = len(lrcv_L2.scores_)
scores = np.zeros((n_classes, n_Cs))
for j in range(n_classes):
    # Average over folds for each class.
    scores[j][:] = np.mean(lrcv_L2.scores_[j], axis=0)
# Negate: scoring is neg_log_loss, so flip sign to plot the logloss.
mse_mean = -np.mean(scores, axis=0)
pyplot.plot(np.log10(Cs), mse_mean.reshape(n_Cs, 1))
pyplot.xlabel('log(C)')
pyplot.ylabel('neg-logloss')
pyplot.show()