One Week of Algorithm Practice: 1. Model Building
阿新 • Published: 2019-01-05
Predicting whether a loan user will become overdue
Dataset download: https://pan.baidu.com/s/1dtHJiV6zMbf_fWPi-dZ95g
1. Import modules
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score,accuracy_score,recall_score
2. Split X and y and briefly analyze the data
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=2018)
print(len(X_train))
print(len(X_test))
print(len(y_test[y_test==0])/len(y_test))
3327
1427
0.7484232655921513
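Training set size : test set size ≈ 7:3 (3327 vs. 1427, matching test_size=0.3). About 74.8% of the test labels are 0.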
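The post does not show how X and y were built before the split. Below is a minimal sketch, assuming the data sits in a pandas DataFrame whose label column is called `status`. That column name, the helper `split_features_label`, and the toy frame are all hypothetical. The `stratify=y` argument is an optional addition that the original code does not use; it keeps the class ratio roughly the same in both splits.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_label(df, label_col):
    """Return (X, y): every column except label_col, and label_col itself."""
    X = df.drop(columns=[label_col])
    y = df[label_col]
    return X, y


# Toy stand-in for the real dataset; 'status' is an assumed label name.
toy = pd.DataFrame({
    "feature_a": range(20),
    "feature_b": [v * 0.5 for v in range(20)],
    "status": [0, 0, 0, 1] * 5,   # 75% class 0, 25% class 1
})

X, y = split_features_label(toy, "status")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y
)

print(len(X_train), len(X_test))
print((y_test == 0).mean())  # share of class 0 in the test set
```

With the real data, `toy` would be replaced by the frame read from the downloaded file, and `"status"` by whatever the label column is actually named.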
3. Build models and make predictions
# LogisticRegression model
clf_Lr=LogisticRegression(random_state=0,solver='lbfgs').fit(X_train,y_train)
y_test_pred=clf_Lr.predict(X_test)
Lr_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(Lr_acc)
print(np.unique(y_test_pred))
0.7484232655921513
0.7484232655921513
[0]
# SVM model
clf_SVM=SVC(gamma='auto').fit(X_train,y_train)
y_test_pred=clf_SVM.predict(X_test)
SVM_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(SVM_acc)
print(np.unique(y_test_pred))
0.7484232655921513
0.7484232655921513
[0]
# Decision tree model
clf_Tree=DecisionTreeClassifier(random_state=0).fit(X_train,y_train)
y_test_pred=clf_Tree.predict(X_test)
Tree_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(Tree_acc)
print(np.unique(y_test_pred))
0.6629292221443588
0.6629292221443588
[0 1]
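The imports include cross_val_score and recall_score, which the post never calls. Below is a hedged sketch of how they might be used to judge the tree over several folds instead of a single split. The synthetic data from make_classification (roughly 75% class 0) is only a stand-in for the loan dataset, and the 5 folds and the 'f1_macro' and 'recall' scorers are assumptions, not the author's setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 75% class 0); not the loan dataset.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, weights=[0.75, 0.25], random_state=2018
)

tree = DecisionTreeClassifier(random_state=0)

# Macro F1 averages the per-class F1 scores, so the minority class counts equally.
f1_macro_scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="f1_macro")
# 'recall' scores the positive class (label 1) in a binary problem.
recall_scores = cross_val_score(tree, X_demo, y_demo, cv=5, scoring="recall")

print(f1_macro_scores.mean(), recall_scores.mean())
```

On the real data, `X_demo` and `y_demo` would be replaced by X and y.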
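Comparing the three, the LR and SVM models reach the same accuracy (0.7484), while the decision tree scores lower (0.6629). The printed f1 values match the accuracy in every case because micro-averaged F1 equals accuracy in single-label classification. However, LR and SVM both predict every test sample as 0, so their accuracy is simply the share of class-0 samples in the test set: len(y_test[y_test==0])/len(y_test)=0.7484232655921513. Neither of them identifies a single overdue user, so the decision tree, the only model that predicts both classes, is selected.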
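The same effect can be reproduced on a toy example. The sketch below uses made-up labels (not the loan data) and a hypothetical helper `report`. It shows three things: a model that predicts only class 0 gets an accuracy equal to the class-0 share, micro-averaged F1 equals accuracy, and recall on class 1 and macro F1 expose the problem. `zero_division=0` assumes scikit-learn 0.22 or later.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Made-up labels for illustration: 6 of 8 samples are class 0.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_all_zero = [0] * len(y_true)          # a model that always predicts 0
y_mixed = [0, 0, 0, 0, 1, 1, 1, 0]      # a model that predicts both classes


def report(name, y_pred):
    """Print and return accuracy, micro F1, macro F1 and class-1 recall."""
    acc = accuracy_score(y_true, y_pred)
    f1_micro = f1_score(y_true, y_pred, average="micro")
    f1_macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
    rec_1 = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
    print(name, acc, f1_micro, f1_macro, rec_1)
    return acc, f1_micro, f1_macro, rec_1


all_zero_scores = report("always 0:", y_all_zero)
mixed_scores = report("mixed:", y_mixed)
```

Here the always-0 predictions have the higher accuracy (0.75 against 0.625) but a class-1 recall of 0, while the mixed predictions recover half of the class-1 samples and score a higher macro F1. The LR/SVM and decision tree results follow the same pattern.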