
One Week of Algorithm Practice: 1. Model Building

Predicting whether a loan user will become overdue

Dataset download: https://pan.baidu.com/s/1dtHJiV6zMbf_fWPi-dZ95g

1. Import modules


import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score,accuracy_score,recall_score

2. Split X and y and take a quick look at the data


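# X (features) and y (0/1 overdue label) are assumed to be loaded already;
# the loading step is not shown in this post.
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=2018)
print(len(X_train))  # number of training samples
print(len(X_test))   # number of test samples
print(len(y_test[y_test==0])/len(y_test))  # share of class 0 in the test set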

3327
1427
0.7484232655921513

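Training set size : test set size ≈ 7:3 (3327 : 1427, from test_size=0.3). About 74.8% of the test samples belong to class 0.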
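As a quick sanity check on the split, the sizes and the class-0 share can be reproduced on stand-in data. This is a minimal sketch: `X_demo` and `y_demo` are made-up arrays (not the loan dataset) that only match the row count of 3327 + 1427 = 4754.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the post's data
# (3327 + 1427 = 4754); the feature values and labels are made up.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4754, 3))
y_demo = (rng.random(4754) < 0.25).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=2018
)

n_train, n_test = len(X_tr), len(X_te)
ratio = round(n_train / n_test, 2)   # train:test ratio as a single number
zero_share = np.mean(y_te == 0)      # same quantity as len(y_test[y_test==0])/len(y_test)

print(n_train, n_test)
print(ratio)
print(zero_share)
```

With test_size=0.3 the ratio comes out near 7:3 (about 2.33), not 3:1.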


3. Build models and predict


# LogisticRegression model
clf_Lr=LogisticRegression(random_state=0,solver='lbfgs').fit(X_train,y_train)
y_test_pred=clf_Lr.predict(X_test)
Lr_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(Lr_acc)
print(np.unique(y_test_pred))

0.7484232655921513
0.7484232655921513
[0]

# SVM model
clf_SVM=SVC(gamma='auto').fit(X_train,y_train)
y_test_pred=clf_SVM.predict(X_test)
SVM_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(SVM_acc)
print(np.unique(y_test_pred))

0.7484232655921513
0.7484232655921513
[0]

# Decision tree model
clf_Tree=DecisionTreeClassifier(random_state=0).fit(X_train,y_train)
y_test_pred=clf_Tree.predict(X_test)
Tree_acc=accuracy_score(y_test,y_test_pred)
f1=f1_score(y_test,y_test_pred,average='micro')
print(f1)
print(Tree_acc)
print(np.unique(y_test_pred))

0.6629292221443588
0.6629292221443588
[0 1]
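The three blocks above repeat the same fit, predict and score steps. A minimal sketch of one way to fold them into a single helper follows. It also reports recall and F1 for class 1, which show directly when a model never predicts an overdue user. The data here is a synthetic stand-in from `make_classification`, and the `evaluate` helper and `models` dictionary are hypothetical names, not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: roughly 75% class 0).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.75, 0.25], random_state=2018
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=2018
)

# Same three estimators and settings as in the post.
models = {
    "LR": LogisticRegression(random_state=0, solver="lbfgs"),
    "SVM": SVC(gamma="auto"),
    "Tree": DecisionTreeClassifier(random_state=0),
}

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit one model and return accuracy plus recall and F1 for class 1."""
    pred = model.fit(X_train, y_train).predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "recall_1": recall_score(y_test, pred, zero_division=0),
        "f1_1": f1_score(y_test, pred, zero_division=0),
        "predicted_classes": np.unique(pred).tolist(),
    }

results = {name: evaluate(m, Xtr, ytr, Xte, yte) for name, m in models.items()}
for name, scores in results.items():
    print(name, scores)
```

The scores on this synthetic data say nothing about the loan dataset itself. The point is only the reporting layout.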

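The comparison shows that the LR and SVM models have the same accuracy (0.748), while the decision tree's accuracy is lower (0.663). However, LR and SVM both predict class 0 for every test sample, so their accuracy is simply the share of class 0 in the test set: len(y_test[y_test==0])/len(y_test)=0.7484232655921513. The printed F1 values do not reveal this, because f1_score with average='micro' equals accuracy in single-label classification. The decision tree is the only model that predicts both classes, so it is the one chosen.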
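The reasoning above can be checked on a tiny hand-made example. This is a sketch with made-up labels (75 zeros and 25 ones), not the loan data: a predictor that always outputs 0 scores an accuracy equal to the class-0 share, its micro-averaged F1 equals that accuracy, and its recall and F1 for class 1 are both zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Made-up labels: 75 samples of class 0 and 25 of class 1.
y_true = np.array([0] * 75 + [1] * 25)
# A "model" that always predicts class 0, like LR and SVM did in the post.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
f1_micro = f1_score(y_true, y_pred, average="micro")
# Metrics for the positive class (1 = overdue) expose the problem.
recall_pos = recall_score(y_true, y_pred, zero_division=0)
f1_pos = f1_score(y_true, y_pred, zero_division=0)

print(acc, f1_micro)       # both equal the share of class 0
print(recall_pos, f1_pos)  # both are zero: no positive sample is found
```

For an imbalanced problem like this one, recall_score or the default binary f1_score for class 1 is therefore a more informative comparison than accuracy or micro-averaged F1.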