Customer Loan Overdue Prediction [3]: xgboost and lightgbm
阿新 · Published: 2018-12-04
Task
Based on customer loan data, predict whether a customer will become overdue: 1 means the customer will be overdue, 0 means they will not.
Implementation
```python
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 15 13:02:11 2018

@author: keepi
"""
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
import xgboost as xgb
import warnings

warnings.filterwarnings('ignore')
pd.set_option('display.max_rows', 1000)

# load the data
data = pd.read_csv('data.csv', encoding='gb18030')
print("data.shape:", data.shape)

# data cleaning: inspect missing rates
miss_rate = data.isnull().sum() / len(data)
print("missing rate:", miss_rate.sort_values(ascending=False))

# numeric features: fill missing values with the column mean
X_num = data.select_dtypes('number').copy()
X_num.fillna(X_num.mean(), inplace=True)
print("shape of numeric features:", X_num.shape)
print(X_num.columns)
X_num.drop(['Unnamed: 0', 'status'], axis=1, inplace=True)

# non-numeric features
X_str = data.select_dtypes(exclude='number').copy()
X_str.fillna(0, inplace=True)
print("non-numeric features:", X_str.columns)
print(X_str.head())

# one-hot encode the categorical feature and build the design matrix
X_dummy = pd.get_dummies(X_str['reg_preference_for_trad'])
X = pd.concat([X_num, X_dummy], axis=1, sort=False)
y = data['status']

# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1117)

# standardization
ss = StandardScaler()
X_train_std = ss.fit_transform(X_train)
X_test_std = ss.transform(X_test)

print("f1_score:")

# xgboost (native API)
xgb_train = xgb.DMatrix(X_train_std, label=y_train)
xgb_test = xgb.DMatrix(X_test_std)
xgb_params = {
    'learning_rate': 0.1,
    'n_estimators': 1000,
    'max_depth': 6,
    'min_child_weight': 1,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'nthread': 4,
    'scale_pos_weight': 1,
    'seed': 1118
}
xgb_model = xgb.train(xgb_params, xgb_train, num_boost_round=xgb_params['n_estimators'])
test_xgb_pred_prob = xgb_model.predict(xgb_test)
test_xgb_pred = (test_xgb_pred_prob >= 0.5) + 0   # threshold probabilities at 0.5
print("xgboost:", f1_score(y_test, test_xgb_pred))

# xgboost (sklearn wrapper)
from xgboost.sklearn import XGBClassifier
xgbc = XGBClassifier(**xgb_params)
xgbc.fit(X_train_std, y_train)
test_xgbc_pred = xgbc.predict(X_test_std)
print('xgbc:', f1_score(y_test, test_xgbc_pred))

# lightgbm (native API)
import lightgbm as lgb
lgb_params = {
    'learning_rate': 0.1,
    'n_estimators': 50,
    'max_depth': 4,
    'min_child_weight': 1,
    'gamma': 0,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'objective': 'binary',
    'nthread': 4,
    'scale_pos_weight': 1,
    'seed': 1117
}
dtrain = lgb.Dataset(X_train_std, y_train)
lgb_model = lgb.train(lgb_params, dtrain, num_boost_round=lgb_params['n_estimators'])
test_lgb_pred_prob = lgb_model.predict(X_test_std)
test_lgb_pred = (test_lgb_pred_prob >= 0.5) + 0
print('lgb:', f1_score(y_test, test_lgb_pred))

# lightgbm (sklearn wrapper)
from lightgbm.sklearn import LGBMClassifier
lgb_model2 = LGBMClassifier(**lgb_params)
lgb_model2.fit(X_train_std, y_train)
test_lgbsk_pred = lgb_model2.predict(X_test_std)
print('lgb sklearn:', f1_score(y_test, test_lgbsk_pred))
```
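The number of boosting rounds above is fixed (1000 for xgboost, 50 for lightgbm). A common refinement is to carve a validation set out of the training data and let early stopping pick the round count. Below is a minimal sketch using the xgboost native API; it reuses `X_train_std`, `y_train` and `xgb_params` from the code above, and the 80/20 validation split and the patience of 50 rounds are arbitrary choices of mine, not settings from the original run.

```python
import xgboost as xgb
from sklearn.model_selection import train_test_split

# carve a validation set out of the training data for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train_std, y_train, test_size=0.2, random_state=1117)

dtr = xgb.DMatrix(X_tr, label=y_tr)
dval = xgb.DMatrix(X_val, label=y_val)

booster = xgb.train(
    xgb_params, dtr,
    num_boost_round=1000,
    evals=[(dval, 'val')],        # monitored evaluation set
    early_stopping_rounds=50,     # stop if the validation metric stops improving
    verbose_eval=False)
print("best iteration:", booster.best_iteration)
```

The same idea applies to lightgbm's native API via its `valid_sets` argument.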
Results: f1_score:
Problems encountered
1. DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
This is a numpy warning: the truth-value check on an empty array is deprecated. It has already been fixed in newer versions of numpy, so here I simply chose to suppress the warning.
```python
import warnings
warnings.filterwarnings('ignore')
```
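Note that `warnings.filterwarnings('ignore')` silences every warning, which can hide genuinely useful messages. A narrower filter that targets only this class of warning is also possible:

```python
import warnings
# suppress only DeprecationWarning; other warnings are still shown
warnings.filterwarnings('ignore', category=DeprecationWarning)
```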
So far I have only learned how to call xgboost and lightgbm; I don't yet know how to tune the parameters, so I need to deepen my understanding of the underlying principles next.
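For reference, one common starting point for tuning is a cross-validated grid search over a few key parameters through the sklearn wrapper. The sketch below is only illustrative: the grid values and the choice of f1 as the scoring metric are my own assumptions, not tuned settings from this post.

```python
from sklearn.model_selection import GridSearchCV
from xgboost.sklearn import XGBClassifier

# an illustrative, deliberately small grid of key parameters
param_grid = {
    'max_depth': [3, 4, 6],
    'learning_rate': [0.05, 0.1],
    'n_estimators': [100, 300],
}
search = GridSearchCV(
    XGBClassifier(objective='binary:logistic', subsample=0.8,
                  colsample_bytree=0.8, random_state=1118),
    param_grid, scoring='f1', cv=5, n_jobs=-1)
search.fit(X_train_std, y_train)
print(search.best_params_, search.best_score_)
```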
How to choose a suitable evaluation metric is another open question.
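Since loan default data is usually class-imbalanced, it can help to look at several metrics side by side instead of f1 alone. A small sketch, reusing `y_test`, `test_xgb_pred` and `test_xgb_pred_prob` from the code above:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# threshold-based metrics use the 0/1 predictions (0.5 cut-off above);
# ROC-AUC uses the raw probabilities and is threshold-free
print("accuracy :", accuracy_score(y_test, test_xgb_pred))
print("precision:", precision_score(y_test, test_xgb_pred))
print("recall   :", recall_score(y_test, test_xgb_pred))
print("f1       :", f1_score(y_test, test_xgb_pred))
print("auc      :", roc_auc_score(y_test, test_xgb_pred_prob))
```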