The train_test_split() function in sklearn.model_selection
阿新 • Published: 2018-12-19
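train_test_split() is the splitter function in sklearn.model_selection. It divides arrays or matrices into a training set and a test set. A typical call looks like this:
X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size, random_state, shuffle)
In a real call, test_size, random_state and shuffle should be passed as keyword arguments (test_size=..., random_state=..., shuffle=...), as in the code further down.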
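Parameters:
- train_data: the sample data to be split
- train_target: the labels corresponding to the sample data
- test_size: 1) a float between 0 and 1, giving the proportion of samples used for testing (with test_size = 0.3, 30% of the samples go into X_test and the remaining 70% go into X_train; the labels are split the same way); 2) an integer, giving the absolute number of samples that go into X_test, with the rest going into X_train
- random_state: the random seed. Different seeds produce different splits; the same seed always produces the same split. If random_state is not set, the split changes from run to run; if it is fixed to any value (0 included), the split is identical every time, which is easy to verify by experiment. To set the parameter and still get a different split on each run, use random_state = int(time.time())
- shuffle: whether to shuffle before splitting. 1) shuffle = False keeps the samples in their original order, so the first samples become the training set and the last ones the test set; 2) shuffle = True (the default) shuffles the samples before splitting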
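The parameter behaviour described above can be checked with a short script. This is only a minimal sketch on toy data; the variable names (frac_sizes, int_test_len and so on) are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 3 features each, labels 0..9
X = np.arange(30).reshape((10, 3))
y = np.arange(10)

# Float test_size: a fraction of the samples goes to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
frac_sizes = (len(X_tr), len(X_te))

# Integer test_size: an absolute number of test samples
_, X_te_int, _, _ = train_test_split(X, y, test_size=4, random_state=0)
int_test_len = len(X_te_int)

# Same seed: the split is reproduced exactly
_, _, _, y_te_again = train_test_split(X, y, test_size=0.3, random_state=0)
same_seed_same_split = np.array_equal(y_te, y_te_again)

# shuffle=False: order is preserved, the last samples form the test set
_, _, y_tr_ord, y_te_ord = train_test_split(X, y, test_size=0.3, shuffle=False)

print(frac_sizes, int_test_len, same_seed_same_split)
print(y_tr_ord, y_te_ord)
```

With shuffle=False the random_state argument has no effect, so it is left out of the last call.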
Python code:
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(30).reshape((10, 3)), range(10)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20, shuffle=True)
>>> X_train
array([[15, 16, 17],
       [ 0,  1,  2],
       [ 6,  7,  8],
       [18, 19, 20],
       [27, 28, 29],
       [12, 13, 14],
       [ 9, 10, 11]])
>>> X_test
array([[21, 22, 23],
       [ 3,  4,  5],
       [24, 25, 26]])
>>> y_train
[5, 0, 2, 6, 9, 4, 3]
>>> y_test
[7, 1, 8]
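In the output above, each row of X_train still lines up with its label in y_train (the row [15, 16, 17] is sample 5, and 5 is the first label). The sketch below checks that alignment programmatically. It relies on a property of this toy data only: row i starts with the value 3*i, so the original index can be recovered by integer division. The names train_rows, test_rows, aligned, no_overlap and covers_all are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(30).reshape((10, 3)), range(10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=20, shuffle=True
)

# Recover each row's original index (valid only for this toy data,
# where row i begins with 3*i)
train_rows = (X_train[:, 0] // 3).tolist()
test_rows = (X_test[:, 0] // 3).tolist()

# Rows and labels are shuffled together, so they stay paired
aligned = train_rows == list(y_train) and test_rows == list(y_test)

# No sample appears in both sets, and no sample is lost
no_overlap = set(train_rows).isdisjoint(test_rows)
covers_all = sorted(train_rows + test_rows) == list(range(10))

print(aligned, no_overlap, covers_all)
```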