Python Assignment: sklearn

阿新 • Published: 2018-12-31

Scikit-Learn: Machine Learning in Python

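Learning objectives:

Learn the sklearn library for Python and get to grips with three classification methods: Naive Bayes, SVM, and Random Forest. Complete the assignment, compare and analyze the results, and briefly summarize the training outcome.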

Assignment :

In the second ML assignment you have to compare the performance of three different classification algorithms, namely Naive Bayes, SVM, and Random Forest. For this assignment you need to generate a random binary classification problem, and then train and test (using 10-fold cross validation) the three algorithms. For some algorithms, inner cross validation (5-fold) is needed to choose the parameters. Then show the classification performance (per-fold and averaged) in the report, and briefly discuss the results.

Note:

The report must also contain a short description of the methodology used to obtain the results.

Steps:

1 Create a classification dataset (n samples >=1000, n features >=10)
2 Split the dataset using 10-fold cross validation
3 Train the algorithms
   GaussianNB
   SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   RandomForestClassifier (possible n estimators values [10, 100, 1000])
4 Evaluate the cross-validated performance
   Accuracy
   F1-score
   AUC ROC
5 Write a short report summarizing the methodology and the results

Step1:

Create a classification dataset (n samples >=1000, n features >= 10)

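scikit-learn ships with the Iris dataset for classification, but the assignment calls for a randomly generated binary problem, so the data used below comes from make_classification.

from sklearn import datasets

# Built-in Iris dataset, loaded for reference only; it is not used below
iris = datasets.load_iris()

# Artificial data generator. Return values:
#   X: array of shape [n_samples, n_features], the generated samples
#   y: array of shape [n_samples], the integer class label of each sample
X, y = datasets.make_classification(n_samples=1000, n_features=10,
                                    n_informative=2, n_redundant=2,
                                    n_repeated=0, n_classes=2)

print(X)

print(y)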
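As a quick sanity check on Step 1, the sketch below generates the same kind of dataset and inspects its shape and class balance. The fixed random_state=0 is an assumption added here for reproducibility; the original post does not set a seed.

```python
from collections import Counter

from sklearn.datasets import make_classification

# random_state=0 is an illustrative choice so repeated runs give the same data
X, y = make_classification(
    n_samples=1000, n_features=10,
    n_informative=2, n_redundant=2, n_repeated=0,
    n_classes=2, random_state=0,
)

print(X.shape)              # (1000, 10)
print(y.shape)              # (1000,)
print(Counter(y.tolist()))  # number of samples per class label
```

Checking the class counts up front helps when reading the F1 and AUC values later, since both are sensitive to class imbalance.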

Step2 :

Split the dataset using 10-fold cross validation

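# sklearn.cross_validation was removed in scikit-learn 0.20;
# KFold now lives in sklearn.model_selection
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
# Note: once the loop finishes, these variables hold the split of the last fold only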

[Figures: printed contents of X_train, X_test, y_train, y_test]
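The split in Step 2 can also be done with stratification, so that every fold keeps roughly the same class ratio. The following is a minimal sketch, assuming StratifiedKFold in place of the plain KFold used above and a fixed seed; neither choice comes from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# StratifiedKFold is swapped in here for KFold; it needs y to balance the folds
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

fold_sizes = []
for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    fold_sizes.append((len(train_index), len(test_index)))
    # y_test.mean() is the share of class 1 in this fold's test part
    print(fold, len(train_index), len(test_index), round(y_test.mean(), 3))
```

Any training and scoring that should count "per fold" has to happen inside this loop; code placed after it only ever sees the final fold.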

Step3:

Train the algorithms

   GaussianNB
   SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
   RandomForestClassifier (possible n estimators values [10, 100, 1000])

GaussianNB:

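from sklearn.naive_bayes import GaussianNB

model1 = GaussianNB()
model1.fit(X_train, y_train)
# predict with the fitted model (the original called an undefined name, clf)
predict = model1.predict(X_test)
print(predict)

[Figure: printed value of predict]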

SVC:

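from sklearn.svm import SVC

# try each candidate value of the regularization parameter C
for num in [1e-02, 1e-01, 1e00, 1e01, 1e02]:
    # C is passed by keyword; recent scikit-learn versions reject it as a positional argument
    model2 = SVC(C=num, kernel='rbf', gamma=0.1)
    model2.fit(X_train, y_train)
    predict2 = model2.predict(X_test)
    print(predict2)

[Figure: printed value of predict2]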

RandomForestClassifier:

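from sklearn.ensemble import RandomForestClassifier

# try each candidate forest size
for n_estimators in [10, 100, 1000]:
    # use the loop variable (the original hard-coded n_estimators=6)
    model3 = RandomForestClassifier(n_estimators=n_estimators)
    model3.fit(X_train, y_train)
    predict3 = model3.predict(X_test)
    print(predict3)

[Figure: printed value of predict3]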
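The assignment asks for an inner 5-fold cross validation to choose parameters, which the loops above do not perform. One possible way to do it for SVC is sketched below with GridSearchCV inside each outer fold. The helper name nested_cv_svc, the stratified splitter and the fixed seed are assumptions made for illustration, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def nested_cv_svc(X, y, c_values, outer_splits=10, inner_splits=5, seed=0):
    """Hypothetical helper: return one (best_C, test_accuracy) pair per outer fold."""
    outer = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed)
    results = []
    for train_index, test_index in outer.split(X, y):
        # The inner search only ever sees the training part of the outer fold
        search = GridSearchCV(
            SVC(kernel="rbf", gamma=0.1), {"C": c_values}, cv=inner_splits
        )
        search.fit(X[train_index], y[train_index])
        # GridSearchCV refits the best model on the whole outer training part
        y_pred = search.predict(X[test_index])
        acc = accuracy_score(y[test_index], y_pred)
        results.append((search.best_params_["C"], acc))
    return results


c_values = [1e-02, 1e-01, 1e00, 1e01, 1e02]
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
results = nested_cv_svc(X, y, c_values)
for fold, (best_c, acc) in enumerate(results):
    print(fold, best_c, round(acc, 3))
```

The same pattern would apply to RandomForestClassifier with {"n_estimators": [10, 100, 1000]} as the grid; GaussianNB has no parameter in the assignment's list, so it needs no inner loop.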

Step4:

Evaluate the cross-validated performance
   Accuracy
   F1-score
   AUC ROC

GaussianNB:

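from sklearn import metrics

accuracy = metrics.accuracy_score(y_test, predict)
print(accuracy)
# use the GaussianNB predictions (the original passed an undefined name, pred)
F1_score = metrics.f1_score(y_test, predict)
print(F1_score)
# AUC computed here from hard class labels rather than scores
auc_roc = metrics.roc_auc_score(y_test, predict)
print(auc_roc)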
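roc_auc_score accepts continuous scores as well as labels. Passing hard 0/1 predictions, as above, reduces the ROC curve to a single operating point, whereas class probabilities let it rank the samples. The sketch below computes both for GaussianNB; the single train_test_split hold-out and the fixed seed are simplifying assumptions used only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
# One 90/10 hold-out split stands in for a single cross-validation fold
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

model = GaussianNB().fit(X_train, y_train)

# AUC from hard 0/1 predictions
auc_from_labels = roc_auc_score(y_test, model.predict(X_test))
# AUC from the predicted probability of class 1
auc_from_scores = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(auc_from_labels, auc_from_scores)
```

For SVC, model.decision_function(X_test) can play the role of the score without enabling probability estimates.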

SVC:

for num in [1e-02, 1e-01, 1e00, 1e01, 1e02]:
    # retrain with each candidate C, then score on the test part
    model2 = SVC(C=num, kernel='rbf', gamma=0.1)
    model2.fit(X_train, y_train)
    predict2 = model2.predict(X_test)

    accuracy = metrics.accuracy_score(y_test, predict2)
    print(accuracy)
    F1_score = metrics.f1_score(y_test, predict2)
    print(F1_score)
    auc_roc = metrics.roc_auc_score(y_test, predict2)
    print(auc_roc)

RandomForestClassifier:

for n_estimators in [10, 100, 1000]:
    # use the loop variable (the original hard-coded n_estimators=6)
    model3 = RandomForestClassifier(n_estimators=n_estimators)
    model3.fit(X_train, y_train)
    predict3 = model3.predict(X_test)

    accuracy = metrics.accuracy_score(y_test, predict3)
    print(accuracy)
    F1_score = metrics.f1_score(y_test, predict3)
    print(F1_score)
    auc_roc = metrics.roc_auc_score(y_test, predict3)
    print(auc_roc)
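The assignment asks for per-fold and averaged performance, while the code above scores only the split left over from the last fold. A minimal sketch of the per-fold evaluation using cross_validate follows. The fixed hyperparameters (C=1.0, n_estimators=100), the stratified splitter and the seeds are assumptions chosen for illustration, not values taken from the post's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hyperparameters are fixed here for brevity (illustrative values)
models = {
    "GaussianNB": GaussianNB(),
    "SVC": SVC(C=1.0, kernel="rbf", gamma=0.1),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scoring = ["accuracy", "f1", "roc_auc"]

summary = {}
for name, model in models.items():
    # cross_validate returns one score per fold under the key "test_<metric>"
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    summary[name] = {m: scores["test_" + m] for m in scoring}
    for m in scoring:
        per_fold = summary[name][m]
        print(name, m, per_fold.round(3), "mean=%.3f" % per_fold.mean())
```

Because all three models are scored on the same ten folds, the per-fold rows can be compared directly in the report.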

Step5:

Write a short report summarizing the methodology and the results

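Summary 1: Ranked from worst to best, the three models perform as GaussianNB < SVC < RandomForestClassifier.

Summary 2: For SVC, C = 1e00 gave the best result.

Summary 3: For RandomForestClassifier, the smaller n_estimators was, the better the result.

(Most of the time spent on this assignment went into Anaconda being unable to import sklearn inside Spyder; importing it directly in IPython worked fine.)

sklearn provides many datasets and training methods that deserve further study.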
