Machine Learning: Decision Trees, a Worked Example
阿新 • Published: 2019-02-06
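1. Python
2. Python machine learning library: scikit-learn
2.1 Features:
Simple and efficient tools for data mining and machine learning analysis
Open to all users and highly reusable across different needs
Built on NumPy, SciPy, and matplotlib
Open source and commercially usable: BSD license
2.2 Problem areas covered:
Classification, regression, clustering, dimensionality reduction
Model selection, preprocessing
3. Using scikit-learn
Installing scikit-learn: pip, easy_install, or the Windows installer
Installing the required packages: NumPy, SciPy, and matplotlib. Anaconda can be used instead, since it bundles NumPy, SciPy, and other common scientific computing packages.
Things to check when installing: the Python interpreter version (2.7 or 3.4?) and whether the system is 32-bit or 64-bit
4. Example:
The full code is as follows: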
import csv

from sklearn import preprocessing, tree
from sklearn.feature_extraction import DictVectorizer

# Read the data from the CSV file
csv_path = r'D:\BaiduNetdiskDownload\程式碼與素材\程式碼與素材(1)\01DTree\AllElectronics.csv'
featureList = []
labelList = []
with open(csv_path, 'rt') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)
    # print(headers)

    # Read the feature columns into featureList as dicts
    # and the label column (the last one) into labelList
    for row in reader:
        labelList.append(row[len(row)-1])
        rowDict = {}
        # Skip the first column and the last column (the label)
        for i in range(1, len(row)-1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)

# print(featureList)
# print(labelList)

# Vectorize features (one-hot encode the categorical values)
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# get_feature_names_out() requires scikit-learn >= 1.0;
# older releases use get_feature_names() instead
featureNames = list(vec.get_feature_names_out())

print("dummyX: " + str(dummyX))
print(featureNames)
print("labelList: " + str(labelList))

# Vectorize class labels
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY: " + str(dummyY))

# Use a decision tree for classification, with entropy as the split criterion
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf: " + str(clf))

# Visualize the model: write the tree in Graphviz dot format
with open("allElectronicInformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=featureNames, out_file=f)

# Build one new row by copying the first sample and changing two values
oneRowX = dummyX[0, :]
print("oneRowX: " + str(oneRowX))

newRowX = oneRowX.copy()  # copy so that dummyX[0] is left unchanged
newRowX[0] = 1
newRowX[2] = 0
print("newRowX: " + str(newRowX))

# Predict; predict() expects a 2D array of shape (n_samples, n_features)
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY: " + str(predictedY))
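The classifier above is created with criterion='entropy', so candidate splits are scored by how much they reduce the entropy of the labels (information gain). The following is a minimal plain-Python sketch of that score, not scikit-learn's implementation. The helper names entropy and information_gain and the four toy rows are assumptions made up for illustration; they are not taken from AllElectronics.csv. scikit-learn itself builds binary splits over the one-hot columns, so the multi-way split used here is a simplification.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    # Group the labels by the value each row takes for the feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (an assumption, not the article's dataset).
rows = [
    {"student": "yes", "credit": "fair"},
    {"student": "yes", "credit": "excellent"},
    {"student": "no", "credit": "fair"},
    {"student": "no", "credit": "excellent"},
]
labels = ["yes", "yes", "no", "no"]

for feature in ("student", "credit"):
    print(feature, information_gain(rows, labels, feature))
```

In this toy data the student feature separates the two classes perfectly, so its gain equals the full entropy of the labels (1 bit), while credit leaves both groups evenly mixed and gains nothing. A tree that splits on entropy would therefore pick student first.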
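Configure the environment variables so that the Graphviz dot command is available on PATH, then convert the dot file to PDF to visualize the decision tree: dot -Tpdf iris.dot -o output.pdf
Here iris.dot stands for the input file. For the code above, use the file it writes, allElectronicInformationGainOri.dot.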
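The same conversion can also be driven from Python, which makes it easy to check first whether dot is actually on PATH. This is a minimal sketch using only the standard library. The function names build_dot_command and render_pdf are hypothetical helpers introduced here, and the output name output.pdf is an assumption.

```python
import shutil
import subprocess


def build_dot_command(dot_file, pdf_file):
    """Return the argument list for: dot -Tpdf <dot_file> -o <pdf_file>."""
    return ["dot", "-Tpdf", dot_file, "-o", pdf_file]


def render_pdf(dot_file, pdf_file):
    """Run Graphviz if dot is on PATH; return False when it cannot be found."""
    if shutil.which("dot") is None:
        # Graphviz is not installed, or its bin directory is not on PATH.
        return False
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(build_dot_command(dot_file, pdf_file), check=True)
    return True


# Show the command that would be run for the file written by the example code.
print(" ".join(build_dot_command("allElectronicInformationGainOri.dot", "output.pdf")))
```

Calling render_pdf("allElectronicInformationGainOri.dot", "output.pdf") after the training script has run would produce the PDF. A return value of False suggests the environment variable step still needs attention.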