
Using Decision Trees and Visualizing the Result

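from sklearn import tree
from sklearn.datasets import fetch_california_housing

# Assumption: `housing` is the California housing dataset, in which
# columns 6 and 7 are Latitude and Longitude.
housing = fetch_california_housing()

dtr = tree.DecisionTreeRegressor(max_depth=2)  # instantiate a decision tree regressor
dtr.fit(housing.data[:, [6, 7]], housing.target)  # call fit to train on the two columns
# Convert the fitted tree to DOT format (returned as a string because out_file=None)
dot_data = tree.export_graphviz(
    dtr,
    out_file=None,
    feature_names=housing.feature_names[6:8],
    filled=True,
    impurity=False,
    rounded=True,
)
import pydotplus  # package dedicated to rendering DOT data
graph = pydotplus.graph_from_dot_data(dot_data)  # build a graph from the DOT data
graph.get_nodes()[7].set_fillcolor("#FFF2DD")  # set the fill color of one node
from IPython.display import Image
Image(graph.create_png())  # display the graph as an image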

sklearn.tree.export_graphviz(decision_tree, out_file=None, max_depth=None,
feature_names=None,
class_names=None,
label='all',
filled=False,
leaves_parallel=False,
impurity=True,
node_ids=False,
proportion=False,
rotate=False,
rounded=False,
special_characters=False,
precision=3)

Purpose: Export a decision tree in DOT format.
Parameters:
decision_tree : decision tree regressor or classifier
The decision tree to be exported to GraphViz.
out_file : file object or string, optional (default=None)
Handle or name of the output file. If None, the result is returned as a string.
Changed in version 0.20: Default of out_file changed from “tree.dot” to None.
max_depth : int, optional (default=None)
The maximum depth of the representation. If None, the tree is fully generated.
feature_names : list of strings, optional (default=None)
Names of each of the features.
class_names : list of strings, bool or None, optional (default=None)
Names of each of the target classes in ascending numerical order. Only relevant for classification and not supported for multi-output. If True, shows a symbolic representation of the class name.
label : {'all', 'root', 'none'}, optional (default='all')
Whether to show informative labels for impurity, etc. Options include 'all' to show at every node, 'root' to show only at the top root node, or 'none' to not show at any node.
filled : bool, optional (default=False)
When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
leaves_parallel : bool, optional (default=False)
When set to True, draw all leaf nodes at the bottom of the tree.
impurity : bool, optional (default=True)
When set to True, show the impurity at each node.
node_ids : bool, optional (default=False)
When set to True, show the ID number on each node.
proportion : bool, optional (default=False)
When set to True, change the display of ‘values’ and/or ‘samples’ to be proportions and percentages respectively.
rotate : bool, optional (default=False)
When set to True, orient tree left to right rather than top-down.
rounded : bool, optional (default=False)
When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
special_characters : bool, optional (default=False)
When set to False, ignore special characters for PostScript compatibility.
precision : int, optional (default=3)
Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.

Returns:
dot_data : string
String representation of the input tree in GraphViz dot format. Only returned if out_file is None.
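
To make a few of these parameters concrete, here is a minimal sketch that exports the same small tree twice and compares the DOT strings. The synthetic data and the feature names "lat" and "lon" are assumptions made up for the illustration, not the housing data used earlier. Generating the DOT string does not need the Graphviz executables; only rendering it to an image does.

```python
import numpy as np
from sklearn import tree

# Synthetic stand-in data (an assumption, not the housing dataset):
# the target jumps when the first feature crosses 0.5.
rng = np.random.RandomState(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] > 0.5, 3.0, 1.0) + 0.5 * X[:, 1]

dtr = tree.DecisionTreeRegressor(max_depth=2, random_state=0)
dtr.fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string.
dot_default = tree.export_graphviz(
    dtr, out_file=None, feature_names=["lat", "lon"]
)

# Same tree with filled, rounded boxes, no impurity, drawn left to right.
dot_styled = tree.export_graphviz(
    dtr,
    out_file=None,
    feature_names=["lat", "lon"],
    filled=True,
    rounded=True,
    impurity=False,
    rotate=True,
)

# Show the beginning of the styled DOT source.
print(dot_styled[:300])
```

Diffing the two strings is a quick way to see what each flag changes before any image is rendered.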

graph.get_nodes()

help(graph.get_nodes)
out: Help on method get_nodes in module pydotplus.graphviz:
get_nodes() method of pydotplus.graphviz.Dot instance
    Get the list of Node instances.

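Displaying the image requires two extra components: Graphviz, a graph visualization toolkit, and pydotplus, a Python package for working with DOT graphs. Both have to be installed separately; Anaconda does not ship them by default.
Installing these two took me about two hours:
First attempt:
Downloaded the MSI installer from http://www.graphviz.org/Download..php and installed it directly.
Installed pydotplus with pip install pydotplus, which ended in a
"GraphViz's executables not found" error.
Second attempt:
Removed both packages and installed both through Anaconda. Still failed.
Third attempt:
Added the path of Graphviz's bin directory to the PATH environment variable. Still failed.
Fourth attempt:
Swapped the installation order of Graphviz and pydotplus. Still failed.
Fifth attempt:
Added Graphviz to the PATH environment variable from within the code: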

import os     
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'

Success!
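
The two lines above append to PATH every time they run. A slightly more defensive sketch, assuming the same install directory as in the post (adjust it to your own system), adds the directory only once and then uses shutil.which to check whether the dot executable can actually be found. The helper name ensure_on_path is made up for this illustration.

```python
import os
import shutil

# Install location copied from the post; an assumption about your system.
GRAPHVIZ_BIN = "C:/Program Files (x86)/Graphviz2.38/bin/"


def _norm(path: str) -> str:
    """Normalize a path so equivalent spellings compare equal."""
    return os.path.normcase(os.path.normpath(path))


def ensure_on_path(directory: str) -> bool:
    """Append directory to PATH if it is absent; return True if it was added."""
    current = os.environ.get("PATH", "")
    entries = [e for e in current.split(os.pathsep) if e]
    if _norm(directory) in {_norm(e) for e in entries}:
        return False
    os.environ["PATH"] = current + os.pathsep + directory if current else directory
    return True


ensure_on_path(GRAPHVIZ_BIN)

# shutil.which returns the full path of the executable, or None if it
# cannot be found on the current PATH.
print("dot executable:", shutil.which("dot"))
```

If the last line prints None, the directory is probably wrong for your installation, and rendering the graph to PNG would likely fail with the same "executables not found" error.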