Building a Decision Tree Recursively
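1. What is recursion?

A function can call other functions from inside its body. If a function calls itself from inside its own body, it is called a recursive function.

PS: Calling another function from inside a function is not function nesting. Nesting means defining a sub-function inside the body of a function.
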
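Properties of recursion:
- A recursive function must have an explicit termination condition.
- Each deeper level of recursion should work on a smaller problem than the level before it.
- Consecutive calls are closely linked: each call prepares for the next one (usually the output of one call is the input of the next).
- Recursion is not efficient, and too many levels of recursion cause a stack overflow. Function calls are implemented with a stack data structure: each call pushes one stack frame, and each return pops one. Because the stack is finite, too many nested calls overflow it.
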
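The stack limit described above can be observed directly in Python. The sketch below is only an illustration: `count_down` and `depth_is_safe` are hypothetical helper names, not part of the original post. In CPython, exceeding the interpreter's recursion limit raises `RecursionError` rather than crashing the process.

```python
import sys

def count_down(n):
    """Recurse n levels deep and return the number of levels reached."""
    if n == 0:          # explicit termination condition
        return 0
    return 1 + count_down(n - 1)   # smaller problem on each call

def depth_is_safe(n):
    """Return True if recursing n levels stays under the recursion limit."""
    try:
        count_down(n)
        return True
    except RecursionError:
        return False

limit = sys.getrecursionlimit()
print("recursion limit:", limit)
print("100 levels ok?", depth_is_safe(100))
print("limit + 100 levels ok?", depth_is_safe(limit + 100))
```

The limit can be raised with `sys.setrecursionlimit`, but a deep recursion is usually a sign that a loop would be a better fit.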
First, an example: two ways to implement a cumulative sum.
# Cumulative sum with a loop
def sum1(n):
    '''
    Sum of the integers from 1 to n.
    '''
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

# Cumulative sum with recursion
def sum2(n):
    '''
    Sum of the integers from 1 to n.
    '''
    if n > 0:
        return n + sum2(n - 1)  # the function calls itself
    else:
        return 0

print("Loop sum -->", sum1(100))
print("Recursive sum -->", sum2(100))

# Both produce the same result: 5050
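As the example shows, both versions compute the sum. So what are the advantages and disadvantages of the second compared with the first?

2. What are the advantages and disadvantages of recursive functions?

Advantages of recursion:
- The definition is simple and the logic is clear. In theory, every recursion can be rewritten as a loop, but the loop's logic is often less clear than the recursive version.

Disadvantages of recursion:
- Too many recursive calls cause a stack overflow.
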
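To make the trade-off concrete, here is a minimal sketch of the same computation written both ways: summing the numbers in an arbitrarily nested list. The function names and the sample data are made up for illustration. The recursive version mirrors the shape of the data; the loop version has to manage its own explicit stack, which is the bookkeeping that recursion hides.

```python
def nested_sum_recursive(items):
    """Sum all numbers in a nested list by recursing into sub-lists."""
    total = 0
    for item in items:
        if isinstance(item, list):
            total += nested_sum_recursive(item)  # recurse into the sub-list
        else:
            total += item
    return total

def nested_sum_iterative(items):
    """Same result with a loop and an explicit stack of pending lists."""
    total = 0
    stack = [items]              # lists that still need to be processed
    while stack:
        current = stack.pop()
        for item in current:
            if isinstance(item, list):
                stack.append(item)   # defer the sub-list instead of recursing
            else:
                total += item
    return total

data = [1, [2, 3, [4]], [5, [6, [7]]]]
print(nested_sum_recursive(data))
print(nested_sum_iterative(data))
```

The loop version cannot hit the recursion limit, because its stack is an ordinary list on the heap.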
3. Building a decision tree with a recursive function

Implement the function build_tree(rows). This is the function we use to actually build our tree. Please follow the steps below:
- We will be using a recursive function here.
- Find the best split using the method we implemented before, and store the information gain and the question in local variables.
- Define the ending condition. If there is no gain, i.e. gain == 0, return a leaf node, Leaf(rows).
- Otherwise, partition the rows at the current node with the best question (the Determine object that we got before).
- We use DFS (Depth First Search) to build the tree, and build the true_branch recursively first.
- We then build the false_branch recursively.
- At last, we need to return something. We return a DecisionNode object here, since the starting point is also a DecisionNode.

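The depth-first order in the steps above can be sketched on a tiny hand-built tree. `Node` and `visit_order` are hypothetical names used only for this illustration; the attribute names `true_branch` and `false_branch` follow the wording of the steps. Each node is visited, then its whole true branch, and only then its false branch.

```python
class Node:
    """Hypothetical binary node, used only for this illustration."""
    def __init__(self, name, true_branch=None, false_branch=None):
        self.name = name
        self.true_branch = true_branch
        self.false_branch = false_branch

def visit_order(node, order=None):
    """Return node names in depth-first order, true branch first."""
    if order is None:
        order = []
    if node is None:                       # ending condition: nothing to visit
        return order
    order.append(node.name)                # handle the current node
    visit_order(node.true_branch, order)   # finish the entire true branch first
    visit_order(node.false_branch, order)  # then the false branch
    return order

tree = Node("root",
            Node("T", Node("TT"), Node("TF")),
            Node("F"))
print(visit_order(tree))  # ['root', 'T', 'TT', 'TF', 'F']
```

Note that "TT" and "TF" are reached before "F": the recursion goes as deep as it can on one side before it returns to the other.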
Notes:
- This function might take you some time and thought. Be patient.
- You need to understand the logic behind our DT before you even start to think. Talk to me if you are not feeling confident enough.
- Look up recursive functions and depth first search if necessary.

The code is as follows:
def build_tree(rows):
    """
    Build our decision tree recursively.
    :param rows: a subset of our data set
    :return: recursively return a decision node and finally a tree
    """
    # Look for the best split of this subset of the data.
    # build_tree_best_question is itself an object and can be used directly.
    build_tree_best_gain, build_tree_best_question = find_best_split(rows)
    # When info_gain == 0, return Leaf(rows)
    if build_tree_best_gain == 0:
        return Leaf(rows)
    # Partition the rows on the best question
    true_node, false_node = partition(rows, build_tree_best_question)
    # Depth first: build the true branch, then the false branch
    left_tree = build_tree(true_node)
    right_tree = build_tree(false_node)
    # Otherwise return a DecisionNode
    return DecisionNode(build_tree_best_question, left_tree, right_tree)
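
build_tree depends on helpers that this post does not show: find_best_split, partition, Leaf, DecisionNode and the Determine question object. The sketch below fills them in with assumed, simplified versions so the recursion can be run end to end. Everything except the overall shape of build_tree is an assumption: the equality-only question, the Gini-based gain, the `classify` helper, and the toy data are all made up for illustration and may differ from the earlier parts of the assignment.

```python
from collections import Counter

class Determine:
    """Assumed question object: 'is row[col] equal to value?'"""
    def __init__(self, col, value):
        self.col, self.value = col, value
    def match(self, row):
        return row[self.col] == self.value

class Leaf:
    """Leaf node: counts of each label (last column) that reached it."""
    def __init__(self, rows):
        self.predictions = Counter(row[-1] for row in rows)

class DecisionNode:
    """Inner node: a question plus the two subtrees."""
    def __init__(self, question, true_branch, false_branch):
        self.question = question
        self.true_branch = true_branch
        self.false_branch = false_branch

def partition(rows, question):
    """Split rows into those that match the question and those that do not."""
    true_rows = [r for r in rows if question.match(r)]
    false_rows = [r for r in rows if not question.match(r)]
    return true_rows, false_rows

def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(row[-1] for row in rows)
    return 1 - sum((n / len(rows)) ** 2 for n in counts.values())

def find_best_split(rows):
    """Try every (column, value) question and keep the one with most gain."""
    best_gain, best_question = 0, None
    current = gini(rows)
    for col in range(len(rows[0]) - 1):          # last column is the label
        for value in {row[col] for row in rows}:
            question = Determine(col, value)
            true_rows, false_rows = partition(rows, question)
            if not true_rows or not false_rows:  # question does not split
                continue
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, question
    return best_gain, best_question

def build_tree(rows):
    """Same recursion as in the post, with shorter local names."""
    gain, question = find_best_split(rows)
    if gain == 0:                                # ending condition
        return Leaf(rows)
    true_rows, false_rows = partition(rows, question)
    return DecisionNode(question, build_tree(true_rows), build_tree(false_rows))

def classify(row, node):
    """Walk down the tree recursively until a leaf is reached."""
    if isinstance(node, Leaf):
        return node.predictions
    branch = node.true_branch if node.question.match(row) else node.false_branch
    return classify(row, branch)

# Toy data: [color, diameter, label]
data = [
    ["Green", 3, "Apple"],
    ["Yellow", 3, "Apple"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]
tree = build_tree(data)
print(classify(["Red", 1], tree))
print(classify(["Yellow", 3], tree))
```

The two yellow rows have identical features but different labels, so no question can separate them. The gain there is 0, the ending condition fires, and they end up together in one leaf.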
JAN 1.9