
Building a Decision Tree Recursively


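1. What is recursion?

  • Inside a function you can call other functions. If a function calls itself within its own body, it is called a recursive function.

    • PS: Calling another function inside a function is not function nesting. Defining a sub-function inside a function is what counts as nesting.

  • Properties of recursion:

    • A recursive function must have an explicit termination condition.

    • Each time the recursion goes one level deeper, the problem size should be smaller than in the previous call.

    • Two consecutive calls are closely linked: the earlier one prepares for the later one (usually the output of one call serves as the input of the next).

    • Recursion is often less efficient than a loop, and too many levels of recursion cause a stack overflow. (Function calls are implemented with a stack data structure: each call pushes a stack frame, and each return pops one. Because the stack size is finite, too many nested recursive calls overflow the stack.)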

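  • Let's look at an example first: two ways to sum the integers from 1 to n.

    # Summation with a loop
    def sum1(n):
        """Return the sum of the integers from 1 to n."""
        total = 0
        for i in range(1, n + 1):
            total += i
        return total

    # Summation with recursion
    def sum2(n):
        """Return the sum of the integers from 1 to n."""
        if n > 0:
            return n + sum2(n - 1)    # the function calls itself
        else:
            return 0

    print("Loop sum -->", sum1(100))
    print("Recursive sum -->", sum2(100))

    # Both calls print 5050

  • As the example shows, both approaches produce the same sum. So what are the advantages and disadvantages of the latter compared with the former?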

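2. What are the advantages and disadvantages of recursive functions?

  • Advantages of recursive functions

    • They are simple to define and their logic is clear. In theory, every recursive function can be rewritten as a loop, but the loop version is often less clear than the recursive one.

  • Disadvantages of recursion

    • Too many nested recursive calls cause a stack overflow.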
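
The depth limit described above can be observed directly. In CPython, the interpreter guards against a real stack overflow by raising `RecursionError` once the call depth exceeds `sys.getrecursionlimit()`. The sketch below is only an illustration: `recursive_sum`, `loop_sum` and `try_recursive_sum` are hypothetical names, not part of the original example.

```python
import sys

def recursive_sum(n):
    """Sum 1..n recursively; each level adds one stack frame."""
    if n > 0:
        return n + recursive_sum(n - 1)
    return 0

def loop_sum(n):
    """Same result with a loop; the stack depth stays constant."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def try_recursive_sum(n):
    """Return the recursive sum, or None if the depth limit is hit."""
    try:
        return recursive_sum(n)
    except RecursionError:
        return None

# Assumption: ten times the current limit is deep enough to trigger the error.
deep_n = sys.getrecursionlimit() * 10

print(try_recursive_sum(100))     # shallow recursion succeeds
print(try_recursive_sum(deep_n))  # too deep: the RecursionError is caught
print(loop_sum(deep_n))           # the loop is not affected by the limit
```

This is why a loop is usually the safer choice when the depth grows with the input size, while recursion remains a good fit for structures such as trees, where the depth is typically small.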

3. Building the decision tree with a recursive function

  • Implement the function build_tree(rows). This is the function we use to actually build our tree. Follow the steps below:

    • We will use a recursive function here.

    • Find the best split using the method we implemented before, and store the information gain and the question in local variables.

    • Define the ending condition: if there is no gain, i.e. gain == 0, return a leaf node Leaf(rows).

    • Otherwise, partition the rows at the current node using the best question (the Determine object that we got before).

    • We use DFS (depth-first search) to build the tree, and build the true_branch recursively first.

    • We then build the false_branch recursively.

    • Finally, we need to return something. We return a DecisionNode object here, since the starting point is also a DecisionNode.

    • Notes:

      • This function might take you some time and thinking. Be patient

      • You need to understand the logic behind our DT before you even start to think. Talk to me if you are not feeling confident enough

      • Look up recursive function and depth first search if necessary.

  • The code is as follows:

    def build_tree(rows):
        """
        Build our decision tree recursively.

        :param rows: a subset of our data set
        :return: recursively return a decision node and finally a tree
        """
        # Find the best split for this subset of the data.
        # build_tree_best_question is already an object and can be used directly.
        build_tree_best_gain, build_tree_best_question = find_best_split(rows)

        # Ending condition: when info_gain == 0, return Leaf(rows).
        if build_tree_best_gain == 0:
            return Leaf(rows)

        # Partition the rows on the best question.
        true_node, false_node = partition(rows, build_tree_best_question)

        # Depth-first: build the whole true branch, then the false branch.
        left_tree = build_tree(true_node)
        right_tree = build_tree(false_node)

        # Otherwise return a DecisionNode.
        return DecisionNode(build_tree_best_question, left_tree, right_tree)
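
The function above relies on Leaf, DecisionNode, find_best_split and partition, which were implemented earlier and are not shown here. To try the recursion end to end, the following is a minimal self-contained sketch. Everything in it beyond those four names is an assumption made for illustration: the `Question` class (a stand-in for the question object), Gini impurity as the basis for the gain, the `classify` helper, and the toy fruit data.

```python
from collections import Counter

class Question:
    """Hypothetical question object: tests one column against a value."""
    def __init__(self, column, value):
        self.column = column
        self.value = value

    def match(self, row):
        cell = row[self.column]
        if isinstance(cell, (int, float)):
            return cell >= self.value   # numeric feature: threshold test
        return cell == self.value       # categorical feature: equality test

class Leaf:
    """Stores the label counts of the rows that reach this node."""
    def __init__(self, rows):
        self.predictions = Counter(row[-1] for row in rows)

class DecisionNode:
    """Stores a question and the two subtrees below it."""
    def __init__(self, question, true_branch, false_branch):
        self.question = question
        self.true_branch = true_branch
        self.false_branch = false_branch

def gini(rows):
    """Gini impurity of the labels (last column) in rows."""
    counts = Counter(row[-1] for row in rows)
    return 1 - sum((c / len(rows)) ** 2 for c in counts.values())

def partition(rows, question):
    """Split rows into those that match the question and those that do not."""
    true_rows = [r for r in rows if question.match(r)]
    false_rows = [r for r in rows if not question.match(r)]
    return true_rows, false_rows

def find_best_split(rows):
    """Try every (column, value) question; return the best gain and question."""
    best_gain, best_question = 0, None
    current = gini(rows)
    for col in range(len(rows[0]) - 1):
        for value in {row[col] for row in rows}:
            question = Question(col, value)
            true_rows, false_rows = partition(rows, question)
            if not true_rows or not false_rows:
                continue                # this question does not divide the rows
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, question
    return best_gain, best_question

def build_tree(rows):
    """Same recursion as above: stop on zero gain, else recurse depth-first."""
    gain, question = find_best_split(rows)
    if gain == 0:
        return Leaf(rows)
    true_rows, false_rows = partition(rows, question)
    true_branch = build_tree(true_rows)     # finished before the false branch starts
    false_branch = build_tree(false_rows)
    return DecisionNode(question, true_branch, false_branch)

def classify(row, node):
    """Walk down the tree recursively until a Leaf is reached."""
    if isinstance(node, Leaf):
        return node.predictions
    branch = node.true_branch if node.question.match(row) else node.false_branch
    return classify(row, branch)

# Toy data (assumed): colour, diameter, label.
training_data = [
    ["Green", 3, "Apple"],
    ["Yellow", 3, "Apple"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]
tree = build_tree(training_data)
print(classify(["Green", 3], tree))
print(classify(["Red", 1], tree))
```

The recursion always terminates in this sketch because each accepted split sends strictly fewer rows to both children, and a node whose rows cannot be divided any further yields a gain of 0 and becomes a Leaf.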
  • JAN 1.9
