Decision Tree Regression with AdaBoost: regression on scattered data with adaptively boosted decision trees

Example URL:
https://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_regression.html#sphx-glr-auto-examples-ensemble-plot-adaboost-regression-py

With y = list(np.sin(3*X).ravel() + rng.normal(0, 0.1, X.shape[0])):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor

# Create the dataset
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = list(np.sin(X).ravel() + np.sin(6*X).ravel() + rng.normal(0, 0.1, X.shape[0]))

# Fit regression model
DTR_1 = DecisionTreeRegressor(max_depth=4)
DTR_2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng)
DTR_1.fit(X, y)
DTR_2.fit(X, y)

# Predict
y_1 = DTR_1.predict(X)
y_2 = DTR_2.predict(X)

# Plot the results
plt.figure()
plt.scatter(X, y, c="k", label="training samples")
plt.plot(X, y_1, c="g", label="n_estimators=1", linewidth=2)
plt.plot(X, y_2, c="r", label="n_estimators=300", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Boosted Decision Tree Regression")
plt.legend()
plt.show()

Reference URLs:

np.newaxis inserts a new dimension:
https://blog.csdn.net/mameng1/article/details/54599306

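[np.newaxis, :] and [:, np.newaxis] add a new axis in the row position and the column position respectively. For an array of shape (6,), the first form gives a 2-D array of shape (1, 6) and the second gives a 2-D array of shape (6, 1).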
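The shape change described above can be checked directly. The snippet below is a minimal sketch; the array `a` is an illustrative value, and the last two lines repeat the `X` construction used in the example code.

```python
import numpy as np

a = np.arange(6)            # 1-D array, shape (6,)
row = a[np.newaxis, :]      # new leading axis  -> shape (1, 6)
col = a[:, np.newaxis]      # new trailing axis -> shape (6, 1)

print(a.shape, row.shape, col.shape)

# scikit-learn estimators expect X as (n_samples, n_features),
# which is why the example reshapes the 1-D linspace output:
X = np.linspace(0, 6, 100)[:, np.newaxis]
print(X.shape)              # (100, 1)
```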

rand vs normal in Numpy.random:
https://www.geeksforgeeks.org/rand-vs-normal-numpy-random-python/

numpy.random.rand(d0, d1, …, dn) :
creates an array of the specified shape and fills it with random values drawn from a uniform distribution over [0, 1).

numpy.random.normal(loc = 0.0, scale = 1.0, size = None) :
draws random samples from a normal (Gaussian) distribution with mean loc and standard deviation scale; size sets the shape of the output array.
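A short sketch of the difference between the two calls, using a seeded RandomState as in the example code. The sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.RandomState(1)

# rand: uniform samples in [0, 1); the shape is given as positional arguments
u = rng.rand(1000, 2)

# normal: Gaussian samples; loc is the mean, scale the standard deviation
g = rng.normal(loc=0.0, scale=0.1, size=1000)

print(u.shape, u.min(), u.max())   # every value lies in [0, 1)
print(g.shape, g.mean(), g.std())  # mean close to 0, std close to 0.1
```

In the regression example, rng.normal(0, 0.1, X.shape[0]) plays the second role: it adds Gaussian noise with standard deviation 0.1 to each target value.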

Reading and writing CSV data:
1. Reading CSV data with csv.reader:
https://python3-cookbook.readthedocs.io/zh_CN/latest/c06/p01_read_write_csv_data.html

import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        # Process row
        ...

2. Reading CSV data with pd.read_csv:
https://blog.csdn.net/atnanyang/article/details/70832257

import pandas as pd
df = pd.read_csv('filename', header=None, sep=' ')
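As a self-contained sketch of the same call, the snippet below reads space-separated text from an in-memory buffer instead of a file. The data values are made up for illustration; with header=None, pandas labels the columns with the integers 0, 1, and so on.

```python
import io
import pandas as pd

# Hypothetical space-separated data with no header row
text = "0.0 0.12\n0.5 0.91\n1.0 0.78\n"

df = pd.read_csv(io.StringIO(text), header=None, sep=' ')
print(df)

# Split into a 2-D feature array and a 1-D target array for scikit-learn
X = df[[0]].to_numpy()   # shape (3, 1)
y = df[1].to_numpy()     # shape (3,)
print(X.shape, y.shape)
```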

Notes:

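n_estimators: the maximum number of boosting iterations, that is, the maximum number of weak learners. If n_estimators is too small, the model tends to underfit; if it is too large, training becomes expensive, and beyond a certain point adding more estimators improves the model very little, so a moderate value is usually chosen. The default for AdaBoostRegressor is 50.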
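One way to see how the fit changes with the number of weak learners is staged_predict, which yields the ensemble prediction after each boosting round. The sketch below reuses the dataset from the example code and reports the error on the training data only, so it shows diminishing returns but says nothing about generalization; the checkpoints (1, 10, 50, 100, 300) are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error

# Same dataset as in the example code
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100)[:, np.newaxis]
y = np.sin(X).ravel() + np.sin(6 * X).ravel() + rng.normal(0, 0.1, X.shape[0])

model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                          n_estimators=300, random_state=rng)
model.fit(X, y)

# Training MSE after 1, 2, ..., k boosting rounds.
# Boosting may stop early, so k can be smaller than n_estimators.
errors = [mean_squared_error(y, y_pred) for y_pred in model.staged_predict(X)]

for n in (1, 10, 50, 100, 300):
    if n <= len(errors):
        print(f"n_estimators={n:3d}  training MSE={errors[n - 1]:.4f}")
```

To choose n_estimators in practice, the same loop would normally be run against a held-out validation set rather than the training data.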