Time Series: Transforms to Apply Before Modeling
阿新 • Published: 2019-01-01
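1. sqrt transform
First look at the trend of the series. If it follows a quadratic curve, try a square root transform, which turns quadratic growth into linear growth.
After the sqrt transform, such a series looks like this: [figure: the series after sqrt]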
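To make this concrete, here is a minimal sketch on a made-up quadratic series (the values 1, 4, 9, ... are my own illustration, not data from this post). If the step from one point to the next is constant after the transform, the trend has become a straight line.

```python
from math import sqrt

# Made-up series with a quadratic trend: 1, 4, 9, ..., 100
series = [t ** 2 for t in range(1, 11)]

# Square root transform
transformed = [sqrt(x) for x in series]

# First differences: a constant difference means a straight line
raw_diffs = [b - a for a, b in zip(series, series[1:])]
sqrt_diffs = [b - a for a, b in zip(transformed, transformed[1:])]

print(raw_diffs)   # the steps keep growing: 3, 5, 7, ...
print(sqrt_diffs)  # every step is 1.0
```

Real data is rarely an exact quadratic, so expect the transformed series to be only roughly linear.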
The original data looks like this: [figure: the original data]
Apply sqrt:
from pandas import DataFrame, read_csv
from numpy import sqrt
from matplotlib import pyplot

# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze yields a Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# square root transform
dataframe['passengers'] = sqrt(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
It turns into this: [figure: line plot and histogram of the sqrt-transformed data]
There is still a trend...
2. log transform
After a log transform, the series should also look like this: [figure: the series after log]
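The same check as in the sqrt section can be sketched for log, this time on a made-up exponential series (again my own illustration, not data from this post). An exponential trend should become a straight line after log.

```python
from math import exp, log

# Made-up series with exponential growth: e^1, e^2, ..., e^10
series = [exp(t) for t in range(1, 11)]

# Log transform
transformed = [log(x) for x in series]

# First differences of the transformed series
log_diffs = [b - a for a, b in zip(transformed, transformed[1:])]

print([round(d, 6) for d in log_diffs])  # every step is 1.0
```

Note that log only works for strictly positive values.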
Apply log to the same real data used above:
from pandas import DataFrame, read_csv
from numpy import log
from matplotlib import pyplot

# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze yields a Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# log transform
dataframe['passengers'] = log(dataframe['passengers'])
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
The distribution now looks closer to normal. The log transform is a very popular choice.
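One rough way to put a number on "closer to normal" is skewness, which is 0 for a symmetric distribution. The sketch below uses a made-up right-skewed sample and a hand-written `skewness` helper (both are my own illustration, not from this post).

```python
from math import exp, log

def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed sample: exp of evenly spaced values in [-1, 1]
sample = [exp(i / 10) for i in range(-10, 11)]
logged = [log(v) for v in sample]

print(skewness(sample))  # clearly positive: long right tail
print(skewness(logged))  # essentially zero: symmetric after log
```

Skewness near zero is only one symptom of normality, so the histogram is still worth a look.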
3. Box-Cox transform
The resulting plot is the same as above.
https://machinelearningmastery.com/power-transform-time-series-forecast-data-python/
Common values of lambda:
- lambda = -1.0 is a reciprocal transform.
- lambda = -0.5 is a reciprocal square root transform.
- lambda = 0.0 is a log transform.
- lambda = 0.5 is a square root transform.
- lambda = 1.0 is no transform.
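The lambda values above can be checked against the usual Box-Cox formula, y = (x^lambda - 1) / lambda for lambda != 0 and y = log(x) for lambda = 0. The `box_cox` function below is a hand-written sketch of that formula for a single value, not scipy's implementation. It shows that each lambda gives the named transform up to a shift and a scale factor.

```python
from math import log, sqrt

def box_cox(x, lam):
    """Box-Cox transform of a single positive value x (illustrative sketch)."""
    if lam == 0:
        return log(x)
    return (x ** lam - 1) / lam

x = 9.0
print(box_cox(x, -1.0), 1 - 1 / x)          # reciprocal, shifted and sign-flipped
print(box_cox(x, 0.0), log(x))              # log
print(box_cox(x, 0.5), 2 * (sqrt(x) - 1))   # square root, shifted and scaled
print(box_cox(x, 1.0), x - 1)               # only a shift by 1, shape unchanged
```

So "no transform" for lambda = 1.0 really means the values are shifted by 1 while the shape of the series stays the same.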
from pandas import DataFrame, read_csv
from scipy.stats import boxcox
from matplotlib import pyplot

# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze yields a Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# Box-Cox with a fixed lambda of 0.0 (equivalent to a log transform)
dataframe['passengers'] = boxcox(dataframe['passengers'], lmbda=0.0)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
This example uses lambda = 0.0, which is the log transform.
Conveniently, boxcox can also pick a lambda on its own:
We can set the lambda parameter to None (the default) and let the function find a statistically tuned value.
The following example demonstrates this usage, returning both the transformed dataset and the chosen lambda value.
from pandas import DataFrame, read_csv
from scipy.stats import boxcox
from matplotlib import pyplot

# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze yields a Series
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze('columns')
dataframe = DataFrame(series.values)
dataframe.columns = ['passengers']
# with lmbda left as None, boxcox returns the transformed data and the chosen lambda
dataframe['passengers'], lam = boxcox(dataframe['passengers'])
print('Lambda: %f' % lam)
pyplot.figure(1)
# line plot
pyplot.subplot(211)
pyplot.plot(dataframe['passengers'])
# histogram
pyplot.subplot(212)
pyplot.hist(dataframe['passengers'])
pyplot.show()
Lambda: 0.148023
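As a rough idea of what "statistically tuned" means here: as far as I know, when lmbda is None scipy picks the lambda that maximizes the Box-Cox log-likelihood. The sketch below imitates that with a plain grid search in pure Python. The helper names (`box_cox`, `log_likelihood`, `best_lambda`), the grid, and the made-up data are my own assumptions; scipy uses a numerical optimizer rather than a grid.

```python
from math import exp, log

def box_cox(x, lam):
    """Box-Cox transform of a single positive value (illustrative sketch)."""
    return log(x) if lam == 0 else (x ** lam - 1) / lam

def log_likelihood(data, lam):
    """Box-Cox log-likelihood for a given lambda, up to an additive constant."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / n
    return (lam - 1) * sum(log(x) for x in data) - n / 2 * log(variance)

def best_lambda(data, grid):
    """Return the grid value with the highest log-likelihood."""
    return max(grid, key=lambda lam: log_likelihood(data, lam))

# Made-up data that is symmetric on the log scale, so lambda = 0 should win
data = [exp(i / 5) for i in range(-10, 11)]
grid = [i / 10 for i in range(-20, 21)]  # candidate lambdas from -2.0 to 2.0

print(best_lambda(data, grid))
```

On the airline data the post reports a lambda of about 0.148, which sits between a log transform (0.0) and a square root transform (0.5).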