ARIMA Time Series Forecasting

阿新 • Published: 2018-11-25

(The learning materials and code were all obtained from the internet.)

Data records in AirPassengers.csv:

Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
1949-07,148
1949-08,148
1949-09,136
1949-10,119
1949-11,104
1949-12,118
1950-01,115
1950-02,126
1950-03,141
1950-04,135
1950-05,125
1950-06,149
1950-07,170
1950-08,170
1950-09,158
1950-10,133
1950-11,114
1950-12,140
1951-01,145
1951-02,150
1951-03,178
1951-04,163
1951-05,172
1951-06,178
1951-07,199
1951-08,199
1951-09,184
1951-10,162
1951-11,146
1951-12,166
1952-01,171
1952-02,180
1952-03,193
1952-04,181
1952-05,183
1952-06,218
1952-07,230
1952-08,242
1952-09,209
1952-10,191
1952-11,172
1952-12,194
1953-01,196
1953-02,196
1953-03,236
1953-04,235
1953-05,229
1953-06,243
1953-07,264
1953-08,272
1953-09,237
1953-10,211
1953-11,180
1953-12,201
1954-01,204
1954-02,188
1954-03,235
1954-04,227
1954-05,234
1954-06,264
1954-07,302
1954-08,293
1954-09,259
1954-10,229
1954-11,203
1954-12,229
1955-01,242
1955-02,233
1955-03,267
1955-04,269
1955-05,270
1955-06,315
1955-07,364
1955-08,347
1955-09,312
1955-10,274
1955-11,237
1955-12,278
1956-01,284
1956-02,277
1956-03,317
1956-04,313
1956-05,318
1956-06,374
1956-07,413
1956-08,405
1956-09,355
1956-10,306
1956-11,271
1956-12,306
1957-01,315
1957-02,301
1957-03,356
1957-04,348
1957-05,355
1957-06,422
1957-07,465
1957-08,467
1957-09,404
1957-10,347
1957-11,305
1957-12,336
1958-01,340
1958-02,318
1958-03,362
1958-04,348
1958-05,363
1958-06,435
1958-07,491
1958-08,505
1958-09,404
1958-10,359
1958-11,310
1958-12,337
1959-01,360
1959-02,342
1959-03,406
1959-04,396
1959-05,420
1959-06,472
1959-07,548
1959-08,559
1959-09,463
1959-10,407
1959-11,362
1959-12,405
1960-01,417
1960-02,391
1960-03,419
1960-04,461
1960-05,472
1960-06,535
1960-07,622
1960-08,606
1960-09,508
1960-10,461
1960-11,390
1960-12,432

ARIMA algorithm

# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
# rcParams sets the figure size
rcParams['figure.figsize'] = 15, 6

data = pd.read_csv("./AirPassengers.csv")
print (data.head())
print ('\n Data types:')
print (data.dtypes)

# index_col tells pandas which column to use as the index.
# pd.to_datetime then converts the 'YYYY-MM' strings in that index into datetime values
# (pd.datetime.strptime is no longer available in current pandas releases).
data = pd.read_csv('AirPassengers.csv', index_col='Month')
data.index = pd.to_datetime(data.index, format='%Y-%m')
print (data.head())
print (data.index)

# Check the stationarity of the time series
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    
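    # A one-year window is used here: the value at each time t is replaced by the mean of the
    # 12 months up to and including t; the standard deviation is computed the same way.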
    rolmean = timeseries.rolling(window=12).mean() 
    rolstd = timeseries.rolling(window=12).std()  
    #plot rolling statistics:
    fig = plt.figure()
    fig.add_subplot()
    orig = plt.plot(timeseries, color = 'blue',label='Original')
    mean = plt.plot(rolmean , color = 'red',label = 'rolling mean')
    std = plt.plot(rolstd, color = 'black', label= 'Rolling standard deviation')
    
    plt.legend(loc = 'best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    
    
    #Dickey-Fuller test:
    
    print ('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries,autolag = 'AIC')
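    # The leading entries of dftest are, in order: the test statistic, the p-value, the number of lags used,
    # the number of observations used, and the critical values at each significance level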
    dfoutput = pd.Series(dftest[0:4],index = ['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical value (%s)' %key] = value
    
    print (dfoutput)
    
ts = data['#Passengers']
test_stationarity(ts)

ts_log = np.log(ts)
#moving_avg = pd.rolling_mean(ts_log,12)
moving_avg = ts_log.rolling(window=12).mean() 
plt.plot(ts_log ,color = 'blue')
plt.plot(moving_avg, color='red')

ts_log_moving_avg_diff = ts_log-moving_avg
ts_log_moving_avg_diff.dropna(inplace = True)
test_stationarity(ts_log_moving_avg_diff)


# halflife determines the decay factor alpha: alpha = 1 - exp(log(0.5) / halflife)
#expweighted_avg = pd.ewma(ts_log,halflife=12)
expweighted_avg = ts_log.ewm(halflife=12).mean()
ts_log_ewma_diff = ts_log - expweighted_avg
test_stationarity(ts_log_ewma_diff)


ts_log_diff = ts_log - ts_log.shift()
ts_log_diff.dropna(inplace=True)
test_stationarity(ts_log_diff)




# Decomposition separates the trend and the seasonal component out of the time series:
from statsmodels.tsa.seasonal import seasonal_decompose
def decompose(timeseries):
    
    # The result has three parts: trend, seasonal, and resid (the residual)
    decomposition = seasonal_decompose(timeseries)
    
    trend = decomposition.trend
    seasonal = decomposition.seasonal
    residual = decomposition.resid
    
    plt.subplot(411)
    plt.plot(timeseries, label='Original')
    plt.legend(loc='best')
    plt.subplot(412)
    plt.plot(trend, label='Trend')
    plt.legend(loc='best')
    plt.subplot(413)
    plt.plot(seasonal,label='Seasonality')
    plt.legend(loc='best')
    plt.subplot(414)
    plt.plot(residual, label='Residuals')
    plt.legend(loc='best')
    plt.tight_layout()
    
    return trend , seasonal, residual

# With trend and seasonality removed, only the residual is treated as the series of interest
trend , seasonal, residual = decompose(ts_log)
residual.dropna(inplace=True)
test_stationarity(residual)

# =============================================================================
# Forecasting the time series
# =============================================================================
#ACF and PACF plots:
from statsmodels.tsa.stattools import acf, pacf
lag_acf = acf(ts_log_diff, nlags=20)
lag_pacf = pacf(ts_log_diff, nlags=20, method='ols')
#Plot ACF: 
plt.subplot(121) 
plt.plot(lag_acf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Autocorrelation Function')

#Plot PACF:
plt.subplot(122)
plt.plot(lag_pacf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(ts_log_diff)),linestyle='--',color='gray')
plt.title('Partial Autocorrelation Function')
plt.tight_layout()





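# statsmodels.tsa.arima_model was removed in statsmodels 0.13, so this section needs an older release.
# Its replacement, statsmodels.tsa.arima.model.ARIMA, reports fittedvalues on the level of the input
# series rather than as differences, so the RSS and back-transformation code below would need adapting.
from statsmodels.tsa.arima_model import ARIMA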
model = ARIMA(ts_log, order=(2, 1, 0))  
results_AR = model.fit(disp=-1)  
plt.plot(ts_log_diff)
plt.plot(results_AR.fittedvalues, color='red')
plt.title('RSS: %.4f'% sum((results_AR.fittedvalues-ts_log_diff)**2))

model = ARIMA(ts_log, order=(0, 1, 2))  
results_MA = model.fit(disp=-1)  
plt.plot(ts_log_diff)
plt.plot(results_MA.fittedvalues, color='red')
plt.title('RSS: %.4f'% sum((results_MA.fittedvalues-ts_log_diff)**2))



model = ARIMA(ts_log, order=(2, 1, 2))  
results_ARIMA = model.fit(disp=-1)  
plt.plot(ts_log_diff)
plt.plot(results_ARIMA.fittedvalues, color='red')
plt.title('RSS: %.4f'% sum((results_ARIMA.fittedvalues-ts_log_diff)**2))

# =============================================================================
#  Prediction
#  =============================================================================


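# ARIMA is actually fitted to the first difference ts_log_diff; predictions_ARIMA_diff[i] is the difference in ts_log between month i and month i-1.
# Because first differencing introduces a lag of one, there is no value for the first month.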
predictions_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
print (predictions_ARIMA_diff.head())
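# Accumulate the differences to get each value's offset from the first month (on the same log scale).
# That is, predictions_ARIMA_diff_cumsum[i] is the difference in ts_log between month i and month 1.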
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()
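# Reverse the transformations: ts_log_diff => ts_log => ts
# Take the first value of ts_log as the base, copy it to every time step, then add each step's cumulative
# difference from the first month (this also handles the missing diff value for the first month).
# This gives predictions_ARIMA_log => predictions_ARIMA
predictions_ARIMA_log = pd.Series(ts_log.iloc[0], index=ts_log.index)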
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum,fill_value=0)
predictions_ARIMA = np.exp(predictions_ARIMA_log)
plt.figure()
plt.plot(ts)
plt.plot(predictions_ARIMA)
plt.title('RMSE: %.4f'% np.sqrt(sum((predictions_ARIMA-ts)**2)/len(ts)))
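
The back-transformation described in the comments above (cumulative sum of the log differences, plus the first log value, then exp) can be looked at in isolation. The following is a minimal sketch, assuming plain NumPy arrays; the helper name undo_log_diff is hypothetical, and the exact log differences of the first five rows of AirPassengers.csv stand in for model predictions.

```python
import numpy as np


def undo_log_diff(first_log_value, log_diffs):
    """Rebuild the original-scale series from first-order log differences.

    first_log_value: log of the first observation (the base value).
    log_diffs: log differences for observations 2..n.
    """
    # Cumulative sum gives each point's offset from the first observation.
    cumulative = np.cumsum(log_diffs)
    # The first observation has no difference, so it is the base value itself.
    log_levels = np.concatenate([[first_log_value], first_log_value + cumulative])
    # exp undoes the log transform.
    return np.exp(log_levels)


# First five passenger counts from AirPassengers.csv.
passengers = np.array([112.0, 118.0, 132.0, 129.0, 121.0])
log_diffs = np.diff(np.log(passengers))
restored = undo_log_diff(np.log(passengers[0]), log_diffs)
print(np.allclose(restored, passengers))  # True
```

With exact differences the original values come back unchanged; with model-predicted differences the same steps give the fitted series on the original scale, and any per-step error accumulates through the cumulative sum.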

The output is as follows:

    Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121

 Data types:
Month          object
#Passengers     int64
dtype: object
            #Passengers
Month                  
1949-01-01          112
1949-02-01          118
1949-03-01          132
1949-04-01          129
1949-05-01          121
DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01',
               '1949-05-01', '1949-06-01', '1949-07-01', '1949-08-01',
               '1949-09-01', '1949-10-01',
               ...
               '1960-03-01', '1960-04-01', '1960-05-01', '1960-06-01',
               '1960-07-01', '1960-08-01', '1960-09-01', '1960-10-01',
               '1960-11-01', '1960-12-01'],
              dtype='datetime64[ns]', name='Month', length=144, freq=None)

Results of Dickey-Fuller Test:
Test Statistic                   0.815369
p-value                          0.991880
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical value (1%)             -3.481682
Critical value (5%)             -2.884042
Critical value (10%)            -2.578770
dtype: float64

Results of Dickey-Fuller Test:
Test Statistic                  -3.162908
p-value                          0.022235
#Lags Used                      13.000000
Number of Observations Used    119.000000
Critical value (1%)             -3.486535
Critical value (5%)             -2.886151
Critical value (10%)            -2.579896
dtype: float64
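
The result above covers 133 observations (13 lags + 119 used + 1), which is consistent with the log series minus its 12-month rolling mean: a 12-point window leaves the first 11 positions undefined, so 144 points shrink to 133. A minimal sketch of that step, assuming NumPy only; rolling_mean is a hypothetical helper and the input is a synthetic placeholder series, not the passenger data.

```python
import numpy as np


def rolling_mean(x, window=12):
    """Trailing rolling mean; the first window-1 positions are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    # 'valid' convolution returns len(x) - window + 1 averages.
    out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out


# Synthetic placeholder with the same length as the passenger series.
log_series = np.log(np.linspace(100.0, 600.0, 144))
diff = log_series - rolling_mean(log_series, window=12)
kept = diff[~np.isnan(diff)]  # same effect as dropna()
print(len(kept))  # 133
```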

Results of Dickey-Fuller Test:
Test Statistic                  -3.601262
p-value                          0.005737
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical value (1%)             -3.481682
Critical value (5%)             -2.884042
Critical value (10%)            -2.578770
dtype: float64
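
The result above covers all 144 observations (13 + 130 + 1), which is consistent with the exponentially weighted version: unlike the rolling mean, it produces a value at every time step. The relation between halflife and alpha quoted in the code comment can be checked numerically. This is a sketch under the assumption of the normalised-weights form (the pandas default, adjust=True); the function names are hypothetical.

```python
import numpy as np


def halflife_to_alpha(halflife):
    """alpha = 1 - exp(log(0.5) / halflife)."""
    return 1.0 - np.exp(np.log(0.5) / halflife)


def ewma(x, halflife):
    """Exponentially weighted mean with normalised weights (1 - alpha)**i."""
    x = np.asarray(x, dtype=float)
    decay = 1.0 - halflife_to_alpha(halflife)
    out = np.empty_like(x)
    num = 0.0  # running weighted sum of the observations
    den = 0.0  # running sum of the weights
    for t, value in enumerate(x):
        num = value + decay * num
        den = 1.0 + decay * den
        out[t] = num / den
    return out


alpha = halflife_to_alpha(12)
# After 12 steps an observation's weight has dropped to one half.
print(np.isclose((1.0 - alpha) ** 12, 0.5))  # True
```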

Results of Dickey-Fuller Test:
Test Statistic                  -2.717131
p-value                          0.071121
#Lags Used                      14.000000
Number of Observations Used    128.000000
Critical value (1%)             -3.482501
Critical value (5%)             -2.884398
Critical value (10%)            -2.578960
dtype: float64
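
The Dickey-Fuller output is read by comparing the test statistic with the critical values: the unit-root (non-stationary) null hypothesis is rejected at a given level only when the statistic is below that level's critical value. A minimal sketch of that comparison, using the numbers printed in the first and fourth result blocks; adf_verdict is a hypothetical helper, not part of statsmodels.

```python
def adf_verdict(test_statistic, critical_values):
    """Return the levels at which the unit-root null hypothesis is rejected."""
    return [level for level, crit in critical_values.items()
            if test_statistic < crit]


# First result block: 144 observations, presumably the raw series.
crit_raw = {"1%": -3.481682, "5%": -2.884042, "10%": -2.578770}
print(adf_verdict(0.815369, crit_raw))  # []

# Fourth result block: 143 observations, presumably the first difference of the log series.
crit_diff = {"1%": -3.482501, "5%": -2.884398, "10%": -2.578960}
print(adf_verdict(-2.717131, crit_diff))  # ['10%']
```

An empty list means the series cannot be called stationary at any of the listed levels; the second case rejects the null only at the 10% level, which matches its p-value of 0.071121.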

Results of Dickey-Fuller Test:
Test Statistic                -6.332387e+00
p-value                        2.885059e-08
#Lags Used                     9.000000e+00
Number of Observations Used    1.220000e+02
Critical value (1%)           -3.485122e+00
Critical value (5%)           -2.885538e+00
Critical value (10%)          -2.579569e+00
dtype: float64
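
The last result covers 132 observations (9 + 122 + 1), which is consistent with the decomposition residual: a centered 12-month moving average has no value for the first and last 6 points. The following is a sketch of a classical additive decomposition in NumPy, written to illustrate where those points go; it is not the statsmodels implementation, the function name is hypothetical, and the input is synthetic.

```python
import numpy as np


def classical_decompose(x, period=12):
    """Additive decomposition: x = trend + seasonal + residual."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    half = period // 2
    # Centered moving average for an even period (half weights at both ends).
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # Mean of the detrended values at each position within the cycle.
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()  # seasonal effects sum to zero over one cycle
    seasonal = np.tile(means, n // period + 1)[:n]
    residual = x - trend - seasonal
    return trend, seasonal, residual


# Synthetic series: linear trend plus a 12-month cycle, 144 points.
t = np.arange(144)
series = 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12)
trend, seasonal, residual = classical_decompose(series)
print(int(np.isfinite(residual).sum()))  # 132
```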
