How to Visualize Time Series Residual Forecast Errors with Python
Forecast errors on time series regression problems are called residuals or residual errors.
Careful exploration of residual errors on your time series prediction problem can tell you a lot about your forecast model and even suggest improvements.
In this tutorial, you will discover how to visualize residual errors from time series forecasts.
After completing this tutorial, you will know:
- How to create and review line plots of residual errors over time.
- How to review summary statistics and plots of the distribution of residual errors.
- How to explore the correlation structure of residual errors.
Let’s get started.
Residual Forecast Errors
Forecast errors on a time series forecasting problem are called residual errors or residuals.
A residual error is calculated as the expected outcome minus the forecast, for example:
```
residual error = expected - forecast
```
Or, more succinctly and using standard terms as:
```
e = y - yhat
```
We often stop there and summarize the skill of a model as a summary of this error.
Instead, we can collect these individual residual errors across all forecasts and use them to better understand the forecast model.
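As a quick illustration of the difference between a single error score and the collection of residuals (using made-up numbers, not the births data):

```python
import numpy as np

# hypothetical observed values and one-step forecasts (illustrative only)
expected = np.array([35.0, 32.0, 30.0, 31.0, 44.0])
forecast = np.array([33.0, 35.0, 32.0, 30.0, 31.0])

# a single summary score collapses all the errors into one number...
rmse = np.sqrt(np.mean((expected - forecast) ** 2))

# ...whereas keeping the individual residuals preserves their structure
residuals = expected - forecast
print(residuals)  # one error per forecast, available for plotting and diagnostics
```

The summary score tells us how large the errors are on average; the residuals themselves tell us whether those errors have any pattern.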
Generally, when exploring residual errors we are looking for patterns or structure. A sign of a pattern suggests that the errors are not random.
We expect the residual errors to be random, because it means that the model has captured all of the structure and the only error left is the random fluctuations in the time series that cannot be modeled.
A sign of a pattern or structure suggests that there is more information that a model could capture and use to make better predictions.
Before we start exploring the different ways to look for patterns in residual errors, we need context. In the next section, we will look at a dataset and a simple forecast method that we will use to generate residual errors to explore in this tutorial.
Daily Female Births Dataset
This dataset describes the number of daily female births in California in 1959.
The units are a count and there are 365 observations. The source of the dataset is credited to Newton, 1988.
Download the dataset and place it in your current working directory with the filename “daily-total-female-births.csv“.
Below is an example of loading the Daily Female Births dataset from CSV.
```python
from pandas import read_csv
from matplotlib import pyplot
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")
print(series.head())
series.plot()
pyplot.show()
```
Running the example prints the first 5 rows of the loaded file.
```
Date
1959-01-01    35
1959-01-02    32
1959-01-03    30
1959-01-04    31
1959-01-05    44
Name: Births, dtype: int64
```
The dataset is also shown in a line plot of observations over time.
Persistence Forecast Model
The simplest forecast that we can make is to forecast that what happened in the previous time step will be the same as what will happen in the next time step.
This is called the “naive forecast” or the persistence forecast model.
We can implement the persistence model in Python.
After the dataset is loaded, it is framed as a supervised learning problem. A lagged version of the dataset is created where the prior time step (t-1) is used as the input variable and the next time step (t+1) is taken as the output variable.
```python
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t-1', 't+1']
```
Next, the dataset is split into training and test sets. A total of 66% of the data is kept for training and the remaining 34% is held for the test set. No training is required for the persistence model; this is just a standard test harness approach.
Once split, the train and test sets are separated into their input and output components.
```python
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:, 0], train[:, 1]
test_X, test_y = test[:, 0], test[:, 1]
```
The persistence model is applied by predicting the output value (y) as a copy of the input value (x).
```python
# persistence model
predictions = [x for x in test_X]
```
The residual errors are then calculated as the difference between the expected outcome (test_y) and the prediction (predictions).
```python
# calculate residuals
residuals = [test_y[i] - predictions[i] for i in range(len(predictions))]
```
The example puts this all together and gives us a set of residual forecast errors that we can explore in this tutorial.
```python
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t-1', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:, 0], train[:, 1]
test_X, test_y = test[:, 0], test[:, 1]
# persistence model
predictions = [x for x in test_X]
# calculate residuals
residuals = [test_y[i] - predictions[i] for i in range(len(predictions))]
residuals = DataFrame(residuals)
print(residuals.head())
```
Running the example prints the first 5 rows of the forecast residuals.
```
0    9.0
1  -10.0
2    3.0
3   -6.0
4   30.0
```
Residual Line Plot
The first plot is to look at the residual forecast errors over time as a line plot.
We would expect the plot to be random around the value of 0 and not show any trend or cyclic structure.
The array of residual errors can be wrapped in a Pandas DataFrame and plotted directly. The code below provides an example.
```python
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from matplotlib import pyplot
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t-1', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:, 0], train[:, 1]
test_X, test_y = test[:, 0], test[:, 1]
# persistence model
predictions = [x for x in test_X]
# calculate residuals
residuals = [test_y[i] - predictions[i] for i in range(len(predictions))]
residuals = DataFrame(residuals)
# plot residuals
residuals.plot()
pyplot.show()
```
Running the example shows a seemingly random plot of the residual time series.
If we did see trend, seasonal or cyclic structure, we could go back to our model and attempt to capture those elements directly.
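For example, one common way to handle a trend revealed by the residuals is to difference the series before modeling. A minimal sketch on a made-up trending series (not the births data):

```python
from pandas import Series

# hypothetical series with a clear linear upward trend (illustrative only)
series = Series([10, 12, 14, 16, 18, 20])

# first-order differencing replaces each value with its change
# from the previous time step, removing a linear trend
differenced = series.diff().dropna()
print(differenced)
```

After differencing, the model forecasts the changes rather than the raw values, and the trend no longer needs to be captured by the model itself.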
Next, we look at summary statistics that we can use to see how the errors are spread around zero.
Residual Summary Statistics
We can calculate summary statistics on the residual errors.
Primarily, we are interested in the mean value of the residual errors. A value close to zero suggests no bias in the forecasts, whereas positive and negative values suggest a positive or negative bias in the forecasts made.
It is useful to know about a bias in the forecasts as it can be directly corrected in forecasts prior to their use or evaluation.
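As a sketch of this correction (using made-up residuals and forecasts, not the births data), the estimated bias is simply added back onto new forecasts:

```python
# hypothetical residuals (expected - forecast) from a validation set
residuals = [2.0, 3.0, 1.0, 4.0, 0.0]

# the mean residual estimates the forecast bias
bias = sum(residuals) / len(residuals)  # 2.0 here

# a positive bias means the forecasts were, on average, too low,
# so adding the bias back corrects new forecasts before use
forecasts = [30.0, 32.0, 35.0]
corrected = [f + bias for f in forecasts]
print(corrected)  # [32.0, 34.0, 37.0]
```

In practice the bias would be estimated on held-out data, not the same data used to evaluate the corrected forecasts.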
Below is an example of calculating summary statistics of the distribution of residual errors. This includes the mean and standard deviation of the distribution, as well as percentiles and the minimum and maximum errors observed.
```python
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
series = read_csv('daily-total-female-births.csv', header=0, index_col=0, parse_dates=True).squeeze("columns")
# create lagged dataset
values = DataFrame(series.values)
dataframe = concat([values.shift(1), values], axis=1)
dataframe.columns = ['t-1', 't+1']
# split into train and test sets
X = dataframe.values
train_size = int(len(X) * 0.66)
train, test = X[1:train_size], X[train_size:]
train_X, train_y = train[:, 0], train[:, 1]
test_X, test_y = test[:, 0], test[:, 1]
# persistence model
predictions = [x for x in test_X]
# calculate residuals
residuals = [test_y[i] - predictions[i] for i in range(len(predictions))]
residuals = DataFrame(residuals)
# summary statistics
print(residuals.describe())
```