
7 Time Series Datasets for Machine Learning

Machine learning can be applied to time series datasets.

These are problems where a numeric or categorical value must be predicted, but the rows of data are ordered by time.
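A common way to hand a time series to a standard machine learning algorithm is a sliding window: the previous observations become the input features and the next observation becomes the target. The following is a minimal sketch of that reframing. The function name `series_to_supervised` and the choice of two lags are illustrative assumptions, not a fixed API.

```python
from typing import List, Tuple

def series_to_supervised(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Frame a univariate series as (inputs, target) pairs.

    Each input row holds the n_lags previous observations and the target is
    the observation that follows them. Rows keep their original time order.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # window of past values
        y.append(series[t])             # value to predict
    return X, y

# First five values of the Shampoo Sales dataset described later in this post.
sales = [266.0, 145.9, 183.1, 119.3, 180.3]
X, y = series_to_supervised(sales, n_lags=2)
print(X)  # [[266.0, 145.9], [145.9, 183.1], [183.1, 119.3]]
print(y)  # [183.1, 119.3, 180.3]
```

Note that the first `n_lags` observations never appear as targets, because they have no complete window of history.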

A problem when getting started in time series forecasting with machine learning is finding good quality standard datasets on which to practice.

In this post, you will discover 7 standard time series datasets that you can use to get started and practice time series forecasting with machine learning.

After reading this post, you will know:

  • 4 univariate time series datasets.
  • 3 multivariate time series datasets.
  • Websites that you can use to search and download more datasets.

Let’s get started.

Univariate Time Series Datasets

Time series datasets that only have one variable are called univariate datasets.

These datasets are a great place to get started because:

  • They are simple and easy to understand.
  • You can plot them easily in Excel or your favorite plotting tool.
  • You can easily plot the predictions against the expected results.
  • You can quickly try out and evaluate a suite of traditional and newer methods.
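As a sketch of how quickly a method can be evaluated on a univariate series, the code below scores a naive persistence forecast (predict each value as the previous one) with root mean squared error. Persistence is a common baseline to beat; the function names are hypothetical, and the numbers are simply the first five values of the Daily Female Births sample later in this post.

```python
import math
from typing import List

def persistence_forecast(history: List[float]) -> List[float]:
    """Naive baseline: the forecast for step t is the observation at step t - 1."""
    return history[:-1]

def rmse(expected: List[float], predicted: List[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(expected) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected))

births = [35, 32, 30, 31, 44]
expected = births[1:]                       # values we try to predict
predicted = persistence_forecast(births)    # previous value for each of them
print(round(rmse(expected, predicted), 3))
```

Any more sophisticated method can be scored with the same `rmse` function, which makes the baseline a useful reference point.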

The Data Market website provides access to a large number of time series datasets, in particular the Time Series Data Library created by Rob Hyndman, Professor of Statistics at Monash University, Australia.

Below are 4 univariate time series datasets that you can download for free from Data Market from a range of fields such as Sales, Meteorology, Physics and Demography.


Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright and Hyndman (1998).

Below is a sample of the first 5 rows of data including the header row.

```
"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
```
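The Month column uses relative labels such as "1-01" (year 1, month 01) rather than calendar dates, so generic date parsers will not recognise it as is. Below is a minimal standard-library sketch that maps each label to a date. The base year is an arbitrary assumption (the source gives no calendar years), and `parse_month` and `BASE_YEAR` are hypothetical names.

```python
import csv
import io
from datetime import date

# The sample rows shown above, as they appear in the CSV file.
SAMPLE = '''"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
'''

BASE_YEAR = 1900  # hypothetical: relative year "1" becomes 1901

def parse_month(label: str) -> date:
    """Convert a relative label such as '1-01' (year 1, month 01) to a date."""
    year, month = label.split("-")
    return date(BASE_YEAR + int(year), int(month), 1)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # skip the header row
rows = [(parse_month(month), float(sales)) for month, sales in reader]
print(rows[0])  # (datetime.date(1901, 1, 1), 266.0)
```

Only the spacing between observations matters for modeling, so any base year works as long as it is applied consistently.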

Below is a plot of the entire dataset taken from Data Market.

[Plot: Shampoo Sales Dataset]

The dataset shows an increasing trend and possibly a seasonal component.
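A common way to handle a trend like this before modeling is differencing: model the change between consecutive observations instead of the level, then add the last observed value back to turn a forecast into the original scale. The sketch below shows both steps on a hypothetical toy series with a clear upward trend; `difference` and `invert_difference` are illustrative names, not library functions.

```python
from typing import List

def difference(series: List[float], interval: int = 1) -> List[float]:
    """Subtract from each value the value `interval` steps earlier.

    With interval=1 this removes a roughly linear trend; the first
    `interval` observations are dropped because they have no predecessor.
    """
    return [series[t] - series[t - interval] for t in range(interval, len(series))]

def invert_difference(history: List[float], diffed_value: float, interval: int = 1) -> float:
    """Convert a forecast on the differenced scale back to the original scale."""
    return history[-interval] + diffed_value

trending = [10.0, 12.0, 15.0, 19.0, 24.0]
changes = difference(trending)
print(changes)  # [2.0, 3.0, 4.0, 5.0]
# If a model predicts the next change to be 6.0, the forecast level is 24.0 + 6.0.
print(invert_difference(trending, 6.0))  # 30.0
```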

Minimum Daily Temperatures Dataset

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.

The units are in degrees Celsius and there are 3650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Below is a sample of the first 5 rows of data including the header row.

```
"Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
"1981-01-01",20.7
"1981-01-02",17.9
"1981-01-03",18.8
"1981-01-04",14.6
"1981-01-05",15.8
```

Below is a plot of the entire dataset taken from Data Market.

[Plot: Minimum Daily Temperatures]

The dataset shows a strong seasonal component and has fine-grained detail to work with.
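One simple way to inspect an annual seasonal component in daily data is to average the observations that fall in each calendar month, pooling all years. The sketch below does this with the standard library; `monthly_means` is a hypothetical helper, and it is shown on the first five rows of the Minimum Daily Temperatures sample above (all in January), whereas the full file would produce all 12 months.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

def monthly_means(observations: List[Tuple[str, float]]) -> Dict[int, float]:
    """Average the values in each calendar month (1-12) across all years.

    Large differences between the monthly averages are a simple sign of an
    annual seasonal component.
    """
    by_month: Dict[int, List[float]] = defaultdict(list)
    for day, value in observations:
        by_month[date.fromisoformat(day).month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}

sample = [("1981-01-01", 20.7), ("1981-01-02", 17.9), ("1981-01-03", 18.8),
          ("1981-01-04", 14.6), ("1981-01-05", 15.8)]
print(monthly_means(sample))
```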

Monthly Sunspot Dataset

This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).

Below is a sample of the first 5 rows of data including the header row.

```
"Month","Zuerich monthly sunspot numbers 1749-1983"
"1749-01",58.0
"1749-02",62.6
"1749-03",70.0
"1749-04",55.7
"1749-05",85.0
```

Below is a plot of the entire dataset taken from Data Market.

[Plot: Monthly Sun Spot Dataset]

The dataset shows a strong cyclical pattern, corresponding to the roughly 11-year solar cycle, with large differences in magnitude between cycles.
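A repeating cycle can be checked numerically with the sample autocorrelation at a candidate lag. For monthly sunspot counts, a cycle of roughly 11 years corresponds to lags of around 132 months. The sketch below implements the standard autocorrelation estimator; so that the numbers are easy to verify by hand, it is applied to a hypothetical toy series that repeats every 4 steps rather than to the sunspot data.

```python
from typing import List

def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation of `series` at the given lag.

    Lagged autocovariance divided by the variance, both computed
    around the mean of the full series.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = sum(series) / n
    denom = sum((x - mu) ** 2 for x in series)
    numer = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return numer / denom

cycle = [0.0, 1.0, 0.0, -1.0] * 6                # period of 4 steps
print(round(autocorrelation(cycle, 4), 3))     # 0.833: positive peak at the period
print(round(autocorrelation(cycle, 2), 3))     # -0.917: half a period out of phase
```

On a real series, computing this over a range of lags and looking for the largest positive peak gives a rough estimate of the cycle length.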

Daily Female Births Dataset

This dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

Below is a sample of the first 5 rows of data including the header row.

```
"Date","Daily total female births in California, 1959"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
```

Below is a plot of the entire dataset taken from Data Market.

[Plot: Daily Female Births Dataset]

Multivariate Time Series Datasets

Multivariate datasets are generally more challenging and are the sweet spot for machine learning methods.

A great source of multivariate time series data is the UCI Machine Learning Repository. At the time of writing, there are 63 time series datasets that you can download for free and work with.

Below is a selection of 3 recommended multivariate time series datasets from Meteorology, Medicine and Monitoring domains.

EEG Eye State Dataset

This dataset describes EEG data for an individual and whether their eyes were open or closed. The objective of the problem is to predict whether eyes are open or closed given EEG data alone.

This is a classification predictive modeling problem with a total of 14,980 observations and 14 input variables. The final column holds the class value: ‘1’ indicates the eye-closed state and ‘0’ the eye-open state. Data is ordered by time and observations were recorded over a period of 117 seconds.

Below is a sample of the first 5 rows with no header row.

```
4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
```
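Because the last field of each EEG Eye State row is the class label, a loader only needs to split it off from the EEG readings. The sketch below does that with the standard library and adds a chronological (unshuffled) train/test split, since the rows are ordered in time. The function names and the 80/20 split are assumptions for illustration.

```python
import csv
import io
from typing import List, Tuple

# Two of the sample rows above (no header row).
SAMPLE = """4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
"""

def load_rows(text: str) -> Tuple[List[List[float]], List[int]]:
    """Split each row into the EEG inputs and the trailing eye-state label."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text)):
        *features, label = row
        X.append([float(v) for v in features])
        y.append(int(label))
    return X, y

def chronological_split(X: list, y: list, train_fraction: float = 0.8):
    """Split without shuffling, so every test row comes after the training rows."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

X, y = load_rows(SAMPLE)
print(len(X), len(X[0]), y)  # 2 14 [0, 0]
```

Shuffling before splitting would let the model train on readings recorded after the ones it is tested on, which tends to give overly optimistic results on time-ordered data.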

Occupancy Detection Dataset

This dataset describes measurements of a room and the objective is to predict whether or not the room is occupied.

There are 20,560 one-minute observations taken over a period of a few weeks. This is a classification prediction problem. There are 7 attributes, including various light and climate properties of the room.

The source for the data is credited to Luis Candanedo from UMONS.

Below is a sample of the first 6 rows of data, including the header row. Note that each data row begins with a quoted row index that has no matching name in the header.

```
"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
"4","2015-02-04 17:54:00",23.15,27.2,426,708.25,0.00477150882608175,1
"5","2015-02-04 17:55:00",23.1,27.2,426,704.5,0.00475699293331518,1
"6","2015-02-04 17:55:59",23.1,27.2,419,701,0.00475699293331518,1
```
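Because the header names 7 columns while each data row carries 8 fields (the extra one is the leading row index), zipping the header directly onto each row would misalign every value. A minimal standard-library sketch that drops the index first is shown below; `load_occupancy` and `NUMERIC_COLUMNS` are hypothetical names, and the timestamp format is taken from the sample rows above.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

SAMPLE = '''"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
'''

NUMERIC_COLUMNS = ("Temperature", "Humidity", "Light", "CO2", "HumidityRatio")

def load_occupancy(text: str) -> List[Dict[str, object]]:
    """Parse rows whose first field is an unnamed row index."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # 7 names: date ... Occupancy
    records: List[Dict[str, object]] = []
    for row in reader:
        # Drop the leading row index so the remaining 7 fields line up with the header.
        fields = dict(zip(header, row[1:]))
        record: Dict[str, object] = {"date": datetime.strptime(fields["date"], "%Y-%m-%d %H:%M:%S")}
        record.update({name: float(fields[name]) for name in NUMERIC_COLUMNS})
        record["Occupancy"] = int(fields["Occupancy"])  # 1 = occupied, 0 = empty
        records.append(record)
    return records

records = load_occupancy(SAMPLE)
print(records[0]["date"], records[0]["Occupancy"])
```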

The data is provided in 3 files that suggest the splits that may be used for training and testing a model.

Ozone Level Detection Dataset

This dataset describes 6 years of ground ozone concentration observations and the objective is to predict whether it is an “ozone day” or not.

The dataset contains 2,536 observations and 73 attributes. This is a classification prediction problem and the final attribute indicates the class value as “1” for an ozone day and “0” for a normal day.

Two versions of the data are provided: the eight-hour peak set and the one-hour peak set. I would suggest using the one-hour peak set for now.

Below is a sample of the first 6 rows with no header row.

```
1/1/1998,0.8,1.8,2.4,2.1,2,2.1,1.5,1.7,1.9,2.3,3.7,5.5,5.1,5.4,5.4,4.7,4.3,3.5,3.5,2.9,3.2,3.2,2.8,2.6,5.5,3.1,5.2,6.1,6.1,6.1,6.1,5.6,5.2,5.4,7.2,10.6,14.5,17.2,18.3,18.9,19.1,18.9,18.3,17.3,16.8,16.1,15.4,14.9,14.8,15,19.1,12.5,6.7,0.11,3.83,0.14,1612,-2.3,0.3,7.18,0.12,3178.5,-15.5,0.15,10.67,-1.56,5795,-12.1,17.9,10330,-55,0,0.
1/2/1998,2.8,3.2,3.3,2.7,3.3,3.2,2.9,2.8,3.1,3.4,4.2,4.5,4.5,4.3,5.5,5.1,3.8,3,2.6,3,2.2,2.3,2.5,2.8,5.5,3.4,15.1,15.3,15.6,15.6,15.9,16.2,16.2,16.2,16.6,17.8,19.4,20.6,21.2,21.8,22.4,22.1,20.8,19.1,18.1,17.2,16.5,16.1,16,16.2,22.4,17.8,9,0.25,-0.41,9.53,1594.5,-2.2,0.96,8.24,7.3,3172,-14.5,0.48,8.39,3.84,5805,14.05,29,10275,-55,0,0.
1/3/1998,2.9,2.8,2.6,2.1,2.2,2.5,2.5,2.7,2.2,2.5,3.1,4,4.4,4.6,5.6,5.4,5.2,4.4,3.5,2.7,2.9,3.9,4.1,4.6,5.6,3.5,16.6,16.7,16.7,16.8,16.8,16.8,16.9,16.9,17.1,17.6,19.1,21.3,21.8,22,22.1,22.2,21.3,19.8,18.6,18,18,18.2,18.3,18.4,22.2,18.7,9,0.56,0.89,10.17,1568.5,0.9,0.54,3.8,4.42,3160,-15.9,0.6,6.94,9.8,5790,17.9,41.3,10235,-40,0,0.
1/4/1998,4.7,3.8,3.7,3.8,2.9,3.1,2.8,2.5,2.4,3.1,3.3,3.1,2.3,2.1,2.2,3.8,2.8,2.4,1.9,3.2,4.1,3.9,4.5,4.3,4.7,3.2,18.3,18.2,18.3,18.4,18.6,18.6,18.5,18.7,18.6,18.8,19,19,19.3,19.4,19.6,19.2,18.9,18.8,18.6,18.5,18.3,18.5,18.8,18.9,19.6,18.7,9.9,0.89,-0.34,8.58,1546.5,3,0.77,4.17,8.11,3145.5,-16.8,0.49,8.73,10.54,5775,31.15,51.7,10195,-40,2.08,0.
1/5/1998,2.6,2.1,1.6,1.4,0.9,1.5,1.2,1.4,1.3,1.4,2.2,2,3,3,3.1,3.1,2.7,3,2.4,2.8,2.5,2.5,3.7,3.4,3.7,2.3,18.8,18.6,18.5,18.5,18.6,18.9,19.2,19.4,19.8,20.5,21.1,21.9,23.8,25.1,25.8,26,25.6,24.2,22.9,21.6,20,19.5,19.1,19.1,26,21.1,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,0.58,0.
1/6/1998,3.1,3.5,3.3,2.5,1.6,1.7,1.6,1.6,2.3,1.8,2.5,3.9,3.4,2.7,3.4,2.5,2.2,4.4,4.3,3.2,6.2,6.8,5.1,4,6.8,3.2,18.9,19.5,19.6,19.5,19.5,19.5,19.4,19.2,19.1,19.5,19.6,18.6,18.6,18.9,19.2,19.3,19.2,18.8,17.6,16.9,15.6,15.4,15.9,15.8,19.6,18.5,14.4,0.68,1.52,8.62,1499.5,4.3,0.61,9.04,10.81,3111,-11.8,0.09,11.98,11.28,5770,27.95,46.25,10120,?,5.84,0.
```
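Two details in the Ozone Level Detection sample above need care when loading: missing values are written as "?", and the class label is stored as "0." or "1." with a trailing dot. The sketch below parses one row under those observations, representing missing values as NaN (an assumption; dropping or imputing them are common alternatives). `parse_ozone_row` is a hypothetical name, and the example row is shortened for illustration, since real rows carry many more attributes.

```python
import math
from datetime import datetime
from typing import List, Tuple

def parse_ozone_row(line: str) -> Tuple[datetime, List[float], int]:
    """Parse one row: an M/D/YYYY date, the numeric attributes, then the class label.

    Missing values ('?') become NaN. The label ('0.' or '1.') is read as a
    float first and then converted to int.
    """
    fields = line.strip().split(",")
    day = datetime.strptime(fields[0], "%m/%d/%Y")
    features = [math.nan if value == "?" else float(value) for value in fields[1:-1]]
    label = int(float(fields[-1]))
    return day, features, label

# Hypothetical, shortened row in the same format as the sample.
example = "1/5/1998,2.6,?,?,0.58,0."
day, features, label = parse_ozone_row(example)
missing = sum(math.isnan(value) for value in features)
print(day.date(), missing, label)  # 1998-01-05 2 0
```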

Summary

In this post, you discovered a suite of standard time series forecast datasets that you can use to get started and practice time series forecasting with machine learning methods.

Specifically, you learned about:

  • 4 univariate time series forecasting datasets.
  • 3 multivariate time series forecasting datasets.
  • Two websites where you can download many more datasets.

Did you use one of the above datasets in your own project?
Share your findings in the comments below.

