How to Predict Whether a Person's Eyes Are Open or Closed Using Brain Waves
A Case Study in How to Avoid Methodological Errors When Evaluating Machine Learning Methods for Time Series Forecasting.
Evaluating machine learning models on time series forecasting problems is challenging.
It is easy to make a small error in the framing of a problem or in the evaluation of models, one that gives impressive results but leads to an invalid finding.
An interesting time series classification problem is predicting whether a subject’s eyes are open or closed based only on their brain wave data (EEG).
In this tutorial, you will discover the problem of predicting whether eyes are open or closed based on brain waves and a common methodological trap when evaluating time series forecasting models.
After working through this tutorial, you will have an idea of how to avoid common traps when evaluating machine learning algorithms on time series forecasting problems. These are traps that catch beginners, expert practitioners, and academics alike.
After completing this tutorial, you will know:
- The eye-state prediction problem and a standard machine learning dataset that you can use.
- How to reproduce skilful results for predicting eye-state from brainwaves in Python.
- How to uncover an interesting methodological flaw in evaluating forecast models.
Let’s get started.
Tutorial Overview
This tutorial is divided into seven parts; they are:
- Predict Open/Closed Eyes from Brain Waves
- Data Visualization and Outlier Removal
- Develop the Predictive Model
- Problem with the Model Evaluation Methodology
- Train-Test Split with Temporal Ordering
- Walk-Forward Validation
- Takeaways and Key Lesson
Predict Open/Closed Eyes from Brain Waves
In this post, we are going to take a closer look at a problem that involves predicting whether the subject's eyes are open or closed based on brain wave data.
The problem was described and data collected by Oliver Rosler and David Suendermann for their 2013 paper titled “A First Step towards Eye State Prediction Using EEG“.
I saw this dataset and I had to know more.
Specifically, an electroencephalography (EEG) recording was made of a single person for 117 seconds (just under two minutes) while the subject opened and closed their eyes, which was recorded via a video camera. The open/closed state was then recorded against each time step in the EEG trace manually.
The EEG was recorded using an Emotiv EEG Neuroheadset, resulting in 14 traces.
The output variable is binary, meaning that this is a two-class classification problem.
A total of 14,980 observations (rows) were made over the 117 seconds, meaning that there were about 128 observations per second.
> The corpus consists of 14,977 instances with 15 attributes each (14 attributes representing the values of the electrodes and the eye state). The instances are stored in the corpus in chronological order to be able to analyze temporal dependencies. 8,255 (55.12%) instances of the corpus correspond to the eye open and 6,722 (44.88%) instances to the eye closed state.
There were also some EEG observations that have a much larger than expected amplitude. These are likely outliers and can be identified and removed using a simple statistical method such as removing rows that have an observation 3-to-4 standard deviations from the mean.
The simplest framing of the problem is to predict the eye-state (open/closed) given the EEG trace at the current time step. More advanced framings of the problem may seek to model the multivariate time series of each EEG trace in order to predict the current eye state.
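To make that simplest framing concrete, the sketch below shows how each row of the dataset maps to one sample: the 14 EEG electrode values at a time step become the input features, and the final column becomes the class label. This is only an illustration; it assumes a CSV copy of the dataset named 'EEG_Eye_State.csv', which is prepared in the next section.

```python
# sketch of the simplest framing: one sample per time step, temporal order ignored
# assumes a CSV copy of the dataset named 'EEG_Eye_State.csv' (prepared below)
from pandas import read_csv

data = read_csv('EEG_Eye_State.csv', header=None)
values = data.values
X = values[:, :-1]  # 14 EEG electrode readings at the current time step
y = values[:, -1]   # eye state at the same time step (0=open, 1=closed)
print(X.shape, y.shape)
```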
Data Visualization and Outlier Removal
The dataset can be downloaded for free from the UCI Machine Learning Repository.
The raw data is in ARFF format (used in Weka), but can be converted to CSV by deleting the ARFF header.
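If you prefer to do the conversion programmatically, a minimal sketch is shown below: it drops everything up to and including the '@DATA' marker and writes the remaining comma-separated rows to CSV. The input filename 'EEG_Eye_State.arff' is an assumption; rename your downloaded copy to match.

```python
# strip the ARFF header: keep only the rows after the '@DATA' marker
# the filename 'EEG_Eye_State.arff' is an assumption; rename your download to match
with open('EEG_Eye_State.arff', 'r') as arff:
    lines = arff.readlines()
# find the line that marks the start of the data section
start = next(i for i, line in enumerate(lines) if line.strip().lower() == '@data') + 1
# write the remaining comma-separated rows as CSV
with open('EEG_Eye_State.csv', 'w') as csv_file:
    csv_file.writelines(lines[start:])
```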
Below is a sample of the first five lines of the data with the ARFF header removed.
```
4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
...
```
We can load the data as a DataFrame and plot the time series for each EEG trace and the output variable (open/closed state).
The complete code example is listed below.
The example assumes that you have a copy of the dataset in CSV format with the filename 'EEG_Eye_State.csv' in the same directory as the code.
```python
# visualize dataset
from pandas import read_csv
from matplotlib import pyplot
# load the dataset
data = read_csv('EEG_Eye_State.csv', header=None)
# retrieve data as numpy array
values = data.values
# create a subplot for each time series
pyplot.figure()
for i in range(values.shape[1]):
    pyplot.subplot(values.shape[1], 1, i+1)
    pyplot.plot(values[:, i])
pyplot.show()
```
Running the example creates a line plot for each EEG trace and the output variable.
We can see the outliers washing out the data in each trace. We can also see the open (0) and closed (1) state of the eyes over time.
It is useful to remove the outliers to better understand the relationship between the EEG traces and the open/closed state of the eyes.
The example below removes all rows that have an EEG observation that is four standard deviations or more from the mean. The dataset is saved to a new file called 'EEG_Eye_State_no_outliers.csv'.
It is a quick and dirty implementation of outlier detection and removal, but it gets the job done; a more concise vectorized alternative is sketched after the output below.
```python
# remove outliers from the EEG data
from pandas import read_csv
from numpy import mean
from numpy import std
from numpy import delete
from numpy import savetxt
# load the dataset
data = read_csv('EEG_Eye_State.csv', header=None)
values = data.values
# step over each EEG column
for i in range(values.shape[1] - 1):
    # calculate column mean and standard deviation
    data_mean, data_std = mean(values[:, i]), std(values[:, i])
    # define outlier bounds
    cut_off = data_std * 4
    lower, upper = data_mean - cut_off, data_mean + cut_off
    # remove too small
    too_small = [j for j in range(values.shape[0]) if values[j, i] < lower]
    values = delete(values, too_small, 0)
    print('>deleted %d rows' % len(too_small))
    # remove too large
    too_large = [j for j in range(values.shape[0]) if values[j, i] > upper]
    values = delete(values, too_large, 0)
    print('>deleted %d rows' % len(too_large))
# save the results to a new file
savetxt('EEG_Eye_State_no_outliers.csv', values, delimiter=',')
```
Running the example summarizes the rows deleted as each column in the EEG data is processed for outliers above and below the mean.
```
>deleted 0 rows
>deleted 1 rows
>deleted 2 rows
>deleted 1 rows
>deleted 0 rows
>deleted 142 rows
>deleted 0 rows
>deleted 48 rows
>deleted 0 rows
>deleted 153 rows
>deleted 0 rows
>deleted 43 rows
>deleted 0 rows
>deleted 0 rows
>deleted 0 rows
>deleted 15 rows
>deleted 0 rows
>deleted 5 rows
>deleted 10 rows
>deleted 0 rows
>deleted 21 rows
>deleted 53 rows
>deleted 0 rows
>deleted 12 rows
>deleted 58 rows
>deleted 53 rows
>deleted 0 rows
>deleted 59 rows
```
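As promised above, here is a more concise vectorized sketch. Note that it is not exactly equivalent to the sequential example, because it computes each column's mean and standard deviation once on the full dataset rather than recomputing them after each column's deletions, so the row counts may differ slightly.

```python
# vectorized outlier removal sketch: per-column bounds computed once, filtered in one pass
# not exactly equivalent to the sequential version above (statistics are not recomputed)
from pandas import read_csv

data = read_csv('EEG_Eye_State.csv', header=None)
eeg = data.iloc[:, :-1]  # all EEG columns, excluding the eye-state label
# keep rows where every EEG value lies within 4 standard deviations of its column mean
mask = ((eeg - eeg.mean()).abs() <= 4 * eeg.std()).all(axis=1)
data[mask].to_csv('EEG_Eye_State_no_outliers.csv', header=False, index=False)
```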
We can now visualize the data without outliers by loading the new 'EEG_Eye_State_no_outliers.csv' file.
```python
# visualize dataset without outliers
from pandas import read_csv
from matplotlib import pyplot
# load the dataset
data = read_csv('EEG_Eye_State_no_outliers.csv', header=None)
# retrieve data as numpy array
values = data.values
# create a subplot for each time series
pyplot.figure()
for i in range(values.shape[1]):
    pyplot.subplot(values.shape[1], 1, i+1)
    pyplot.plot(values[:, i])
pyplot.show()
```
Running the example creates a better plot, clearly showing little positive peaks when eyes are closed (1) and negative peaks when eyes are open (0).
Develop the Predictive Model
The simplest predictive model is to predict the eye open/closed state based on the current EEG observation, ignoring the trace information.
Intuitively, one would not expect this to be effective; nevertheless, it was the approach used in Rosler and Suendermann’s 2013 paper.
Specifically, they evaluated a large suite of classification algorithms in the Weka software using 10-fold cross-validation on this framing of the problem. They achieved better than 90% accuracy with multiple methods, including instance-based methods such as k-nearest neighbors and KStar.
> However, instance-based learners such as IB1 and KStar outperformed decision trees yet again substantially. The latter achieved the clearly best performance with a classification error rate of merely 3.2%.
A similar methodology and similar findings appear in a number of other papers using the same and related datasets.
I was surprised when I read this, and so I reproduced the result.
The complete example is listed below, using a KNN with k=3.
```python
# knn for predicting eye state
from pandas import read_csv
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from numpy import mean
# load the dataset
data = read_csv('EEG_Eye_State_no_outliers.csv', header=None)
values = data.values
# evaluate knn using 10-fold cross-validation
scores = list()
kfold = KFold(10, shuffle=True, random_state=1)
for train_ix, test_ix in kfold.split(values):
    # define train/test X/y
    trainX, trainy = values[train_ix, :-1], values[train_ix, -1]
    testX, testy = values[test_ix, :-1], values[test_ix, -1]
    # define model
    model = KNeighborsClassifier(n_neighbors=3)
    # fit model on train set
    model.fit(trainX, trainy)
    # forecast test set
    yhat = model.predict(testX)
    # evaluate predictions
    score = accuracy_score(testy, yhat)
    # store
    scores.append(score)
    print('>%.3f' % score)
# calculate mean score across each run
print('Final Score: %.3f' % (mean(scores)))
Running the example prints the score for each fold of the cross-validation and the mean score of 97.5% averaged across all 10 folds.
```
>0.970
>0.975
>0.978
>0.977
>0.973
>0.979
>0.978
>0.976
>0.974
>0.969
Final Score: 0.975
```
Very impressive!
But something felt wrong.
I was interested to see how models that took into account the clear peaks in the data at each transition from open-to-closed and closed-to-open performed.
Every model I tried using my own test harness that respected the temporal ordering of the data performed much worse.
Why?
Hint: think about the chosen model evaluation strategy and the type of algorithm that performed the best.
Problem with the Model Evaluation Methodology
Disclaimer: I am not calling out the authors of the paper or related papers. I don’t