[Python] Statistical analysis of time series

阿新 • • Published: 2017-12-22


Global Statistics:

Commonly used global statistics include:

1. Mean

2. Median

3. Standard deviation: the larger the value, the more the data varies around its mean.

4. Sum.
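
As a minimal sketch of the four statistics above, the following computes them with pandas on a small made-up price series. The values are illustrative only and are not real market data.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])

mean_price = prices.mean()      # arithmetic average
median_price = prices.median()  # middle value once the data is sorted
std_price = prices.std()        # sample standard deviation (pandas default, ddof=1)
total = prices.sum()            # sum of all values

print(mean_price, median_price, std_price, total)
```

The same methods work column by column on a DataFrame, for example `df.mean()` on a frame of several tickers.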

Rolling Statistics:


A rolling statistic uses a fixed-size time window that moves forward one day at a time; at each step, the mean is calculated over the values inside the window.
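
To make the moving window concrete, here is a small sketch with a 3-day window on made-up values. The dates and prices are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical daily prices, for illustration only
dates = pd.date_range("2017-01-02", periods=6, freq="D")
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# 3-day window: each value is the mean of the current day and the two days before it
rolling_mean = prices.rolling(window=3).mean()

print(rolling_mean)
```

The first two entries are NaN because a full 3-day window is not yet available on those days.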

To decide which days might be good to buy and which might be good to sell, we can use Bollinger Bands.

Bollinger bands:


import os
import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    start_date = '2017-01-01'
    end_date = '2017-12-15'
    dates = pd.date_range(start_date, end_date)

    # Create an empty data frame indexed by the date range
    df = pd.DataFrame(index=dates)

    symbols = ['SPY', 'AAPL', 'IBM', 'GOOG', 'GLD']
    for symbol in symbols:
        # getAdjCloseForSymbol is not defined in this post; it is assumed to
        # return a date-indexed DataFrame with one column named after the symbol
        temp = getAdjCloseForSymbol(symbol)
        df = df.join(temp, how='inner')

    return df

if __name__ == '__main__':
    df = test_run()
    # df = df.loc['2017-12-01':'2017-12-15', ['IBM', 'GOOG']]
    # df = normalize_data(df)

    # Plot the raw SPY prices and overlay the 20-day rolling mean
    ax = df['SPY'].plot(title="SPY rolling mean", label='SPY')
    rm = df['SPY'].rolling(20).mean()
    rm.plot(label='Rolling mean', ax=ax)
    ax.set_xlabel('Date')
    ax.set_ylabel('Price')
    ax.legend(loc="upper left")
    plt.show()

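Now we can calculate the Bollinger Bands: the upper band is the rolling mean plus 2 times the rolling standard deviation, and the lower band is the rolling mean minus 2 times the rolling standard deviation.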

"""Bollinger Bands."""

import os
import pandas as pd
import matplotlib.pyplot as plt

def symbol_to_path(symbol, base_dir="data"):
    """Return CSV file path given ticker symbol."""
    return os.path.join(base_dir, "{}.csv".format(str(symbol)))


def get_data(symbols, dates):
    """Read stock data (adjusted close) for given symbols from CSV files."""
    df = pd.DataFrame(index=dates)
    if 'SPY' not in symbols:  # add SPY for reference, if absent
        symbols.insert(0, 'SPY')

    for symbol in symbols:
        df_temp = pd.read_csv(symbol_to_path(symbol), index_col='Date',
                parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])
        df_temp = df_temp.rename(columns={'Adj Close': symbol})
        df = df.join(df_temp)
        if symbol == 'SPY':  # drop dates SPY did not trade
            df = df.dropna(subset=["SPY"])

    return df


def plot_data(df, title="Stock prices"):
    """Plot stock prices with a custom title and meaningful axis labels."""
    ax = df.plot(title=title, fontsize=12)
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    plt.show()


def get_rolling_mean(values, window):
    """Return rolling mean of given values, using specified window size."""
    return values.rolling(window=window).mean()


def get_rolling_std(values, window):
    """Return rolling standard deviation of given values, using specified window size."""
    # Sample standard deviation (pandas default, ddof=1) over each window
    return values.rolling(window=window).std()


def get_bollinger_bands(rm, rstd):
    """Return upper and lower Bollinger Bands."""
    # Bands sit two rolling standard deviations above and below the rolling mean
    upper_band = rm + 2 * rstd
    lower_band = rm - 2 * rstd
    return upper_band, lower_band


def test_run():
    # Read data
    dates = pd.date_range('2012-01-01', '2012-12-31')
    symbols = ['SPY']
    df = get_data(symbols, dates)

    # Compute Bollinger Bands
    # 1. Compute rolling mean
    rm_SPY = get_rolling_mean(df['SPY'], window=20)

    # 2. Compute rolling standard deviation
    rstd_SPY = get_rolling_std(df['SPY'], window=20)

    # 3. Compute upper and lower bands
    upper_band, lower_band = get_bollinger_bands(rm_SPY, rstd_SPY)

    # Plot raw SPY values, rolling mean and Bollinger Bands
    ax = df['SPY'].plot(title="Bollinger Bands", label='SPY')
    rm_SPY.plot(label='Rolling mean', ax=ax)
    upper_band.plot(label='upper band', ax=ax)
    lower_band.plot(label='lower band', ax=ax)

    # Add axis labels and legend
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    ax.legend(loc='upper left')
    plt.show()


if __name__ == "__main__":
    test_run()
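
The post says Bollinger Bands can help find days to buy or sell but does not spell out a rule. One common reading, which is an assumption here and not something the post states, is to flag a possible buy when the price moves back inside the bands after dipping below the lower band, and a possible sell when it moves back inside after rising above the upper band. The sketch below applies that rule to a synthetic random-walk series, so it runs without any CSV files; the prices, the seed, and the signal names are all hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices (made up; seeded so the run is repeatable)
rng = np.random.default_rng(0)
dates = pd.date_range("2012-01-01", periods=250, freq="B")
prices = pd.Series(100 + rng.normal(0, 1, size=250).cumsum(), index=dates)

window = 20
rm = prices.rolling(window=window).mean()
rstd = prices.rolling(window=window).std()
upper_band = rm + 2 * rstd
lower_band = rm - 2 * rstd

# Comparisons against NaN bands (the warm-up period) evaluate to False
below = prices < lower_band
above = prices > upper_band

# Assumed rule: signal on the day the price returns inside the bands
buy_signal = below.shift(1, fill_value=False) & ~below
sell_signal = above.shift(1, fill_value=False) & ~above

print("Candidate buy dates:", list(buy_signal[buy_signal].index.date))
print("Candidate sell dates:", list(sell_signal[sell_signal].index.date))
```

This only marks candidate dates. Whether such a rule is profitable would need to be checked against real data.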
