Predicting Bitcoin Price Trends with Keras Deep Learning

阿新 • Published: 2018-12-21

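I. Project goal

Use historical data to predict Bitcoin price movements. The model is a recurrent neural network (an LSTM) built with Keras; it is tuned afterwards to improve its accuracy.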

II. Requirements

The libraries the project needs are listed in the attached requirements.txt file.

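III. Project steps

1. Load the data

Bitcoin's historical data is stored in bitcoin_historical_prices.csv. The file contains 8 variables:
 - Date: the date of the observation
 - ISO week: the week number within the given year
 - Open: the opening price for the period
 - High: the highest price for the period
 - Low: the lowest price for the period
 - Close: the closing price for the period
 - Volume: the amount traded during the period
 - Market capitalization: computed as market cap = price × circulating supply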
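
As a rough sketch of this step, the snippet below reads a CSV with these columns using pandas. An in-memory string stands in for bitcoin_historical_prices.csv, and the two rows are made up. The column names `open`, `high`, `low` and `market_capitalization` are assumptions for illustration; only `date`, `iso_week`, `close` and `volume` appear in the code later in this post.

```python
import io
import pandas as pd

# Two made-up rows standing in for bitcoin_historical_prices.csv.
# Column names other than date, iso_week, close and volume are assumed.
sample_csv = io.StringIO(
    "date,iso_week,open,high,low,close,volume,market_capitalization\n"
    "2016-01-04,2016-01,430.0,437.0,426.0,434.0,36000000,6500000000\n"
    "2016-01-05,2016-01,434.0,436.0,431.0,433.0,30000000,6510000000\n"
)

# With the real file this would be pd.read_csv('bitcoin_historical_prices.csv').
bitcoin = pd.read_csv(sample_csv)

print(bitcoin.columns.tolist())
print(bitcoin.shape)
```

Keeping `date` as an ISO-formatted string is enough for the date filter used in the next step, because such strings sort in chronological order.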

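2. Prepare the data

To keep older data from skewing the model, we keep only the data from 2016 onward.

The closing price (close) and the trading volume (volume) are the main variables.

# First, keep only the recent data
bitcoin_recent = bitcoin[bitcoin['date'] >= '2016-01-01']
# Keep only the close and volume variables; the others can be used another time.
bitcoin_recent = bitcoin_recent[['date', 'iso_week', 'close', 'volume']]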




# Normalize the closing price within each week
# (normalizations is a helper module from this project, not shown here)
bitcoin_recent['close_point_relative_normalization'] = bitcoin_recent.groupby('iso_week')['close'].transform(
    normalizations.point_relative_normalization)

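Note: normalization rescales data proportionally so that it falls into a small, specific range. It is often used when comparing or scoring indicators: it removes the units from the data and turns them into dimensionless values, so that indicators with different units or magnitudes can be compared and weighted.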
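
The helper `normalizations.point_relative_normalization` is not shown in this post. A minimal sketch of what it might do, assuming it expresses every value as its relative change from the first value of the series (this matches the `denormalize` function used later, which computes `last_value * (series + 1)`):

```python
import numpy as np

def point_relative_normalization(series):
    """Express each value as its relative change from the first value."""
    series = np.asarray(series, dtype=float)
    return series / series[0] - 1

# Made-up closing prices for one week.
week_close = [400.0, 410.0, 390.0, 420.0]
normalized = point_relative_normalization(week_close)
print(normalized)
```

The first value of every week becomes 0, and the remaining values are small fractions around it, regardless of the absolute price level.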

# Split the weeks into a training set (80%) and a test set (20%)
boundary = int(0.8 * bitcoin_recent['iso_week'].nunique())
train_set_weeks = bitcoin_recent['iso_week'].unique()[0:boundary]
test_set_weeks = bitcoin_recent[~bitcoin_recent['iso_week'].isin(train_set_weeks)]['iso_week'].unique()



# Now create the separate datasets
train_dataset = bitcoin_recent[bitcoin_recent['iso_week'].isin(train_set_weeks)]
test_dataset = bitcoin_recent[bitcoin_recent['iso_week'].isin(test_set_weeks)]

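3. Build the model

Our dataset contains daily observations, and each observation influences the ones that follow. We are also interested in predicting the Bitcoin price for the coming week (that is, seven days). For these reasons we chose the parameters period_length and number_of_observations as follows:

period_length: the size of the period used as training input. Our periods are organized by week, so we use 7 days of data to predict the following week.
number_of_observations (called number_of_periods in the code): how many distinct periods the dataset contains. We have 77 weeks available; since the last week is held out to test the LSTM network, we train on 77 - 1 = 76 periods.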
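
The shapes implied by these two parameters can be checked with a small NumPy sketch. The daily values here are placeholders, not real prices; only the shapes matter.

```python
import numpy as np

period_length = 7          # days per period (one week)
number_of_weeks = 77       # weeks available in the training data

# Placeholder daily values; real code would use the normalized closing prices.
daily_values = np.arange(number_of_weeks * period_length, dtype=float)

# One row per week, one column per day.
weeks = daily_values.reshape(number_of_weeks, period_length)

# The first 76 weeks are the input sequence; the last week is the target.
X = weeks[:-1].reshape(1, number_of_weeks - 1, period_length)
Y = weeks[-1].reshape(1, period_length)

print(X.shape, Y.shape)  # (1, 76, 7) (1, 7)
```

The leading 1 is the number of samples: the whole 76-week history is treated as a single training sequence.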

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

period_length = 7
number_of_periods = 76


def build_model(period_length, number_of_periods, batch_size=1):
    """
    Builds an LSTM model using Keras. This function
    works as a simple wrapper for a manually created
    model.
    
    Parameters
    ----------
    period_length: int
        The size of each observation used as input.
    
    number_of_periods: int
        The number of periods available in the 
        dataset.
    
    batch_size: int
        The size of the batch used in each training
        period.
    
    Returns
    -------
    model: Keras model
        Compiled Keras model that can be trained
        and stored in disk.
    """
    model = Sequential()
    model.add(LSTM(
        units=period_length,
        batch_input_shape=(batch_size, number_of_periods, period_length),
        return_sequences=False, stateful=False))
    # Fully connected (Dense) layer with one output per predicted day
    model.add(Dense(units=period_length))
    # Linear activation, since the targets are real-valued
    model.add(Activation("linear"))

    # Compile the model so that it can be trained
    model.compile(loss="mse", optimizer="rmsprop")

    return model



# Build the model and save it to disk
model = build_model(period_length=period_length, number_of_periods=number_of_periods)
model.save('bitcoin_lstm_v0.h5')

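Core layer parameters (Dense):

units: a positive integer, the dimensionality of the layer's output.

activation: the activation function, given as the name of a predefined activation function (see the activations documentation) or as an element-wise Theano function. If it is not specified, no activation is applied (that is, the linear activation a(x) = x is used).

use_bias: boolean, whether the layer uses a bias term.

kernel_initializer: how the weights are initialized, given as the name of a predefined initializer or as an initializer object. See initializers.

bias_initializer: how the bias vector is initialized, given as the name of a predefined initializer or as an initializer object. See initializers.

kernel_regularizer: the regularization term applied to the weights, a Regularizer object.

bias_regularizer: the regularization term applied to the bias vector, a Regularizer object.

activity_regularizer: the regularization term applied to the output, a Regularizer object.

kernel_constraint: the constraint applied to the weights, a Constraints object.

bias_constraint: the constraint applied to the bias, a Constraints object.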
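
To make `units`, `use_bias` and `activation` concrete, here is a NumPy sketch of the computation a fully connected layer performs. It is not Keras code: `dense_forward` is a hypothetical name used only for this illustration, and the weights are random rather than trained.

```python
import numpy as np

def dense_forward(x, kernel, bias=None, activation=None):
    """Compute activation(x @ kernel + bias) for a batch of inputs."""
    output = x @ kernel
    if bias is not None:          # corresponds to use_bias=True
        output = output + bias
    if activation is not None:    # no activation means a(x) = x
        output = activation(output)
    return output

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 7))        # one sample with 7 features
kernel = rng.normal(size=(7, 7))   # units=7 gives 7 output columns
bias = np.zeros(7)

out = dense_forward(x, kernel, bias)
print(out.shape)  # (1, 7)
```

In the model above, `units=period_length` plays the role of the second dimension of `kernel`: the layer emits one value per predicted day.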

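Activation layer parameters

activation: the activation function to use, given as the name of a predefined activation function or as a TensorFlow/Theano function. See the activations documentation.

Model compilation parameters

optimizer: the optimizer, given as the name of a predefined optimizer or as an optimizer object. See optimizers.

loss: the objective function, given as the name of a predefined loss function or as an objective function. See losses.

metrics: a list of metrics used to evaluate the model's performance during training and testing. Typical usage is metrics=['accuracy']. To specify different metrics for different outputs of a multi-output model, pass a dictionary instead, for example metrics={'output_a': 'accuracy'}.

sample_weight_mode: set this to "temporal" if you need to weight samples per time step (a 2D weight matrix). The default, None, weights per sample (1D weights). If the model has several outputs, you can pass a dictionary or list of sample_weight_mode values. See the Keras documentation of the fit function for related details.

kwargs: ignore this argument when TensorFlow is the backend; when Theano is the backend, the values in kwargs are passed to K.function.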

4. Build the deep learning system

1. Read the training and test sets created earlier

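import pandas as pd
from keras.models import load_model

train = pd.read_csv('data/train_dataset.csv')

# create_groups is a project helper (not shown here) that groups the daily values into weeks
data = create_groups(train['close_point_relative_normalization'].values)

# Use the first 76 weeks (7 days each) as input and the final week as the target
X_train = data[:-1,:].reshape(1, 76, 7)
Y_validation = data[-1].reshape(1, 7)
2. Load the model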

model = load_model('bitcoin_lstm_v0.h5')
3. Train the model

history = model.fit(
    x=X_train, y=Y_validation,
    batch_size=32, epochs=100)
4. Make a prediction

# Reverse the point-relative normalization to get back to prices
def denormalize(series, last_value):
    result = last_value * (series + 1)
    return result

predictions = model.predict(x=X_train)[0]

last_weeks_value = train[train['date'] == train['date'].max()]['close'].values[0]
denormalized_prediction = denormalize(predictions, last_weeks_value)

5. Plot the result

pd.DataFrame(denormalized_prediction).plot(linewidth=2, figsize=(6, 4), color='#d35400', grid=True)

6. Conclusion

According to the plot, the model predicts a rise of roughly 30% over the coming week.

5. Re-evaluate the model

Method 1: MSE

# Load the training and test sets
test = pd.read_csv('data/test_dataset.csv')
train = pd.read_csv('data/train_dataset.csv')

# Group the data into weeks
# (create_groups and split_lstm_input are project helpers not shown here)
train_data = create_groups(
    train['close_point_relative_normalization'].values)
test_data = create_groups(
    test['close_point_relative_normalization'].values)

X_train, Y_train = split_lstm_input(train_data)


# Load the model and train it (train_model is also a project helper)
model = load_model('bitcoin_lstm_v0.h5')
model_history = train_model(model=model, X=X_train, Y=Y_train, epochs=100, version=0, run_number=0)
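
The helpers `create_groups` and `split_lstm_input` are called above but not defined in this post. The sketch below is a guess at their behavior, inferred from how the evaluation code indexes their results (`test_data.shape[1]` as the number of weeks, concatenation along `axis=1`). The actual helpers may differ; for instance, the earlier call in step 4 indexes the result as a two-dimensional array.

```python
import numpy as np

def create_groups(data, group_size=7):
    """Group a 1-D series into complete groups.

    Returns an array of shape (1, n_groups, group_size).
    """
    data = np.asarray(data, dtype=float)
    n_groups = len(data) // group_size          # drop any incomplete trailing group
    trimmed = data[:n_groups * group_size]
    return trimmed.reshape(1, n_groups, group_size)

def split_lstm_input(groups):
    """Use all groups but the last as input X and the last group as target Y."""
    X = groups[:, :-1, :]
    Y = groups[:, -1, :]
    return X, Y

series = np.arange(30)                 # 30 daily values: 4 full weeks, 2 days left over
groups = create_groups(series)
X, Y = split_lstm_input(groups)
print(groups.shape, X.shape, Y.shape)  # (1, 4, 7) (1, 3, 7) (1, 7)
```

With 77 weeks of training data, the same split would give an input of shape (1, 76, 7) and a target of shape (1, 7), which is what the model expects.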

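Evaluating the LSTM model

Let's evaluate how our model performs on unseen data. The model was trained on 76 weeks to predict the following week, that is, a sequence of 7 days. When we started this project, we split the original dataset into a training set and a test set (referred to here as the validation set). We now take the originally trained network, which covers 76 weeks, and use it to predict all 19 weeks of the validation set.

To do this, each prediction needs a sequence of 76 weeks as input. To obtain that data in a continuous way, we combine the training and validation sets and slide a 76-week window from the start of the series to the end minus 1. We leave the final week out because it is the last target we can predict.

At each iteration the LSTM model produces a 7-day prediction. We take these predictions and split them into individual days. We then compare the predicted series with all the weeks of the validation set by computing the MSE and MAPE of the final series.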

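import math
import numpy as np
import matplotlib.pyplot as plt

# Combine the training and validation sets along the week axis
combined_set = np.concatenate((train_data, test_data), axis=1)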


evaluated_weeks = []
for i in range(0, test_data.shape[1]):
    input_series = combined_set[0:,i:i+77]

    X_test = input_series[0:,:-1].reshape(1, input_series.shape[1] - 1, 7)
    Y_test = input_series[0:,-1:][0]
    
    result = model.evaluate(x=X_test, y=Y_test, verbose=0)
    evaluated_weeks.append(result)
    
ax = pd.Series(evaluated_weeks).plot(drawstyle="steps-post",
                                     figsize=(14,4),
                                     linewidth=2,
                                     color='#2c3e50',
                                     grid=True,
                                     title='Mean Squared Error (MSE) for Test Data')

y = [i for i in range(0, len(evaluated_weeks))]
yint = range(min(y), math.ceil(max(y))+1)
plt.xticks(yint)

ax.set_xlabel("Predicted Week")
ax.set_ylabel("MSE")

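The resulting plot:

Interpreting the results: MSE is a good loss function for our problem, but its values are hard to interpret. We use two utility functions to help interpret our results: root mean squared error (rmse()) and mean absolute percentage error (mape()). We run both on the observed and predicted series.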
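
The post names `rmse()` and `mape()` without showing them. A minimal NumPy sketch of the two metrics follows; returning MAPE as a percentage is an assumption, and the sample values are made up.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the data."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((observed - predicted) ** 2))

def mape(observed, predicted):
    """Mean absolute percentage error, returned as a percentage (assumed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((observed - predicted) / observed)) * 100

# Made-up prices for illustration.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(rmse(observed, predicted))
print(mape(observed, predicted))
```

MAPE divides by the observed values, so it is only meaningful where those are not close to zero. That is one reason to compute it on denormalized prices rather than on the normalized series, whose values sit near zero.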

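Method 2: the predict() method

from datetime import datetime, timedelta

# Collect the model's 7-day prediction for each test week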
predicted_weeks = []
for i in range(0, test_data.shape[1]):
    input_series = combined_set[0:,i:i+76]
    predicted_weeks.append(model.predict(input_series))
    
predicted_days = []
for week in predicted_weeks:
    predicted_days += list(week[0])

combined = pd.concat([train, test])

last_day = datetime.strptime(train['date'].max(), '%Y-%m-%d')
list_of_days = []
for days in range(1, len(predicted_days) + 1):
    D = (last_day + timedelta(days=days)).strftime('%Y-%m-%d')
    list_of_days.append(D)
predicted = pd.DataFrame({
    'date': list_of_days, 
    'close_point_relative_normalization': predicted_days 
})

combined['date'] = combined['date'].apply(
                    lambda x: datetime.strptime(x, '%Y-%m-%d'))


predicted['date'] = predicted['date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))

observed = combined[combined['date'] > train['date'].max()]

Compare the predicted values with the actual values

def plot_two_series(A, B, variable, title):
    """
    Plots two series using the same `date` index. 
    
    Parameters
    ----------
    A, B: pd.DataFrame
        Dataframe with a `date` key and a variable
        passed in the `variable` parameter. Parameter A
        represents the "Observed" series and B the "Predicted"
        series. These will be labelled respectively.
    
    variable: str
        Variable to use in plot.
    
    title: str
        Plot title.
    
    """
    plt.figure(figsize=(14,4))
    plt.xlabel('Real and predicted')

    ax1 = A.set_index('date')[variable].plot(
        linewidth=2, color='#d35400', grid=True, label='Observed', title=title)

    ax2 = B.set_index('date')[variable].plot(
        linewidth=2, color='grey', grid=True, label='Predicted')
    
    ax1.set_xlabel("Predicted Week")
    ax1.set_ylabel("Predicted Values")

    h1, l1 = ax1.get_legend_handles_labels()
    h2, l2 = ax2.get_legend_handles_labels()

    plt.legend(l1+l2, loc=2)
    plt.show()
plot_two_series(observed, predicted, 
                variable='close_point_relative_normalization',
                title='Normalized Predictions per Week')

The resulting plot:
