Assignment | 05-week1 - Improvise a Jazz Solo with an LSTM Network

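This series only adds personal study notes to the programming assignments of the original course. If you spot any mistakes, corrections are welcome. - ZJ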

Welcome to your final programming assignment of this week! In this notebook, you will implement a model that uses an LSTM to generate music. You will even be able to listen to your own music at the end of the assignment.

You will learn to:
- Apply an LSTM to music generation.
- Generate your own jazz music with deep learning.

Please run the following cell to load all the packages required in this assignment. This may take a few minutes.

from __future__ import print_function
import IPython
import sys
from music21 import *
import numpy as np
from grammar import *
from qa import *
from preprocess import *
from music_utils import *
from data_utils import *
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K
d:\program files\python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

1 - Problem statement

You would like to create a jazz music piece specially for a friend’s birthday. However, you don’t know how to play any instruments or compose music. Fortunately, you know deep learning and will solve this problem using an LSTM network.

You will train a network to generate novel jazz solos in a style representative of a body of performed work.

1.1 - Dataset

You will train your algorithm on a corpus of Jazz music. Run the cell below to listen to a snippet of the audio from the training set:

IPython.display.Audio('./data/30s_seq.mp3')

(Audio clip; it could not be uploaded to this page.)

We have taken care of preprocessing the musical data to render it in terms of musical “values.” You can informally think of each “value” as a note, which comprises a pitch and a duration. For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a “value” is actually more complicated than this: specifically, it also captures the information needed to play multiple notes at the same time. For example, when playing a music piece, you might press down two piano keys at the same time (playing multiple notes at the same time generates what’s called a “chord”). But we don’t need to worry about the details of music theory for this assignment. All you need to know is that we will obtain a dataset of values and will train an RNN model to generate sequences of values.

Our music generation system will use 78 unique values. Run the following code to load the raw music data and preprocess it into values. This might take a few minutes.

X, Y, n_values, indices_values = load_music_utils()
print('shape of X:', X.shape)
print('number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('total # of unique values:', n_values)
print('Shape of Y:', Y.shape)
# 60 training examples in total; each has a sequence length of 30, and the vocabulary of note/chord values has 78 entries
shape of X: (60, 30, 78)
number of training examples: 60
Tx (length of sequence): 30
total # of unique values: 78
Shape of Y: (30, 60, 78)

You have just loaded the following:

  • X: This is an $(m, T_x, 78)$ dimensional array. We have m training examples, each of which is a snippet of $T_x = 30$ musical values. At each time step, the input is one of 78 different possible values, represented as a one-hot vector. Thus, for example, X[i,t,:] is a one-hot vector representing the value of the i-th example at time t.

  • Y: This is essentially the same as X, but shifted one step to the left (to the past). As in the Dinosaurus assignment, we want the network to use the previous values to predict the next value, so our sequence model will try to predict $y^{\langle t \rangle}$ given $x^{\langle 1 \rangle}, \ldots, x^{\langle t \rangle}$. However, the data in Y is reordered to have dimensions $(T_y, m, 78)$, where $T_y = T_x$. This format makes it more convenient to feed to the LSTM later.

  • n_values: The number of unique values in this dataset. This should be 78.

  • indices_values: A Python dictionary mapping the indices 0-77 to musical values.
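
The relationship between X and Y described above can be sketched with NumPy on toy data. This is only an illustration under assumptions: the sizes and random indices are made up, the helper name one_hot is hypothetical, and the actual load_music_utils preprocessing may differ in details (for example, in how the last time step of Y is filled).

```python
import numpy as np

def one_hot(indices, n_values):
    """Turn an integer array of shape (...) into one-hot vectors of shape (..., n_values)."""
    return np.eye(n_values)[indices]

# Toy sizes (the real data has m=60, Tx=30, n_values=78).
m, Tx, n_values = 4, 5, 7
rng = np.random.default_rng(0)

# Hypothetical index sequences of length Tx + 1, so every input has a "next value".
seq = rng.integers(0, n_values, size=(m, Tx + 1))

X = one_hot(seq[:, :-1], n_values)   # inputs:  steps 0 .. Tx-1, shape (m, Tx, n_values)
Y = one_hot(seq[:, 1:], n_values)    # targets: steps 1 .. Tx,   shape (m, Tx, n_values)
Y = np.swapaxes(Y, 0, 1)             # reorder to (Ty, m, n_values), time axis first

print(X.shape)  # (4, 5, 7)
print(Y.shape)  # (5, 4, 7)
```

With this layout, Y[t, i] is the one-hot vector that follows X[i, t] in example i, which is exactly the "predict the next value" target.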

1.2 - Overview of our model

Here is the architecture of the model we will use. It is similar to the Dinosaurus model you used in the previous notebook, except that you will be implementing it in Keras. The architecture is as follows:

2 - Building the model

In this part you will build and train a model that will learn musical patterns. To do so, you will need to build a model that takes in X of shape $(m, T_x, 78)$ and Y of shape $(T_y, m, 78)$. We will use an LSTM with 64-dimensional hidden states. Let’s set n_a = 64.

n_a = 64 
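
To get a feel for what n_a = 64 implies, the number of trainable parameters of a standard LSTM layer can be worked out by hand. The sketch below assumes the usual LSTM formulation with four weight blocks (forget gate, update gate, output gate, candidate cell), one bias vector per block, and an input of size 78 (one one-hot value per time step). The function name lstm_param_count is made up for illustration.

```python
def lstm_param_count(n_x, n_a):
    """Parameter count of a standard LSTM layer.

    Each of the 4 weight blocks has input weights (n_x * n_a),
    recurrent weights (n_a * n_a) and a bias vector (n_a).
    """
    per_block = n_x * n_a + n_a * n_a + n_a
    return 4 * per_block

n_values = 78   # size of each one-hot input vector
n_a = 64        # size of the hidden state

print(lstm_param_count(n_values, n_a))  # 36608
```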

Here’s how you can create a Keras model with multiple inputs and outputs. If you’re building an RNN where, even at test time, the entire input sequence $x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \ldots, x^{\langle T_x \rangle}$ is given in advance (for example, if the inputs are words and the output is a label), then Keras has simple built-in functions to build the model. However, for sequence generation, at test time we don’t know all the values of $x^{\langle t \rangle}$ in advance; instead, we generate them one at a time using $x^{\langle t \rangle} = y^{\langle t-1 \rangle}$. So the code will be a bit more complicated, and you’ll need to implement your own for-loop to iterate over the different time steps.
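
The feedback loop described here, where each output becomes the next input, can be sketched in plain Python before moving to Keras. The step function below is a stand-in, not an LSTM: toy_step, its deterministic update rule, and the choice of taking the most likely value (rather than sampling) are all assumptions made only to show the generation loop.

```python
N_VALUES = 78

def one_hot(index, n):
    """Return a one-hot list of length n with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(n)]

def argmax(vec):
    """Index of the largest entry of vec."""
    return max(range(len(vec)), key=vec.__getitem__)

def toy_step(x, state):
    """Hypothetical stand-in for one LSTM step followed by a softmax.

    Returns a probability vector over the next value and the new state.
    """
    state = (state + argmax(x) + 1) % N_VALUES       # fake "memory" update
    probs = [0.01 / (N_VALUES - 1)] * N_VALUES
    probs[state] = 0.99                               # most mass on one value
    return probs, state

def generate(first_index, Ty):
    """Generate Ty value indices, feeding each prediction back as the next input."""
    x = one_hot(first_index, N_VALUES)
    state = 0
    out = []
    for _ in range(Ty):
        probs, state = toy_step(x, state)
        idx = argmax(probs)            # pick the most likely next value
        out.append(idx)
        x = one_hot(idx, N_VALUES)     # the output at step t-1 becomes the input at step t
    return out

print(generate(first_index=3, Ty=5))  # [4, 9, 19, 39, 1]
```

Because the next input only exists after the previous step has run, the time steps cannot be handed to the model as one precomputed sequence, which is why an explicit loop is needed.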

以下是如何建立具有多個輸入和輸出的 Keras 模型。 如果你正在建立一個 RNN,即使在測試時間,整個輸入序列x1,x2,,xTx 預先給定,例如,如果輸入是單詞並且輸出是標籤,則Keras具有簡單的內建函式來構建模型。 但是,對於序列生成,在測試時我們並不知道xt 的所有值, 相反,我們使用