[Repost] Python & Numpy Tutorial (Part 2)
Numpy
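Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with Numpy.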
Arrays
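A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
We can initialize numpy arrays from nested Python lists, and access elements using square brackets: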
import numpy as np
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"
b = np.array([[1, 2, 3], [4, 5, 6]])   # Create a rank 2 array
print(b.shape)                         # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])       # Prints "1 2 4"
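Arrays expose the rank and shape defined above as attributes. The following is a small sketch of them: `ndim`, `shape` and `size` are standard ndarray attributes, and the arrays are only illustrative.

```python
import numpy as np

# A rank 2 array (2 rows, 3 columns) built from nested lists
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.ndim)   # rank (number of dimensions): 2
print(b.shape)  # size along each dimension: (2, 3)
print(b.size)   # total number of elements: 6

# A rank 1 array has a one-element shape tuple
a = np.array([1, 2, 3])
print(a.ndim, a.shape)  # 1 (3,)
```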
Numpy also provides many functions to create arrays:
import numpy as np
a = np.zeros((2, 2))          # Create an array of all zeros
print(a)                      # Prints "[[0. 0.]
#                                        [0. 0.]]"
b = np.ones((1, 2))           # Create an array of all ones
print(b)                      # Prints "[[1. 1.]]"
c = np.full((2, 2), 7)        # Create a constant array (integer fill value -> integer dtype)
print(c)                      # Prints "[[7 7]
#                                        [7 7]]"
d = np.eye(2)                 # Create a 2x2 identity matrix
print(d)                      # Prints "[[1. 0.]
#                                        [0. 1.]]"
e = np.random.random((2, 2))  # Create an array filled with random values
print(e)                      # Might print "[[0.91940167 0.08143941]
#                                              [0.68744134 0.87236687]]"
You can read about other methods of array creation in the official documentation.
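The sketch below shows two other common creation functions, plus the optional `dtype` argument that most creation functions accept. The specific values are only examples.

```python
import numpy as np

r = np.arange(0, 10, 2)               # evenly spaced values by step (end excluded)
l = np.linspace(0.0, 1.0, 5)          # 5 evenly spaced points, both ends included
z = np.zeros((2, 3), dtype=np.int32)  # creation functions accept a dtype argument

print(r)        # [0 2 4 6 8]
print(l)        # 0.0, 0.25, 0.5, 0.75, 1.0
print(z.dtype)  # int32
```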
Array indexing
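Numpy offers several ways to index into arrays.
Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array: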
import numpy as np
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"
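Slices are views, so an independent subarray needs an explicit `.copy()`. The following sketch shows this and uses `np.shares_memory` to check whether two arrays overlap in memory.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# .copy() allocates new memory, so the subarray no longer aliases a
b = a[:2, 1:3].copy()
b[0, 0] = 77
print(a[0, 1])  # still 2: the original array is unchanged

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(a, a[:2, 1:3]))  # True (a view)
print(np.shares_memory(a, b))           # False (a copy)
```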
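You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing: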
import numpy as np
# Create the following rank 2 array with shape (3, 4)
# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :] # Rank 1 view of the second row of a
row_r2 = a[1:2, :] # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
#                                        [ 6]
#                                        [10]] (3, 1)"
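You can restore the dropped dimension of a rank 1 result by indexing with `np.newaxis`, which is an alias for `None` and inserts a length-1 axis. The sketch below rebuilds the same shapes that the all-slice versions above produce.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_r1 = a[1, :]                     # rank 1, shape (4,)
row_as_2d = row_r1[np.newaxis, :]    # shape (1, 4), same as a[1:2, :]
col_as_2d = a[:, 1][:, np.newaxis]   # shape (3, 1), same as a[:, 1:2]
print(row_as_2d.shape, col_as_2d.shape)  # (1, 4) (3, 1)
```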
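Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example: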
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
# An example of integer array indexing.
# The returned array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"
# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"
# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"
# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"
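There is another practical difference from slicing: integer array indexing returns a new array (a copy), while slicing returns a view. The following sketch contrasts the two behaviors.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

picked = a[[0, 1, 2], [0, 1, 0]]  # integer array indexing -> new array
picked[0] = 100                   # modifies only the copy
print(a[0, 0])                    # 1: a is unchanged

sliced = a[0:1, 0:1]              # slicing -> view
sliced[0, 0] = 100                # writes through to a
print(a[0, 0])                    # 100
```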
One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:
import numpy as np
# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)  # Prints "[[ 1  2  3]
#                    [ 4  5  6]
#                    [ 7  8  9]
#                    [10 11 12]]"
# Create an array of indices
b = np.array([0, 2, 0, 1])
# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10
print(a)  # Prints "[[11  2  3]
#                    [ 4  5 16]
#                    [17  8  9]
#                    [10 21 12]]"
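A typical use of this pattern is to pick one score per row, for example the score of each row's correct label. The sketch below does this and also combines the trick with `argmax`. The `scores` and `labels` data are purely illustrative.

```python
import numpy as np

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = np.array([1, 0, 2])  # one column index per row (illustrative data)

rows = np.arange(scores.shape[0])
# Pick scores[i, labels[i]] for every row i in one vectorized step
print(scores[rows, labels])                 # [0.7 0.5 0.6]
# The same trick combined with argmax returns each row's maximum
print(scores[rows, scores.argmax(axis=1)])  # [0.7 0.5 0.6]
```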
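Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example: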
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
bool_idx = (a > 2) # Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.
print(bool_idx)  # Prints "[[False False]
#                           [ True  True]
#                           [ True  True]]"
# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"
# We can do all of the above in a single concise statement:
print(a[a > 2])  # Prints "[3 4 5 6]"
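Boolean masks can combine several conditions, and you can also assign through them. The following is a short sketch of both.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); the parentheses are required
# because & binds more tightly than the comparison operators > and <
selected = a[(a > 2) & (a < 6)]
print(selected)  # [3 4 5]

# Assigning through a boolean mask modifies a in place
a[a > 4] = 0
print(a)  # [[1 2]
#            [3 4]
#            [0 0]]
```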
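For brevity we have left out a lot of details about numpy array indexing; if you want to know more, you should read the official documentation.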
Datatypes
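Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example: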
import numpy as np
x = np.array([1, 2]) # Let numpy choose the datatype
print(x.dtype)  # Prints "int64" on most 64-bit platforms (the default integer type is platform-dependent)
x = np.array([1.0, 2.0])  # Let numpy choose the datatype
print(x.dtype)  # Prints "float64"
x = np.array([1, 2], dtype=np.int64)  # Force a particular datatype
print(x.dtype)  # Prints "int64"
You can read all about numpy datatypes in the official documentation.
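You can also convert an existing array to another datatype with `astype`. The following sketch shows that a float-to-integer conversion truncates toward zero. The values are illustrative.

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])
y = x.astype(np.int64)      # float -> int truncates toward zero
print(y)                    # [ 1 -1  2]

z = np.array([1, 2, 3], dtype=np.float32)
print(z.dtype, z.itemsize)  # float32 4 (bytes per element)
```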
Array math
Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
# Elementwise sum; both produce the array
# [[ 6.0 8.0]
# [10.0 12.0]]
print(x + y)
print(np.add(x, y))
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))
# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))
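The example above uses float64 arrays. With integer arrays in Python 3, `/` is true division and always returns floats, while `//` keeps integer floor division. The following sketch shows this with illustrative values.

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 2, 2])

print(a / b)   # true division gives floats: [3.5 4.  4.5]
print(a // b)  # floor division keeps integers: [3 4 4]
print(a % b)   # elementwise remainder: [1 0 1]
e = np.exp(np.zeros(2))  # other ufuncs such as np.exp also apply elementwise
```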
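Note that unlike MATLAB, * is elementwise multiplication, not matrix multiplication. We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects: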
import numpy as np
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])
# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))
# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))
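In Python 3.5+ with a recent NumPy, the `@` operator (`np.matmul`) gives the same results as `dot` for the 1-D and 2-D cases shown above. The following sketch checks this on the same data.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)  # 219, same as np.dot(v, w)
print(x @ v)  # [29 67]
print(x @ y)  # [[19 22]
#                [43 50]]
print(np.array_equal(x @ y, np.matmul(x, y)))  # True
```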
Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:
import numpy as np
x = np.array([[1,2],[3,4]])
print(np.sum(x))          # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"
You can find the full list of mathematical functions provided by numpy in the official documentation.
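Other reductions such as `mean` and `max` take the same `axis` argument. Passing `keepdims=True` keeps the reduced axis with size 1, which is handy for normalizing rows. The following sketch shows these; its last line relies on broadcasting, covered later in this tutorial.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

print(np.mean(x, axis=0))  # column means: [2. 3.]
print(x.max(axis=1))       # row maxima: [2 4]

row_sums = x.sum(axis=1, keepdims=True)  # reduced axis kept: shape (2, 1)
print(row_sums.shape)                    # (2, 1)
normalized = x / row_sums                # each row divided by its own sum
```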
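Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object: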
import numpy as np
x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
#                      [3 4]]"
print(x.T)  # Prints "[[1 3]
#                      [2 4]]"
# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1, 2, 3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"
Numpy provides many more functions for manipulating arrays; you can find them in the official documentation.
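Transposing a rank 1 array has no effect. To get an explicit column or row vector, reshape it to rank 2 first. The following sketch shows this, where `-1` asks numpy to infer that dimension.

```python
import numpy as np

v = np.array([1, 2, 3])
col = v.reshape(-1, 1)  # shape (3, 1); -1 lets numpy infer the row count
row = col.T             # transposing a rank 2 array swaps axes: shape (1, 3)
print(col.shape, row.shape)  # (3, 1) (1, 3)

x = np.arange(6).reshape(2, 3)  # [[0 1 2]
#                                  [3 4 5]]
print(x.T.shape)                # (3, 2)
```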
Broadcasting
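Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.
For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this: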
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
y[i, :] = x[i, :] + v
# Now y is the following
# [[ 2 2 4]
# [ 5 5 7]
# [ 8 8 10]
# [11 11 13]]
print(y)
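This works; however, when the matrix x is very large, computing an explicit loop in Python can be slow. Note that adding the vector v to each row of the matrix x is equivalent to forming a matrix vv by stacking multiple copies of v vertically, then performing elementwise summation of x and vv. We could implement this approach like this: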
import numpy as np
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1)) # Stack 4 copies of v on top of each other
print(vv)  # Prints "[[1 0 1]
#                      [1 0 1]
#                      [1 0 1]
#                      [1 0 1]]"
y = x + vv  # Add x and vv elementwise
print(y)  # Prints "[[ 2  2  4]
#                    [ 5  5  7]
#                    [ 8  8 10]
#                    [11 11 13]]"
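Numpy broadcasting lets us do the same computation without building the stacked copies of v at all. The following is the broadcasting version of the example above.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting; no vv is materialized
print(y)   # [[ 2  2  4]
#             [ 5  5  7]
#             [ 8  8 10]
#             [11 11 13]]
```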
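Even though x has shape (4, 3) and v has shape (3,), the expression y = x + v still works thanks to broadcasting: it behaves as if v actually had shape (4, 3), with each row a copy of v, and the sum is performed elementwise.
Broadcasting two arrays together follows these rules: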
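(1) If the arrays do not have the same rank, prepend the shape of the lower-rank array with 1s until both shapes have the same length.
(2) The two arrays are said to be compatible in a dimension if they have the same size in that dimension, or if one of the arrays has size 1 in that dimension.
(3) The arrays can be broadcast together if they are compatible in all dimensions.
(4) After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of the shapes of the two input arrays.
(5) In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.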
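The rules above can be written as a short shape calculation. The following is a sketch only: `broadcast_shape` is a hypothetical helper written for illustration, not a numpy function. It is cross-checked against `np.broadcast`, which numpy does provide.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shape tuples."""
    # Rule (1): prepend 1s to the shorter shape until the lengths match
    n = max(len(shape_a), len(shape_b))
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rules (2)-(3): each dimension must be equal, or one side must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
        # Rules (4)-(5): a size-1 dimension is stretched to match the other
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))  # (4, 3)
print(broadcast_shape((3, 1), (2,)))  # (3, 2)
# Cross-check against numpy on real arrays
print(np.broadcast(np.empty((3, 1)), np.empty((2,))).shape)  # (3, 2)
```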
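If this explanation does not make sense, try reading the explanation in the official documentation or this explanation.
Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the official documentation.
Here are some applications of broadcasting: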
import numpy as np
# Compute outer product of vectors
v = np.array([1,2,3]) # v has shape (3,)
w = np.array([4,5]) # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4 5]
# [ 8 10]
# [12 15]]
print(np.reshape(v, (3, 1)) * w)
# Add a vector to each row of a matrix
x = np.array([[1, 2, 3], [4, 5, 6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
print(x + v)
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
print((x.T + w).T)
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
# [[ 2  4  6]
#  [ 8 10 12]]
print(x * 2)
Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
Numpy Documentation
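This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the numpy reference to find out much more about numpy.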
Scipy
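Numpy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays. SciPy builds on this, and provides a large number of functions that operate on numpy arrays and are useful for different types of scientific and engineering applications.
The best way to get familiar with SciPy is to browse the documentation. We will highlight some parts of SciPy that you might find useful for this class.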
Image operations
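SciPy provides some basic functions to work with images. For example, it has functions to read images from disk into numpy arrays, to write numpy arrays to disk as images, and to resize images. Here is a simple example that showcases these functions: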
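# Note: imread, imsave and imresize were deprecated in SciPy 1.0 and removed
# in later releases, so this example needs an older SciPy version.
from scipy.misc import imread, imsave, imresize
# Read a JPEG image into a numpy array
img = imread('assets/cat.jpg')
print(img.dtype, img.shape)  # Prints "uint8 (400, 248, 3)"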
# We can tint the image by scaling each of the color channels
# by a different scalar constant. The image has shape (400, 248, 3);
# we multiply it by the array [1, 0.95, 0.9] of shape (3,);
# numpy broadcasting means that this leaves the red channel unchanged,
# and multiplies the green and blue channels by 0.95 and 0.9
# respectively.
img_tinted = img * [1, 0.95, 0.9]
# Resize the tinted image to be 300 by 300 pixels.
img_tinted = imresize(img_tinted, (300, 300))
# Write the tinted image back to disk
imsave('assets/cat_tinted.jpg', img_tinted)
Left: The original image. Right: The tinted and resized image.
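The tinting step is plain numpy broadcasting, so it does not depend on the SciPy image functions. The following sketch runs it on a synthetic uint8 array that stands in for the photo. Multiplying by floats produces float64, so the result is rounded, clipped and cast back to uint8 before it would be saved or displayed.

```python
import numpy as np

# Synthetic stand-in for a photo: a 4x4 RGB image with every value at 200
img = np.full((4, 4, 3), 200, dtype=np.uint8)

# Shape (4, 4, 3) broadcasts against shape (3,): one factor per color channel
img_tinted = img * [1, 0.95, 0.9]
print(img_tinted.dtype)  # float64: multiplying by floats promotes the type

# Round, clip to the valid 8-bit range, then cast back to uint8
img_tinted_u8 = np.clip(np.rint(img_tinted), 0, 255).astype(np.uint8)
print(img_tinted_u8[0, 0])  # [200 190 180]
```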
MATLAB files
The functions scipy.io.loadmat and scipy.io.savemat allow you to read and write MATLAB files. You can read about them in the official documentation.
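The following sketch is a minimal round trip through a temporary .mat file. The file name `example.mat` and the variables `x` and `v` are illustrative. By default, `savemat` stores 1-D arrays as MATLAB row vectors, so `v` comes back with shape (1, 3).

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

x = np.arange(6, dtype=np.float64).reshape(2, 3)
v = np.array([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.mat')  # hypothetical file name
    savemat(path, {'x': x, 'v': v})             # dict keys become MATLAB variable names
    data = loadmat(path)                        # returns a dict of arrays (plus header keys)

print(data['x'].shape)  # (2, 3)
print(data['v'].shape)  # (1, 3): 1-D arrays are saved as row vectors by default
```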
Distance between points
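SciPy defines some useful functions for computing distances between sets of points.
The function scipy.spatial.distance.pdist computes the distance between all pairs of points in a given set: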
import numpy as np
from scipy.spatial.distance import pdist, squareform
# Create the following array where each row is a point in 2D space:
# [[0 1]
# [1 0]
# [2 0]]
x = np.array([[0, 1], [1, 0], [2, 0]])
print(x)
# Compute the Euclidean distance between all rows of x.
# d[i, j] is the Euclidean distance between x[i, :] and x[j, :],
# and d is the following array:
# [[ 0. 1.41421356 2.23606798]
# [ 1.41421356 0. 1. ]
# [ 2.23606798 1. 0. ]]
d = squareform(pdist(x, 'euclidean'))
print(d)
You can read all the details about this function in the official documentation.
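For Euclidean distance, the same matrix can be computed with the broadcasting covered earlier. The following sketch is a numpy-only equivalent of `squareform(pdist(x, 'euclidean'))` for this small example. It builds an (n, n, 2) intermediate array, so it is not memory-efficient for large point sets.

```python
import numpy as np

x = np.array([[0, 1], [1, 0], [2, 0]])

# diff[i, j, :] = x[i, :] - x[j, :]; shapes (3, 1, 2) and (1, 3, 2) broadcast to (3, 3, 2)
diff = x[:, np.newaxis, :] - x[np.newaxis, :, :]
d = np.sqrt(np.sum(diff ** 2, axis=-1))  # Euclidean norm over the coordinate axis
print(d.shape)  # (3, 3)
```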
A similar function (scipy.spatial.distance.cdist) computes the distance between all pairs across two sets of points; you can read about it here.
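The following sketch uses `cdist` with two illustrative point sets. The result has one row per point in the first set and one column per point in the second.

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[0, 0], [3, 4]])          # 2 points
b = np.array([[0, 0], [0, 4], [3, 0]])  # 3 points

# d[i, j] is the distance between a[i] and b[j]; shape (len(a), len(b))
d = cdist(a, b, 'euclidean')
print(d.shape)  # (2, 3)
print(d)        # [[0. 4. 3.]
#                  [5. 3. 4.]]
```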
Matplotlib
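Matplotlib is a plotting library. In this section we give a brief introduction to the matplotlib.pyplot module, which provides a plotting system similar to that of MATLAB.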
Plotting
The most important function in matplotlib is plot, which allows you to plot 2D data. Here is a simple example:
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
# Plot the points using matplotlib
plt.plot(x, y)
plt.show() # You must call plt.show() to make graphics appear.
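Running this code draws the sine curve for x from 0 to 3π.
With just a little bit of extra work we can easily plot multiple lines at once, and add a title, legend, and axis labels: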
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
You can read much more about the plot function in the official documentation.
Subplots
You can plot different things in the same figure using the subplot function. Here is an example:
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)
# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')
# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')
# Show the figure.
plt.show()
As always, you can read much more about the subplot function in the official documentation.
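Matplotlib also has an object-oriented interface: `plt.subplots` returns a figure and its Axes objects directly. The following sketch reproduces the sine/cosine layout this way and renders it to an in-memory PNG instead of calling `plt.show()`, which is useful on machines without a display. The 'Agg' backend choice is an assumption made for headless use.

```python
import io

import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

x = np.arange(0, 3 * np.pi, 0.1)

# plt.subplots(2, 1) returns the figure and an array of two Axes objects
fig, axes = plt.subplots(2, 1)
axes[0].plot(x, np.sin(x))
axes[0].set_title('Sine')
axes[1].plot(x, np.cos(x))
axes[1].set_title('Cosine')
fig.tight_layout()  # reduce overlap between titles and tick labels

buf = io.BytesIO()
fig.savefig(buf, format='png')  # write PNG bytes to memory
plt.close(fig)
```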
Images
You can use the imshow function to show images. Here is an example:
import numpy as np
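# Note: scipy.misc.imread was deprecated in SciPy 1.0 and removed in later
# releases; imresize is not used in this example, so it is not imported.
from scipy.misc import imread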
import matplotlib.pyplot as plt
img = imread('assets/cat.jpg')
img_tinted = img * [1, 0.95, 0.9]
# Show the original image
plt.subplot(1, 2, 1)
plt.imshow(img)
# Show the tinted image
plt.subplot(1, 2, 2)
# A slight gotcha with imshow is that it might give strange results
# if presented with data that is not uint8. To work around this, we
# explicitly cast the image to uint8 before displaying it.
plt.imshow(np.uint8(img_tinted))
plt.show()
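Casting with `np.uint8` is safe in the example above because every tint factor is at most 1. If a scaling could push values above 255, clip before casting. Alternatively, pass float RGB data scaled to [0, 1], which imshow also accepts. The following sketch shows both options on a single illustrative pixel.

```python
import numpy as np

img = np.array([[[100, 200, 250]]], dtype=np.uint8)  # one RGB pixel
brighter = img * 1.5  # float64 values: 150, 300, 375

# Values outside 0..255 do not fit in uint8, so clip before casting
safe = np.clip(brighter, 0, 255).astype(np.uint8)
print(safe[0, 0])  # [150 255 255]

# Alternative: float RGB data in the range [0, 1]
as_float = np.clip(brighter / 255.0, 0.0, 1.0)
```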