Datawhale Team Study (Pandas) Task 1: Prerequisites

阿新 • Published: 2020-12-15

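1. Python Basics

1.1 List comprehensions and conditional assignment

Problem 1: Generate a sequence of numbers of the form [0, 2, 4, 6, 8]

Approach 1: define a function and use a loop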

def my_func(x):
    return x*2

# Named result rather than list, so the built-in list() is not shadowed
result = []
for i in range(5):
    result.append(my_func(i))

print(result)

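Approach 2: define a function and use a list comprehension

result = [my_func(i) for i in range(5)]
print(result)

A list comprehension has the form [* for i in *]. The first * is the mapping expression, which takes whatever i refers to as its input; the second * is the iterable being traversed.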

Problem 2: Use a nested list comprehension to generate ['a_c', 'a_d', 'b_c', 'b_d']

[a+'_'+b for a in ['a', 'b'] for b in ['c', 'd']]
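
To make the evaluation order explicit, the nested comprehension can be unrolled into ordinary loops. This is only a minimal sketch; the names pairs_loop and pairs_comp are made up for illustration and do not appear in the original exercise.

```python
# Unrolled version of the nested comprehension: the leftmost `for` is the outer loop
pairs_loop = []
for a in ['a', 'b']:        # outer loop
    for b in ['c', 'd']:    # inner loop, runs fully for each value of a
        pairs_loop.append(a + '_' + b)

pairs_comp = [a + '_' + b for a in ['a', 'b'] for b in ['c', 'd']]

print(pairs_loop)                # ['a_c', 'a_d', 'b_c', 'b_d']
print(pairs_loop == pairs_comp)  # True
```

Reading the comprehension left to right gives the same nesting as reading the loops top to bottom.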

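Problem 3: Use a conditional expression, value = a if condition else b, to clip the elements of a list at 5: values greater than 5 are replaced by 5, and values of 5 or less are kept unchanged.

nums = [1, 2, 3, 4, 5, 6, 7]
[i if i <= 5 else 5 for i in nums]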
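
As a rough cross-check, the conditional expression can be compared with an ordinary if/else statement and with the built-in min. The helper clip_at_5 and the variable names below are hypothetical, introduced only for this sketch.

```python
def clip_at_5(value):
    """Hypothetical helper: same logic as `value if value <= 5 else 5`, written as a statement."""
    if value <= 5:
        return value
    else:
        return 5

nums = [1, 2, 3, 4, 5, 6, 7]
clipped_expr = [i if i <= 5 else 5 for i in nums]   # conditional expression
clipped_stmt = [clip_at_5(i) for i in nums]         # if/else statement
clipped_min = [min(i, 5) for i in nums]             # built-in min gives the same result here

print(clipped_expr)  # [1, 2, 3, 4, 5, 5, 5]
```

All three lists should be identical; the conditional expression is simply the one-line form of the if/else statement.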

1.2 Anonymous functions and the map function

Problem 4: Use an anonymous function to generate a sequence of numbers of the form [0, 2, 4, 6, 8]

[(lambda x:x*2)(i) for i in range(5)]

Problem 5: Use map with an anonymous function to generate a sequence of numbers of the form [0, 2, 4, 6, 8]

a = map(lambda x:x*2, range(5))

# In Python 2, map() returns a list,
# but in Python 3 it returns an iterator

for i in a:
    print(i)  # prints 0, 2, 4, 6, 8, one per line

list(map(lambda x: 2*x, range(5)))

Problem 6: Use map with an anonymous function to generate ['0_a', '1_b', '2_c', '3_d', '4_e']

# list() builds a list from an iterable: list('abcd') gives ['a', 'b', 'c', 'd']
list(map(lambda x,y: str(x)+'_'+y, range(5), list('abcde')))

map() applies the given function to the elements of the given iterable(s). Syntax: map(function, iterable, ...)
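
Two properties of map are easy to overlook: with several iterables it stops at the shortest one, and in Python 3 the result is an iterator that can be consumed only once. The sketch below illustrates both; the variable names are chosen for illustration only.

```python
# map() with two iterables passes one element from each to the function
pairs = map(lambda x, y: str(x) + '_' + y, range(5), 'abc')

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # already exhausted, so nothing is left

print(first_pass)   # ['0_a', '1_b', '2_c']  (stops at the shorter input)
print(second_pass)  # []
```

If the result is needed more than once, it is usually simplest to store list(map(...)) in a variable.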

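Note: while working on this problem I kept getting the error 'list' object is not callable. It turned out that I had assigned to a variable named list in the earlier problems, which shadowed the built-in list function. From now on I need to avoid names that clash with function names, method names, and keywords. This is not the first time I have made this mistake = =
[1] A variable and a function both used the name list, which raised an exception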
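
The error is easy to reproduce. The sketch below confines the shadowing to a function so that it does not affect the rest of a session; shadowed_call is a hypothetical helper written only for this demonstration.

```python
def shadowed_call():
    """Hypothetical helper: reproduce the error inside a local scope."""
    list = [0, 2, 4]          # the local name hides the built-in list()
    try:
        list('abc')           # this now tries to call a list object
    except TypeError as err:
        return str(err)

message = shadowed_call()
print(message)                # 'list' object is not callable

restored = list('abc')        # outside the function the built-in is untouched
print(restored)               # ['a', 'b', 'c']
```

In a notebook, where the shadowing variable lives at the top level, running del list removes it and makes the built-in visible again.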

1.3 zip objects and the enumerate function

Problem 7: Use zip to pack several iterables together

L1 = list('abcde')
L2 = range(4)

for a, b in zip(L1, L2):
    print(a, b)

# The number of pairs equals the length of the shortest input
>>> 
a 0
b 1
c 2
d 3

Problem 8: Use enumerate to wrap an iterable and attach a running index to each element

L = list('abcd')
for index, value in enumerate(L):
    print(index, value)

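The enumerate() function wraps an iterable object (such as a list, tuple, or string) so that each element is yielded together with its index. Syntax: enumerate(iterable, start=0), where iterable is a sequence, an iterator, or any other object that supports iteration, and start is the value at which the index begins (0 by default).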
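
The effect of the start argument is easiest to see side by side. A minimal sketch, with variable names chosen for illustration:

```python
L = list('abcd')

# Default: numbering starts at 0
pairs_default = list(enumerate(L))
# With start=1 the counter begins at 1; the elements themselves are unchanged
pairs_from_one = list(enumerate(L, start=1))

print(pairs_default)   # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
print(pairs_from_one)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

Note that start only changes the counter; it does not skip any elements.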

Problem 9: Use zip to pack L1 and L2 together, then unpack the result

zipped = list(zip(L1, L2))
zipped
>>> [('a', 0), ('b', 1), ('c', 2), ('d', 3)]

list(zip(*zipped))
>>> [('a', 'b', 'c', 'd'), (0, 1, 2, 3)]
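
The unpacking works because the * operator passes every tuple in zipped as a separate argument to zip. The sketch below spells that call out by hand; the names unzipped_star and unzipped_explicit are illustrative only.

```python
zipped = [('a', 0), ('b', 1), ('c', 2), ('d', 3)]

# zip(*zipped) is the same call as zip(('a', 0), ('b', 1), ('c', 2), ('d', 3))
unzipped_star = list(zip(*zipped))
unzipped_explicit = list(zip(zipped[0], zipped[1], zipped[2], zipped[3]))

letters, numbers = unzipped_star
print(letters)  # ('a', 'b', 'c', 'd')
print(numbers)  # (0, 1, 2, 3)
```

The results are tuples rather than the original list and range, so convert them if the original types matter.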

Problem 10: Use zip to build a dictionary mapping from L1 to L2

dict(zip(L1, L2))
>>> {'a': 0, 'b': 1, 'c': 2, 'd': 3}

2. NumPy Basics

2.1 Constructing np arrays

Problem 11-1: Use array to construct [1, 2, 3]

import numpy as np

np.array([1,2,3])
>>> array([1, 2, 3])

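Problem 11-2: Use np.linspace and np.arange to construct arithmetic sequences

np.linspace(1,5,11) # [start, stop], number of samples
>>> array([1. , 1.4, 1.8, 2.2, 2.6, 3. , 3.4, 3.8, 4.2, 4.6, 5. ])

np.arange(1,5,2) # [start, stop), step
>>> array([1, 3])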
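
The difference between the two functions is the third argument and the treatment of the endpoint: linspace takes a sample count and includes the stop value, while arange takes a step and excludes it. A small sketch, with variable names chosen for illustration:

```python
import numpy as np

lin = np.linspace(1, 5, 11)   # 11 evenly spaced samples, both endpoints included
step = lin[1] - lin[0]        # spacing is (stop - start) / (num - 1), about 0.4 here

ar = np.arange(1, 5, 2)       # step of 2; the stop value 5 is excluded

print(len(lin), lin[0], lin[-1])  # 11 1.0 5.0
print(ar)                         # [1 3]
```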

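Problem 11-3: Construct a pseudo-identity matrix whose ones are shifted one position off the main diagonal

np.eye(3, k=1)
>>> array([[0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])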

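Problem 11-4: Use np.random to generate random matrices

# (1) Three random numbers drawn uniformly from [0, 1)
np.random.rand(3)
# (2) A 3*3 array whose elements are drawn uniformly from [0, 1)
np.random.rand(3,3)
# (3) Three random numbers drawn uniformly from [a, b)
a, b = 5, 15
(b-a) * np.random.rand(3) + a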
=======================
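# (1) Three random numbers from the standard normal distribution N(0,1)
np.random.randn(3)
# (2) Three random numbers from a univariate normal distribution with mean μ and variance σ^2
sigma, mu = 2.5, 3
mu + np.random.randn(3) * sigma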
=======================
# (1) Random integers in [low, high)
low, high, size = 5, 15, (2,2)
np.random.randint(low, high, size)
>>> array([[11,  8],
       [14, 10]])
=======================
# (1) Sample with the given probabilities
my_list = ['a', 'b', 'c', 'd']
np.random.choice(my_list, 2, replace=False, p=[0.1, 0.7, 0.1, 0.1])
>>> array(['c', 'd'], dtype='<U1')
# (2) When no probabilities are given, sampling is uniform; by default it is done with replacement
np.random.choice(my_list, (3,3))
>>> array([['c', 'c', 'c'],
       ['c', 'a', 'd'],
       ['c', 'a', 'c']], dtype='<U1')
=======================
# random.permutation(x) returns a randomly permuted copy of an array
np.random.permutation(my_list)
>>> array(['d', 'b', 'c', 'a'], dtype='<U1')

arr = np.arange(9).reshape((3,3))
>>> array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
np.random.permutation(arr)
# If x is a multi-dimensional array, it is shuffled only along its first axis (rows are reordered, each row stays intact)
>>> array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])
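
Because the outputs above change on every run, it can help to check properties rather than exact values. The sketch below tests a few invariants of the np.random calls used in this problem; the *_ok variable names and the sample size of 1000 are arbitrary choices made for illustration.

```python
import numpy as np

a, b = 5, 15
u = (b - a) * np.random.rand(1000) + a      # shift and scale from [0, 1) to [a, b)
u_ok = bool(((u >= a) & (u < b)).all())

ints = np.random.randint(5, 15, (2, 2))     # the upper bound is exclusive
ints_ok = bool(((ints >= 5) & (ints < 15)).all())

my_list = ['a', 'b', 'c', 'd']
picked = np.random.choice(my_list, 2, replace=False)   # no repeats without replacement
picked_ok = len(set(picked)) == 2

arr = np.arange(9).reshape((3, 3))
shuffled = np.random.permutation(arr)       # rows are reordered, each row stays intact
rows_ok = sorted(map(tuple, shuffled)) == sorted(map(tuple, arr))

print(u_ok, ints_ok, picked_ok, rows_ok)    # True True True True
```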