Learning pandas in depth: 10 minutes to pandas (Part 2)
阿新 · Published: 2018-11-14
Boolean indexing
df[df.A>0]
A B C D
2013-01-02 0.356680 -0.468280 1.293093 -0.752251
2013-01-03 1.179930 0.407866 -1.733382 -0.128474
2013-01-05 1.398427 0.087443 -1.032773 0.809215
df[df>0]
A B C D
2013-01-01 NaN NaN 1.057780 NaN
2013-01-02 0.356680 NaN 1.293093 NaN
2013-01-03 1.179930 0.407866 NaN NaN
2013-01-04 NaN NaN NaN 0.907222
2013-01-05 1.398427 0.087443 NaN 0.809215
2013-01-06 NaN NaN NaN 0.899263
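The two forms above behave differently: a boolean Series keeps whole rows, while a boolean DataFrame keeps the frame's shape and blanks out entries. A minimal sketch on a small, made-up frame (the post uses random data, so these numbers are hypothetical):

```python
import pandas as pd

# Hypothetical small, deterministic frame (the post uses random values)
dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame(
    {"A": [-1.0, 2.0, 3.0, -4.0], "B": [5.0, -6.0, 7.0, -8.0]},
    index=dates,
)

# Row filter: a boolean Series keeps only the rows where it is True
rows = df[df.A > 0]
print(rows)  # rows 2013-01-02 and 2013-01-03

# Element-wise mask: a boolean DataFrame keeps the frame's shape and
# replaces entries where the condition is False with NaN
masked = df[df > 0]
print(masked)

# The mask form gives the same result as DataFrame.where with that condition
print(masked.equals(df.where(df > 0)))  # True
```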
Filtering with isin() is quite handy. Note that the values to match are passed as a list:
df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']
df2
A B C D E
2013-01-01 -0.119951 -1.662543 1.057780 -0.126012 one
2013-01-02 0.356680 -0.468280 1.293093 -0.752251 one
2013-01-03 1.179930 0.407866 -1.733382 -0.128474 two
2013-01-04 -0.503068 -1.408777 -0.380794 0.907222 three
2013-01-05 1.398427 0.087443 -1.032773 0.809215 four
2013-01-06 -1.068830 -0.963702 -0.964578 0.899263 three
df2[df2['E'].isin(['one', 'four'])]
A B C D E
2013-01-01 -0.119951 -1.662543 1.057780 -0.126012 one
2013-01-02 0.356680 -0.468280 1.293093 -0.752251 one
2013-01-05 1.398427 0.087443 -1.032773 0.809215 four
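To see why the list matters, here is a small sketch on a standalone Series with the same labels as column E. It assumes a reasonably recent pandas, which rejects a bare string rather than treating it as a single value; the `~` inversion at the end is just the usual way to keep the non-matching rows.

```python
import pandas as pd

# Labels mirroring column E in the post
s = pd.Series(["one", "one", "two", "three", "four", "three"])

# isin expects a list-like of values and returns a boolean Series
mask = s.isin(["one", "four"])
print(mask.tolist())  # [True, True, False, False, True, False]

# A bare string is not list-like, so pandas raises TypeError
try:
    s.isin("one")
except TypeError as exc:
    print("TypeError:", exc)

# Invert the mask with ~ to keep everything except the listed values
rest = s[~s.isin(["one", "four"])]
print(rest.tolist())  # ['two', 'three', 'three']
```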
Setting
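Assigning a Series as a new column aligns it with the DataFrame by index; rows with no matching index label become NaN, and labels that exist only in the Series are dropped: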
s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
s1
2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64
df['F'] = s1
df
A B C D F
2013-01-01 -0.119951 -1.662543 1.057780 -0.126012 NaN
2013-01-02 0.356680 -0.468280 1.293093 -0.752251 1.0
2013-01-03 1.179930 0.407866 -1.733382 -0.128474 2.0
2013-01-04 -0.503068 -1.408777 -0.380794 0.907222 3.0
2013-01-05 1.398427 0.087443 -1.032773 0.809215 4.0
2013-01-06 -1.068830 -0.963702 -0.964578 0.899263 5.0
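The alignment rule can be checked on a tiny, made-up example: the Series below is shifted one day later than the frame, just like s1 above, so the first row gets NaN and the Series' last label is discarded.

```python
import pandas as pd

# Hypothetical frame indexed by four consecutive days
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]},
                  index=pd.date_range("20130101", periods=4))

# A Series that starts one day later and runs one day past the frame
s1 = pd.Series([10, 20, 30, 40], index=pd.date_range("20130102", periods=4))

# Assignment matches on index labels, not on position:
# 2013-01-01 has no match -> NaN; 2013-01-05 has no row in df -> dropped
df["F"] = s1
print(df)
```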
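Overwrite column D with a NumPy array. Watch out for this first attempt: np.array(5)*len(df['D']) is not six 5s but the single value 5 * 6 = 30, which pandas broadcasts to every row: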
df.loc[:,'D'] = np.array(5)*len(df['D'])
df
A B C D F
2013-01-01 -0.119951 -1.662543 1.057780 30 NaN
2013-01-02 0.356680 -0.468280 1.293093 30 1.0
2013-01-03 1.179930 0.407866 -1.733382 30 2.0
2013-01-04 -0.503068 -1.408777 -0.380794 30 3.0
2013-01-05 1.398427 0.087443 -1.032773 30 4.0
2013-01-06 -1.068830 -0.963702 -0.964578 30 5.0
To get one 5 per row, repeat the list first and then convert it to an array:
df.loc[:,'D'] = np.array([5] * len(df))
df
A B C D F
2013-01-01 -0.119951 -1.662543 1.057780 5 NaN
2013-01-02 0.356680 -0.468280 1.293093 5 1.0
2013-01-03 1.179930 0.407866 -1.733382 5 2.0
2013-01-04 -0.503068 -1.408777 -0.380794 5 3.0
2013-01-05 1.398427 0.087443 -1.032773 5 4.0
2013-01-06 -1.068830 -0.963702 -0.964578 5 5.0
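The difference between the two attempts comes down to where the multiplication happens. A minimal sketch on a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"D": [0.1, 0.2, 0.3]})  # hypothetical 3-row frame

# np.array(5) is 0-d, so multiplying by the length just gives one number: 5 * 3
wrong = np.array(5) * len(df["D"])
print(wrong.shape, int(wrong))  # () 15

# [5] * len(df) repeats the list first, giving one 5 per row
right = np.array([5] * len(df))
print(right.tolist())  # [5, 5, 5]

df["D"] = wrong   # a scalar is broadcast: every row becomes 15
print(df["D"].tolist())  # [15, 15, 15]

df["D"] = right   # an array of matching length is assigned element by element
print(df["D"].tolist())  # [5, 5, 5]
```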
Make every value in the DataFrame negative (the same trick with df<0 makes them all positive):
df[df>0] = -df
df
A B C D F
2013-01-01 -0.119951 -1.662543 -1.057780 -5 NaN
2013-01-02 -0.356680 -0.468280 -1.293093 -5 -1.0
2013-01-03 -1.179930 -0.407866 -1.733382 -5 -2.0
2013-01-04 -0.503068 -1.408777 -0.380794 -5 -3.0
2013-01-05 -1.398427 -0.087443 -1.032773 -5 -4.0
2013-01-06 -1.068830 -0.963702 -0.964578 -5 -5.0
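A small sketch of both directions on made-up data with one missing value. The abs() comparison is only a cross-check added here, not the post's method; it shows that the masked assignment leaves NaN untouched, because NaN > 0 and NaN < 0 are both False.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed signs and one missing value
df = pd.DataFrame({"A": [1.5, -2.0, 3.0], "F": [np.nan, 4.0, -5.0]})

# As in the post: overwrite only the positive entries with their negation
neg = df.copy()
neg[neg > 0] = -neg
print(neg)

# Likewise for all-positive: overwrite only the negative entries
pos = df.copy()
pos[pos < 0] = -pos
print(pos)

# Cross-check against abs(); equals() treats NaN in the same place as equal
print(neg.equals(-df.abs()), pos.equals(df.abs()))  # True True
```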
Missing data
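Note: the how parameter of dropna defaults to 'any'; the fillna keyword is value (singular, no s). Here df1, not defined in this excerpt, holds the first four rows of df plus a column E that is 1.0 for the first two dates and NaN otherwise, as the outputs show.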
df1.dropna(how='any')
A B C D F E
2013-01-02 -0.35668 -0.46828 -1.293093 -5 -1.0 1.0
df1.fillna(value=5)
A B C D F E
2013-01-01 -0.119951 -1.662543 -1.057780 -5 5.0 1.0
2013-01-02 -0.356680 -0.468280 -1.293093 -5 -1.0 1.0
2013-01-03 -1.179930 -0.407866 -1.733382 -5 -2.0 5.0
2013-01-04 -0.503068 -1.408777 -0.380794 -5 -3.0 5.0
pd.isna(df1)
A B C D F E
2013-01-01 False False False False True False
2013-01-02 False False False False False False
2013-01-03 False False False False False True
2013-01-04 False False False False False True
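Because df1 is not built in this excerpt, here is a self-contained sketch that builds a comparable frame the way the official tutorial does (reindex plus a partially filled column E). The stand-in df uses hypothetical non-random values and has no F column, so dropna keeps two rows here instead of one. It also contrasts the default how='any' with how='all'.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the post's df (which holds random values)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

# First four rows plus an empty column E, set to 1 for the first two dates
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1

# how="any" (the default) drops a row that contains at least one NaN
print(df1.dropna())           # 2013-01-01 and 2013-01-02 remain
# how="all" drops a row only when every value in it is NaN
print(df1.dropna(how="all"))  # all four rows remain

# fillna takes the keyword value= (singular)
print(df1.fillna(value=5))

# Boolean mask marking the missing entries
print(pd.isna(df1))
```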