Learning pandas in Depth: 10 minutes to pandas (Part 2)

Boolean indexing

df[df.A>0]
	A	B	C	D
2013-01-02	0.356680	-0.468280	1.293093	-0.752251
2013-01-03	1.179930	0.407866	-1.733382	-0.128474
2013-01-05	1.398427	0.087443	-1.032773	0.809215
df[df>0]
	A	B	C	D
2013-01-01	NaN	NaN	1.057780	NaN
2013-01-02	0.356680	NaN	1.293093	NaN
2013-01-03	1.179930	0.407866	NaN	NaN
2013-01-04	NaN	NaN	NaN	0.907222
2013-01-05	1.398427	0.087443	NaN	0.809215
2013-01-06	NaN	NaN	NaN	0.899263
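
The two forms above behave differently: a boolean Series selects whole rows, while a boolean DataFrame keeps the original shape and puts NaN in every cell where the condition is False. A minimal sketch on a small made-up frame (the name `demo` and its values are illustrative, not the random data shown above):

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the random frame used in the article.
demo = pd.DataFrame(
    {"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]},
    index=pd.date_range("20130101", periods=3),
)

rows = demo[demo["A"] > 0]   # boolean Series: keeps the rows where A > 0
cells = demo[demo > 0]       # boolean DataFrame: same shape, NaN where False
same = demo.where(demo > 0)  # explicit spelling of the same cell-wise mask

print(rows)
print(cells)
```

Row selection drops data, whereas the cell-wise mask never changes the index or the columns.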

Filtering with the isin method is handy. Note that the argument is a list of values:

df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']
df2
	A	B	C	D	E
2013-01-01	-0.119951	-1.662543	1.057780	-0.126012	one
2013-01-02	0.356680	-0.468280	1.293093	-0.752251	one
2013-01-03	1.179930	0.407866	-1.733382	-0.128474	two
2013-01-04	-0.503068	-1.408777	-0.380794	0.907222	three
2013-01-05	1.398427	0.087443	-1.032773	0.809215	four
2013-01-06	-1.068830	-0.963702	-0.964578	0.899263	three
df2[df2['E'].isin(['one', 'four'])]
	A	B	C	D	E
2013-01-01	-0.119951	-1.662543	1.057780	-0.126012	one
2013-01-02	0.356680	-0.468280	1.293093	-0.752251	one
2013-01-05	1.398427	0.087443	-1.032773	0.809215	four
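
The isin filter is just a boolean mask, so it can also be inverted to express "not in". A small sketch on a standalone Series that reuses the labels from column E (the names `labels`, `mask`, `kept` and `dropped` are illustrative):

```python
import pandas as pd

labels = pd.Series(["one", "one", "two", "three", "four", "three"])

mask = labels.isin(["one", "four"])  # True where the value is in the list
kept = labels[mask]
dropped = labels[~mask]              # ~ inverts the mask: "not in the list"

print(kept.tolist())
print(dropped.tolist())
```

Passing a bare string such as `labels.isin("one")` raises a TypeError, which is why the single value still has to be wrapped in a list.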

Setting

A new column is added by aligning on the index; labels with no match become NaN:

s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
s1
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64
df['F'] = s1
df
	A	B	C	D	F
2013-01-01	-0.119951	-1.662543	1.057780	-0.126012	NaN
2013-01-02	0.356680	-0.468280	1.293093	-0.752251	1.0
2013-01-03	1.179930	0.407866	-1.733382	-0.128474	2.0
2013-01-04	-0.503068	-1.408777	-0.380794	0.907222	3.0
2013-01-05	1.398427	0.087443	-1.032773	0.809215	4.0
2013-01-06	-1.068830	-0.963702	-0.964578	0.899263	5.0
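
Alignment works by label, not by position: values whose label is missing from the frame are dropped, and frame rows with no matching label get NaN. A minimal sketch with made-up data (the names `frame` and `shifted` are illustrative):

```python
import pandas as pd

frame = pd.DataFrame({"A": [10, 20, 30]}, index=pd.date_range("20130101", periods=3))
# The Series starts one day later, so it overlaps the frame on only two dates.
shifted = pd.Series([1, 2, 3], index=pd.date_range("20130102", periods=3))

frame["F"] = shifted  # aligned by label, not by position
print(frame)
```

The integer Series becomes a float column because NaN cannot be stored in an int64 column, which matches the 1.0, 2.0, ... values in the output above.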

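Overwrite an existing column. Note that np.array(5)*len(df['D']) multiplies a 0-d array by 6 and gives the single value 30, which is broadcast to every row; building the array from a list, as in the second form, fills the column with 5: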

df.loc[:,'D'] = np.array(5)*len(df['D'])
df
	A	B	C	D	F
2013-01-01	-0.119951	-1.662543	1.057780	30	NaN
2013-01-02	0.356680	-0.468280	1.293093	30	1.0
2013-01-03	1.179930	0.407866	-1.733382	30	2.0
2013-01-04	-0.503068	-1.408777	-0.380794	30	3.0
2013-01-05	1.398427	0.087443	-1.032773	30	4.0
2013-01-06	-1.068830	-0.963702	-0.964578	30	5.0
df.loc[:,'D'] = np.array([5] * len(df))
df
	A	B	C	D	F
2013-01-01	-0.119951	-1.662543	1.057780	5	NaN
2013-01-02	0.356680	-0.468280	1.293093	5	1.0
2013-01-03	1.179930	0.407866	-1.733382	5	2.0
2013-01-04	-0.503068	-1.408777	-0.380794	5	3.0
2013-01-05	1.398427	0.087443	-1.032773	5	4.0
2013-01-06	-1.068830	-0.963702	-0.964578	5	5.0
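
The difference between the two assignments comes from NumPy alone and can be checked without a DataFrame. A short sketch, where `n` stands in for len(df) and `np.full` is shown as one alternative spelling:

```python
import numpy as np

n = 6  # stands in for len(df)

scalar = np.array(5) * n    # 0-d array times an int: one value, 30
column = np.array([5] * n)  # list repeated n times, then converted to an array
filled = np.full(n, 5)      # alternative: n copies of 5 directly

print(scalar, np.ndim(scalar))
print(column, column.shape)
```

Assigning a scalar to a column broadcasts it to every row, so the first form runs without error and silently fills the column with 30.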

Make every value in the DataFrame negative; flipping the comparison makes every value positive in the same way:

df[df>0] = -df
df
	A	B	C	D	F
2013-01-01	-0.119951	-1.662543	-1.057780	-5	NaN
2013-01-02	-0.356680	-0.468280	-1.293093	-5	-1.0
2013-01-03	-1.179930	-0.407866	-1.733382	-5	-2.0
2013-01-04	-0.503068	-1.408777	-0.380794	-5	-3.0
2013-01-05	-1.398427	-0.087443	-1.032773	-5	-4.0
2013-01-06	-1.068830	-0.963702	-0.964578	-5	-5.0
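
Both directions of the sign flip can be sketched side by side. The frame below is made up for illustration, and the comparison with `abs()` is only a cross-check, not something the original text uses:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A": [1.0, -2.0, np.nan], "B": [-3.0, 4.0, 5.0]})

neg = demo.copy()
neg[neg > 0] = -neg  # flip only the positive cells; NaN compares False and is left alone

pos = demo.copy()
pos[pos < 0] = -pos  # mirror image: flip only the negative cells

print(neg)
print(pos)
```

For purely numeric frames the results match `-demo.abs()` and `demo.abs()`, which may read more directly when the mask is not needed for anything else.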

Handling missing data

Note: the how parameter of dropna defaults to 'any'; fillna takes the keyword value (singular, with no trailing s).
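
df1 is not defined in this part. Its outputs are consistent with the construction used in the official 10 minutes to pandas guide: reindex to the first four dates, add an empty column E, then set E to 1 for the first two rows. A sketch under that assumption, with stand-in values because the article's frame is random (`dates` and `base` are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Stand-in for the article's df: columns A-D plus F, with F missing on the first date.
base = pd.DataFrame(-np.ones((6, 4)), index=dates, columns=list("ABCD"))
base["F"] = [np.nan, -1.0, -2.0, -3.0, -4.0, -5.0]

# Assumed construction of df1: first four rows, plus a new all-NaN column E.
df1 = base.reindex(index=dates[0:4], columns=list(base.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1  # label slice, so both endpoints are included

print(df1.dropna(how="any"))  # only the rows that contain no NaN at all
print(df1.fillna(value=5))    # every NaN replaced by 5
print(pd.isna(df1))           # boolean mask of the missing cells
```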

df1.dropna(how='any')
	A	B	C	D	F	E
2013-01-02	-0.35668	-0.46828	-1.293093	-5	-1.0	1.0
df1.fillna(value=5)
	A	B	C	D	F	E
2013-01-01	-0.119951	-1.662543	-1.057780	-5	5.0	1.0
2013-01-02	-0.356680	-0.468280	-1.293093	-5	-1.0	1.0
2013-01-03	-1.179930	-0.407866	-1.733382	-5	-2.0	5.0
2013-01-04	-0.503068	-1.408777	-0.380794	-5	-3.0	5.0
pd.isna(df1)
	A	B	C	D	F	E
2013-01-01	False	False	False	False	True	False
2013-01-02	False	False	False	False	False	False
2013-01-03	False	False	False	False	False	True
2013-01-04	False	False	False	False	False	True
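
Since how='any' is the default, it is worth seeing what the other choices do. A small sketch on a made-up frame (the name `demo` is illustrative) comparing how='any', how='all' and the subset parameter:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(
    {"F": [np.nan, -1.0, np.nan], "E": [1.0, 1.0, np.nan]},
    index=pd.date_range("20130101", periods=3),
)

any_missing = demo.dropna(how="any")  # drop rows with at least one NaN
all_missing = demo.dropna(how="all")  # drop rows only if every value is NaN
only_f = demo.dropna(subset=["F"])    # look at column F alone

print(len(any_missing), len(all_missing), len(only_f))
```

With how='any' a single missing cell removes the whole row, so how='all' or subset is often the safer choice when only some columns matter.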