Data Analysis: A pandas Knowledge Review

阿新 • Published: 2018-11-09

A review of Series and DataFrame basics

1. Series indexing and slicing

First import pandas and Series

 import pandas as pd
 from pandas import Series

Explicit index:

Use the elements of the index as the labels

Use .loc['label'] (recommended)

s1 = Series(data=[1,2,3,4,5,6],index=list('abcdef'))
s1
 a    1
 b    2
 c    3
 d    4
 e    5
 f    6
 dtype: int64
 s1.loc[['a','c']]  # to take several values along one dimension, wrap the labels in a list
 a    1
 c    3
 dtype: int64
 s1.loc['b':'e':2]  # a step can be used to skip labels
 b    2
 d    4
 dtype: int64
 s1.loc['e':'b':-1]  # note: to slice backwards, the start and stop labels must also be given in reverse order
 e    5
 d    4
 c    3
 b    2
 dtype: int64

Implicit index:

Use integer positions as the index

Use .iloc[position] (recommended)

 s1
 a    1
 b    2
 c    3
 d    4
 e    5
 f    6
 dtype: int64
 # a list of integer positions also works with iloc
 s1.iloc[[3,2,1,0]]
 d    4
 c    3
 b    2
 a    1
 dtype: int64
 s1.iloc[0:3]  # a label (explicit) slice includes the end label; a positional (implicit) slice excludes the stop position
 a    1
 b    2
 c    3
 dtype: int64
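
The difference between the two slicing rules is easy to check side by side. The following is a minimal sketch using the same six-element Series; the variable names by_label and by_position are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=list("abcdef"))

# Label-based slice: both endpoints ('b' and 'd') are included.
by_label = s.loc["b":"d"]

# Position-based slice: the stop position (3) is excluded.
by_position = s.iloc[1:3]

print(by_label.tolist())     # [2, 3, 4]
print(by_position.tolist())  # [2, 3]
```

Position 1 is label 'b' and position 3 is label 'd', so the two slices start at the same element but end one element apart.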

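2. Arithmetic between Series

Operations automatically align the data on the index labels.
Where a label is missing from one side, the result is NaN, because a value plus NaN is still NaN. To treat the missing values as 0 instead, use s1.add(s2,fill_value=0).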
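
As a small illustration of the alignment rule, here is a sketch with two Series whose indexes only partly overlap. The names left and right and their values are invented for this example.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Plain addition: 'a' and 'd' exist on one side only, so they become NaN.
plain = left + right

# add() with fill_value=0: the missing side is treated as 0 before adding.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

The result index is the union of both indexes ('a' through 'd'); only the handling of the unmatched labels differs between the two calls.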

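3. Arithmetic between a Series and a DataFrame

axis=0: operate column by column (the Series is treated as a column, so its index must match the row labels); applied to every column.

axis=1: operate row by row (the Series is treated as a row, so its index must match the column labels); applied to every row.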

import numpy as np
from pandas import DataFrame
df = DataFrame(data=np.random.randint(0,10,size=(5,5)),index=list('abcde'),columns=list('01234'))
df
	0	1	2	3	4
a	3	9	0	3	8
b	8	6	2	3	0
c	2	2	6	7	7
d	6	7	1	3	1
e	1	8	7	9	6
s1 = Series(data=np.random.randint(0,10,size=5),index=list('01234'))
s1
0    1
1    3
2    1
3    1
4    9
dtype: int32
df+s1  # DataFrame + Series: by default every row is added to the Series, matching column names to the Series index
	0	1	2	3	4
a	4	12	1	4	17
b	9	9	3	4	9
c	3	5	7	8	16
d	7	10	2	4	10
e	2	11	8	10	15
s2 = Series(data=np.random.randint(0,10,size=5),index=list('abcde'))
s2
a    4
b    8
c    5
d    4
e    6
dtype: int32
df+s2

	0	1	2	3	4	a	b	c	d	e
a	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
b	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
c	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
d	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
e	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
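# axis='columns' is the default: every row is added to the Series, matching the column names against the Series index
# axis='index' matches the Series index against the row labels instead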
df.add(s2,axis='index')
	0	1	2	3	4
a	7	13	4	7	12
b	16	14	10	11	8
c	7	7	11	12	12
d	10	11	5	7	5
e	7	14	13	15	12
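
Because the frames above are random, the same behaviour can be reproduced with fixed numbers. This is a minimal sketch; the names row_like and col_like are hypothetical and only indicate which axis each Series lines up with.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=["a", "b"], columns=["x", "y", "z"])

row_like = pd.Series([10, 20, 30], index=["x", "y", "z"])  # labels match the columns
col_like = pd.Series([100, 200], index=["a", "b"])         # labels match the row index

# Default direction: the Series is added to every row (same as df + row_like).
by_columns = df.add(row_like, axis="columns")

# axis="index": the Series is added to every column.
by_index = df.add(col_like, axis="index")

print(by_columns)
print(by_index)
```

Using df + col_like instead would try to match 'a' and 'b' against the column names and produce an all-NaN frame, which is the situation shown with s2 above.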

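4. Concatenating with pd.concat()

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)

(This is the signature of the pandas release in use when this was written; the join_axes parameter was removed in pandas 1.0.)

For plain concatenation, the three parameters that matter are objs, ignore_index, and axis.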

import pandas as pd
from pandas import DataFrame,Series
df1
	0	1	2
0	4	9	3
1	2	3	8
2	4	8	3
df2
	0	1	2
0	2	5	8
1	7	6	2
2	6	6	7
pd.concat([df1,df2])
	0	1	2
0	4	9	3
1	2	3	8
2	4	8	3
0	2	5	8
1	7	6	2
2	6	6	7
# ignore_index discards the original row labels and renumbers the rows
pd.concat([df1,df2],ignore_index=True)
	0	1	2
0	4	9	3
1	2	3	8
2	4	8	3
3	2	5	8
4	7	6	2
5	6	6	7
pd.concat([df1,df2],axis=1)  # axis controls the direction of the concatenation
	0	1	2	0	1	2
0	4	9	3	2	5	8
1	2	3	8	7	6	2
2	4	8	3	6	6	7
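
The listing above does not show how df1 and df2 were created. The sketch below rebuilds them from the printed values (an assumption about how they were constructed) so the three calls can be run directly; the result names are made up.

```python
import pandas as pd

# Rebuilt from the values printed above; default integer labels on both axes.
df1 = pd.DataFrame([[4, 9, 3], [2, 3, 8], [4, 8, 3]])
df2 = pd.DataFrame([[2, 5, 8], [7, 6, 2], [6, 6, 7]])

stacked = pd.concat([df1, df2])                        # row labels repeat: 0,1,2,0,1,2
renumbered = pd.concat([df1, df2], ignore_index=True)  # row labels become 0..5
side_by_side = pd.concat([df1, df2], axis=1)           # 3 rows, 6 columns

print(list(stacked.index))
print(list(renumbered.index))
print(side_by_side.shape)
```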

For concatenating frames whose labels do not match, the three parameters that matter are join, keys, and join_axes.

df3
	a	b	c
0	4	9	3
1	2	3	8
2	4	8	3
df4
	b	c	d
0	2	5	8
1	7	6	2
2	6	6	7
pd.concat([df3,df4])

	a	b	c	d
0	4.0	9	3	NaN
1	2.0	3	8	NaN
2	4.0	8	3	NaN
0	NaN	2	5	8.0
1	NaN	7	6	2.0
2	NaN	6	6	7.0

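Passing sort=True or sort=False explicitly avoids the warning
pd.concat([df3,df4],sort=True,join='outer')  # outer join (union): keeps every column that appears in either table
pd.concat([df3,df4],sort=True,join='inner')  # inner join (intersection): keeps only the columns both tables share
pd.concat([df3,df4],sort=True,join='left')  # raises an error: concat has no left or right join
pd.concat([df3,df4],sort=True,join_axes=[df3.columns])
pd.concat([df3,df4],sort=True,join_axes=[df4.columns])  # specifies which columns to keep (join_axes was removed in pandas 1.0; calling .reindex(columns=...) on the result is the usual replacement)
pd.concat([df3,df4],sort=True,keys=['A','B'])  # adds an outer index level to tell apart rows that share a label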
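
The join and keys options can be tried on df3 and df4 rebuilt from the values printed above (again an assumption about how they were created). Since join_axes no longer exists in current pandas, this sketch uses reindex on the concatenated result as a stand-in; that substitution is not from the original post.

```python
import pandas as pd

df3 = pd.DataFrame([[4, 9, 3], [2, 3, 8], [4, 8, 3]], columns=list("abc"))
df4 = pd.DataFrame([[2, 5, 8], [7, 6, 2], [6, 6, 7]], columns=list("bcd"))

outer = pd.concat([df3, df4], join="outer", sort=True)  # union of columns: a, b, c, d
inner = pd.concat([df3, df4], join="inner")             # shared columns only: b, c

# Stand-in for join_axes=[df3.columns]: keep only df3's columns afterwards.
like_df3 = pd.concat([df3, df4], sort=True).reindex(columns=df3.columns)

# keys adds an outer index level, so rows from each frame stay distinguishable.
tagged = pd.concat([df3, df4], keys=["A", "B"], sort=True)

print(list(outer.columns), list(inner.columns))
print(tagged.loc["B"])
```

In like_df3 the rows that came from df4 have NaN in column 'a', because df4 has no such column.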

5. Merging with pd.merge()

The difference between merge and concat is that merge combines the data by matching on a shared row or column.

 pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

One-to-one merge

 t1 = pd.read_excel('../資料分析/day3/03_dataframe/03_dataframe/backup/data/demo.xls',sheet_name=0)
 t2 = pd.read_excel('../資料分析/day3/03_dataframe/03_dataframe/backup/data/demo.xls',sheet_name=1)
 t3 = pd.read_excel('../資料分析/day3/03_dataframe/03_dataframe/backup/data/demo.xls',sheet_name=2)
 t4 = pd.read_excel('../資料分析/day3/03_dataframe/03_dataframe/backup/data/demo.xls',sheet_name=3)
 
 display(t1,t2)  # columns: 手機型號 = phone model, 參考價格 = reference price, 重量 = weight
 	手機型號			參考價格
 0	windowsPhone	2500
 1	iPhone			7500
 2	Android			4000
 	手機型號			重量
 0	windowsPhone	0.50
 1	iPhone			0.40
 2	Android			0.45
 3	other			0.60
 pd.merge(t1,t2)
 	手機型號			參考價格	重量
 0	windowsPhone	2500	0.50
 1	iPhone			7500	0.40
 2	Android			4000	0.45

Many-to-one merge

 display(t2,t3)  # columns: 經銷商 = dealer, 發貨地區 = shipping region
 	手機型號			重量
 0	windowsPhone	0.50
 1	iPhone			0.40
 2	Android			0.45
 3	other			0.60
 	經銷商	發貨地區		手機型號
 0	pegge	beijing		iPhone
 1	lucy	beijing		Android
 2	tom		guangzhou	iPhone
 3	petter	shenzhen	windowsPhone
 4	mery	guangzhou	Android
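 # pd.merge(t2,t3,how='inner')  # inner join: intersection, only keys present in both tables are kept
 # pd.merge(t2,t3,how='outer')  # outer join: union, keys from either table are kept
 # pd.merge(t2,t3,how='left')  # left join: keeps the keys of the left table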
 pd.merge(t2,t3,how='right')
 	手機型號			重量		經銷商	發貨地區
 0	windowsPhone	0.50	petter	shenzhen
 1	iPhone			0.40	pegge	beijing
 2	iPhone			0.40	tom	guangzhou
 3	Android			0.45	lucy	beijing
 4	Android			0.45	mery	guangzhou
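
The Excel file used above is not available here, so the effect of how can be sketched with two small in-memory frames instead. The frame names (weights, dealers) and the English column names are hypothetical stand-ins for t2 and t3.

```python
import pandas as pd

weights = pd.DataFrame({
    "model": ["windowsPhone", "iPhone", "Android", "other"],
    "weight": [0.50, 0.40, 0.45, 0.60],
})
dealers = pd.DataFrame({
    "dealer": ["pegge", "lucy", "tom", "petter", "mery"],
    "model": ["iPhone", "Android", "iPhone", "windowsPhone", "Android"],
})

# The only shared column is "model", so it is used as the merge key by default.
inner = pd.merge(weights, dealers, how="inner")  # "other" has no dealer and is dropped
outer = pd.merge(weights, dealers, how="outer")  # "other" is kept, dealer is NaN
left = pd.merge(weights, dealers, how="left")    # every key of weights is kept
right = pd.merge(weights, dealers, how="right")  # every key of dealers is kept

print(len(inner), len(outer), len(left), len(right))
```

Each model on the left matches one or more dealers on the right, which is why the one weight row for iPhone appears twice in the result.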

Many-to-many merge

 display(t3,t4)  # 價格 = price
 	經銷商	發貨地區		手機型號
 0	pegge	beijing		iPhone
 1	lucy	beijing		Android
 2	tom		guangzhou	iPhone
 3	petter	shenzhen	windowsPhone
 4	mery	guangzhou	Android
 	發貨地區		手機型號			價格
 0	beijing		iPhone			7000
 1	beijing		windowsPhone	2300
 2	beijing		Android			3600
 3	guangzhou	iPhone			7600
 4	guangzhou	windowsPhone	2800
 5	guangzhou	Android			4200
 6	shenzhen	iPhone			7400
 7	shenzhen	windowsPhone	2750
 8	shenzhen	Android			3900
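 # the on parameter names the columns to match on; by default every column name shared by both frames is used
 # suffixes are appended to same-named columns that are not merge keys (here 上半年 = first half of the year, 下半年 = second half)
 # pd.merge(t3,t4,on='手機型號')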
 pd.merge(t3,t4,on='手機型號',suffixes=('_上半年','_下半年'))
 
 	經銷商	發貨地區_上半年	手機型號			發貨地區_下半年	價格
 0	pegge	beijing			iPhone			beijing			7000
 1	pegge	beijing			iPhone			guangzhou		7600
 2	pegge	beijing			iPhone			shenzhen		7400
 3	tom		guangzhou		iPhone			beijing			7000
 4	tom		guangzhou		iPhone			guangzhou		7600
 5	tom		guangzhou		iPhone			shenzhen		7400
 6	lucy	beijing			Android			beijing			3600
 7	lucy	beijing			Android			guangzhou		4200
 8	lucy	beijing			Android			shenzhen		3900
 9	mery	guangzhou		Android			beijing			3600
 10	mery	guangzhou		Android			guangzhou		4200
 11	mery	guangzhou		Android			shenzhen		3900
 12	petter	shenzhen		windowsPhone	beijing			2300
 13	petter	shenzhen		windowsPhone	guangzhou		2800
 14	petter	shenzhen		windowsPhone	shenzhen		2750
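
A reduced version of the same many-to-many merge, with invented in-memory data, shows how on and suffixes interact. The frame names, column names, and suffix strings below are hypothetical.

```python
import pandas as pd

dealers = pd.DataFrame({
    "dealer": ["pegge", "tom"],
    "region": ["beijing", "guangzhou"],
    "model": ["iPhone", "iPhone"],
})
prices = pd.DataFrame({
    "region": ["beijing", "guangzhou", "shenzhen"],
    "model": ["iPhone", "iPhone", "iPhone"],
    "price": [7000, 7600, 7400],
})

# Match on "model" only: 2 dealer rows x 3 price rows = 6 result rows.
# "region" exists in both frames but is not a key, so it gets the suffixes.
by_model = pd.merge(dealers, prices, on="model", suffixes=("_dealer", "_price"))

# No on argument: every shared column ("region" and "model") is used as a key.
by_both = pd.merge(dealers, prices)

print(list(by_model.columns))
print(by_both)
```

Matching on both shared columns pairs each dealer only with the price for its own region, which is often what is wanted in place of the full cross product.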

6. Series and DataFrame indexing and slicing

1) Series operations

# labels: 期中 = midterm, 期末 = final; 語 / 數 / 外 = Chinese / math / foreign language
s1 = Series([100,90,80,70,60,50],index=pd.MultiIndex.from_product([['期中','期末'],['語','數','外']]))
s1	
期中  語    100
	 數     90
	 外     80
期末  語     70
     數     60
     外     50
dtype: int64		
s1.loc['期末','語']
70
s1.loc[:]
#s1.loc['期中':'期末']
#s1.loc['語':'數']
# s1.loc[:,'語':'數']  # note: the inner level cannot be sliced directly here
s1.loc['期中'].loc['語':'數']		
語    100
數     90
dtype: int64		
# the index has more levels, but positions are still numbered 0, 1, 2, ... in order
s1.iloc[0:5]		
期中  語    100
	 數     90
	 外     80
期末  語     70
     數     60
dtype: int64
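
The same selections can be sketched with English labels. The label names and the variable names below are invented for illustration; xs is shown as one way to select on the inner level without slicing it.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["midterm", "final"], ["lang", "math", "eng"]])
scores = pd.Series([100, 90, 80, 70, 60, 50], index=idx)

# A full key (outer label, inner label) returns a single value.
one = scores.loc[("final", "lang")]

# Cross-section: every row whose inner label is "lang".
lang_all = scores.xs("lang", level=1)

# Select the outer label first, then slice the remaining single-level index.
midterm_part = scores.loc["midterm"].loc["lang":"math"]

print(one)
print(lang_all)
print(midterm_part)
```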

2) DataFrame operations

indexes = pd.MultiIndex.from_product([['期中','期末'],['語','數','外']])
# 一班 = class 1, 二班 = class 2
columns = pd.MultiIndex.from_product([['一班','二班'],['01','02','03']])
data = np.random.randint(0,150,size=(6,6))
df1 = DataFrame(data,index=indexes,columns=columns)
df1		
			一班			二班
		01	02	03	01	02	03
期中	語	57	93	125	13	22	22
	數	34	22	0	53	142	25
	外	66	73	70	16	46	54
期末	語	17	97	100	128	123	146
	數	48	78	121	103	69	52
	外	146	37	46	109	47	30
df1['一班','01']
期中  語     57
	 數     34
 	 外     66
期末  語     17
     數     48
     外    146
Name: (一班, 01), dtype: int32
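# df1.loc['期中','一班','語','01']  # multi-level row labels work, and a row label with a column label works, but mixing them in one flat list like this does not
# group the row key and the column key as tuples instead: df1.loc[('期中','語'),('一班','01')]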
df1.values[0,0]
df1.iloc[0,0]
57
df1.iloc[:,1:5]
			一班		二班
		02	03	01	02
期中	語	93	125	13	22
	數	22	0	53	142
	外	73	70	16	46
期末	語	97	100	128	123
	數	78	121	103	69
	外	37	46	109	47
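
For a frame with a MultiIndex on both axes, the row key and the column key are each written as a tuple. The sketch below uses invented English labels and sequential numbers rather than random data, and picks labels that are already in sorted order so that pd.IndexSlice can be used on the rows.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([["final", "midterm"], ["lang", "math"]])
cols = pd.MultiIndex.from_product([["class1", "class2"], ["01", "02"]])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# One cell: (row key tuple, column key tuple).
cell = df.loc[("midterm", "math"), ("class2", "01")]

# One column, selected by its full column key.
column = df[("class1", "02")]

# All rows whose inner row label is "lang", every column.
lang_rows = df.loc[pd.IndexSlice[:, "lang"], :]

print(cell)
print(column.tolist())
print(lang_rows)
```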
