
3.5.2 Indexing

1. Importing third-party libraries

import numpy as np
import pandas as pd
df = pd.read_csv('table.csv', index_col='ID')  # index_col picks the column used as the row index
df.head(2)

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1     173      63  34.0       A+
1102     S_1    C_1       F  street_2     192      73  32.5       B+
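Since table.csv itself is not reproduced here, the effect of index_col can be tried on a small in-memory stand-in. The sketch below is only an illustration: csv_text is a hypothetical three-row excerpt assembled from rows shown in this section, and io.StringIO takes the place of a real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for table.csv: three rows copied from this section.
csv_text = """ID,School,Class,Gender,Address,Height,Weight,Math,Physics
1101,S_1,C_1,M,street_1,173,63,34.0,A+
1102,S_1,C_1,F,street_2,192,73,32.5,B+
1103,S_1,C_1,M,street_2,186,82,87.2,B+
"""

# index_col='ID' moves the ID column out of the data and into the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col='ID')

print(df.index.name)        # ID
print('ID' in df.columns)   # False
print(df.head(2))
```

Without index_col, pandas would keep ID as an ordinary column and number the rows 0, 1, 2, so the label lookups used later (for example df.loc[1103]) would no longer refer to student IDs.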

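2. Indexing

1) loc: label-based indexing; slices are closed on both ends (start and end labels are both included)

a) Single-row indexing

df.loc[1103]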
School          S_1
Class           C_1
Gender            M
Address    street_2
Height          186
Weight           82
Math           87.2
Physics          B+
Name: 1103, dtype: object

b) Multi-row indexing

df.loc[[1101,1105,1204,1301]]

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1     173      63  34.0       A+
1105     S_1    C_1       F  street_4     159      64  84.8       B+
1204     S_1    C_2       F  street_5     162      63  33.8        B
1301     S_1    C_3       M  street_4     161      68  31.5       B+

df.loc[1103:1203]

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1103     S_1    C_1       M  street_2     186      82  87.2       B+
1104     S_1    C_1       F  street_2     167      81  80.4       B-
1105     S_1    C_1       F  street_4     159      64  84.8       B+
1201     S_1    C_2       M  street_5     188      68  97.0       A-
1202     S_1    C_2       F  street_4     176      94  63.5       B-
1203     S_1    C_2       M  street_6     160      53  58.8       A+
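The three row-selection forms above (a single label, a list of labels, a label slice) can be compared side by side. This is a minimal sketch on a hypothetical miniature of the table: it keeps only two columns and six of the IDs shown above.

```python
import pandas as pd

# Hypothetical miniature of the table: six IDs and two columns from the rows above.
df = pd.DataFrame(
    {"Height": [173, 192, 186, 167, 159, 188],
     "Math": [34.0, 32.5, 87.2, 80.4, 84.8, 97.0]},
    index=pd.Index([1101, 1102, 1103, 1104, 1105, 1201], name="ID"),
)

row = df.loc[1103]            # single label -> Series named 1103
rows = df.loc[[1101, 1105]]   # list of labels -> DataFrame, in the order listed
span = df.loc[1103:1201]      # label slice -> both endpoints are included

print(type(row).__name__)     # Series
print(rows.index.tolist())    # [1101, 1105]
print(span.index.tolist())    # [1103, 1104, 1105, 1201]
```

Note that the end label 1201 appears in the result, which is what "closed on both ends" means for loc.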

c) Single-column indexing

df.loc[:,'Weight'].head(3)
ID
1101    63
1102    73
1103    82
Name: Weight, dtype: int64

d) Multi-column indexing

df.loc[:,['Address','Height','Math']].head()

       Address  Height  Math
ID
1101  street_1     173  34.0
1102  street_2     192  32.5
1103  street_2     186  87.2
1104  street_2     167  80.4
1105  street_4     159  84.8

e) Combined row and column indexing

df.loc[1102:2301,['Address','Height','Math']].head()

       Address  Height  Math
ID
1102  street_2     192  32.5
1103  street_2     186  87.2
1104  street_2     167  80.4
1105  street_4     159  84.8
1201  street_5     188  97.0
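The column forms of loc follow the same rules as the row forms. The sketch below, again on a hypothetical three-row excerpt of the table, shows the difference between passing a single column label and a one-element list, and that a column label slice also includes its end label.

```python
import pandas as pd

# Hypothetical three-row excerpt with four of the table's columns.
df = pd.DataFrame(
    {"Address": ["street_1", "street_2", "street_2"],
     "Height": [173, 192, 186],
     "Weight": [63, 73, 82],
     "Math": [34.0, 32.5, 87.2]},
    index=pd.Index([1101, 1102, 1103], name="ID"),
)

one_col = df.loc[:, "Weight"]              # single label -> Series
as_frame = df.loc[:, ["Weight"]]           # one-element list -> one-column DataFrame
col_span = df.loc[:, "Address":"Weight"]   # label slice -> Weight is included
block = df.loc[1102:1103, ["Address", "Math"]]  # rows and columns together

print(type(one_col).__name__)     # Series
print(as_frame.shape)             # (3, 1)
print(list(col_span.columns))     # ['Address', 'Height', 'Weight']
print(block.shape)                # (2, 2)
```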

2) iloc: position-based indexing; slices are half-open (start position included, end position excluded)

a) Single-row indexing

df.head(9)

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1     173      63  34.0       A+
1102     S_1    C_1       F  street_2     192      73  32.5       B+
1103     S_1    C_1       M  street_2     186      82  87.2       B+
1104     S_1    C_1       F  street_2     167      81  80.4       B-
1105     S_1    C_1       F  street_4     159      64  84.8       B+
1201     S_1    C_2       M  street_5     188      68  97.0       A-
1202     S_1    C_2       F  street_4     176      94  63.5       B-
1203     S_1    C_2       M  street_6     160      53  58.8       A+
1204     S_1    C_2       F  street_5     162      63  33.8        B

df.iloc[2]

School          S_1
Class           C_1
Gender            M
Address    street_2
Height          186
Weight           82
Math           87.2
Physics          B+
Name: 1103, dtype: object

b) Multi-row indexing

df.iloc[2:6]

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1103     S_1    C_1       M  street_2     186      82  87.2       B+
1104     S_1    C_1       F  street_2     167      81  80.4       B-
1105     S_1    C_1       F  street_4     159      64  84.8       B+
1201     S_1    C_2       M  street_5     188      68  97.0       A-

c) Single-column indexing

df.iloc[:,4].head(3)

ID
1101    173
1102    192
1103    186
Name: Height, dtype: int64

d) Multi-column indexing

df.iloc[:,7::-2].head(3)

      Physics  Weight   Address  Class
ID
1101       A+      63  street_1    C_1
1102       B+      73  street_2    C_1
1103       B+      82  street_2    C_1

e) Combined row and column indexing

df.iloc[2:6,7::-2].head(3)

      Physics  Weight   Address  Class
ID
1103       B+      82  street_2    C_1
1104       B-      81  street_2    C_1
1105       B+      64  street_4    C_1
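The slice 7::-2 is easier to follow when the positions are written out. The sketch below rebuilds the first six rows shown by df.head(9) as a hypothetical in-memory frame, then checks that iloc slices behave like ordinary Python list slices: the end position is excluded, and a negative step walks backwards.

```python
import pandas as pd

columns = ["School", "Class", "Gender", "Address",
           "Height", "Weight", "Math", "Physics"]

# The same slice applied to plain positions 0..7:
# start at 7, step back by 2 until the beginning.
picked = list(range(len(columns)))[7::-2]
print(picked)                         # [7, 5, 3, 1]
print([columns[i] for i in picked])   # ['Physics', 'Weight', 'Address', 'Class']

# Hypothetical excerpt: the first six rows of the table shown above.
df = pd.DataFrame(
    [["S_1", "C_1", "M", "street_1", 173, 63, 34.0, "A+"],
     ["S_1", "C_1", "F", "street_2", 192, 73, 32.5, "B+"],
     ["S_1", "C_1", "M", "street_2", 186, 82, 87.2, "B+"],
     ["S_1", "C_1", "F", "street_2", 167, 81, 80.4, "B-"],
     ["S_1", "C_1", "F", "street_4", 159, 64, 84.8, "B+"],
     ["S_1", "C_2", "M", "street_5", 188, 68, 97.0, "A-"]],
    columns=columns,
    index=pd.Index([1101, 1102, 1103, 1104, 1105, 1201], name="ID"),
)

print(df.iloc[2].name)                   # 1103: position 2 is the third row
print(df.iloc[2:6].index.tolist())       # [1103, 1104, 1105, 1201]: position 6 is excluded
print(list(df.iloc[:, 7::-2].columns))   # ['Physics', 'Weight', 'Address', 'Class']
```

Compare df.iloc[2:6], which returns four rows, with df.loc[1103:1203] earlier, which returns every row up to and including the end label.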

3. Common indexing functions

a) where function: fills the cells where the condition is False (with NaN by default)

df.head()

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1     173      63  34.0       A+
1102     S_1    C_1       F  street_2     192      73  32.5       B+
1103     S_1    C_1       M  street_2     186      82  87.2       B+
1104     S_1    C_1       F  street_2     167      81  80.4       B-
1105     S_1    C_1       F  street_4     159      64  84.8       B+

df['Gender'].unique()
array(['M', 'F'], dtype=object)

df.where(df['Gender']=='M').head()

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1   173.0    63.0  34.0       A+
1102     NaN    NaN     NaN       NaN     NaN     NaN   NaN      NaN
1103     S_1    C_1       M  street_2   186.0    82.0  87.2       B+
1104     NaN    NaN     NaN       NaN     NaN     NaN   NaN      NaN
1105     NaN    NaN     NaN       NaN     NaN     NaN   NaN      NaN

aa = df.where(df['Gender']=='M').dropna().head()
# where blanks out the rows that fail the condition; dropna() then removes them, leaving only the filtered rows
# mask is the opposite of where: it fills the cells where the condition is True
aa

      School  Class  Gender   Address  Height  Weight  Math  Physics
ID
1101     S_1    C_1       M  street_1   173.0    63.0  34.0       A+
1103     S_1    C_1       M  street_2   186.0    82.0  87.2       B+
1201     S_1    C_2       M  street_5   188.0    68.0  97.0       A-
1203     S_1    C_2       M  street_6   160.0    53.0  58.8       A+
1301     S_1    C_3       M  street_4   161.0    68.0  31.5       B+
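The comment above mentions mask without showing it, so here is a minimal sketch contrasting where, mask, and plain boolean indexing. It uses a hypothetical four-row excerpt of the table and applies where and mask to a single column to keep the output small. Boolean indexing is not used in the original text; it is included only as a point of comparison.

```python
import pandas as pd

# Hypothetical four-row excerpt with three of the table's columns.
df = pd.DataFrame(
    {"Gender": ["M", "F", "M", "F"],
     "Height": [173, 192, 186, 167],
     "Math": [34.0, 32.5, 87.2, 80.4]},
    index=pd.Index([1101, 1102, 1103, 1104], name="ID"),
)

is_male = df["Gender"] == "M"

# where keeps cells where the condition is True and fills the rest (NaN by default).
kept = df["Height"].where(is_male)
# mask is the inverse: it fills the cells where the condition is True.
hidden = df["Height"].mask(is_male)
# Both accept a replacement value as the second argument.
filled = df["Height"].where(is_male, -1)

# Boolean indexing selects the matching rows directly.
males = df[is_male]

print(kept.isna().tolist())     # [False, True, False, True]
print(hidden.isna().tolist())   # [True, False, True, False]
print(filled.tolist())          # [173, -1, 186, -1]
print(males.index.tolist())     # [1101, 1103]
```

Two side effects of the where(...).dropna() pattern are worth keeping in mind. Filling with NaN turns integer columns into floats, which is why Height and Weight print as 173.0 and 63.0 in the output above. And dropna() removes any row containing a NaN, including rows that had missing values before the filter was applied. Boolean indexing avoids both.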