[Python Data Processing] pandas Row and Column Operations and Aggregation
阿新 • Published: 2018-12-14
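1. Column operations: apply
The basic pattern is df.column.function(), for example df['count'].mean(). (A column named count needs bracket access, because df.count is already a DataFrame method.) To run your own function on every value in a column, use df.column.apply(function).
Example:
Convert every value in the Name column to uppercase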
# string.upper only exists in Python 2; the built-in str.upper does the same job in Python 3
df['Name'] = df.Name.apply(str.upper)
Operating on a column with a lambda
Example: create a column holding each email address's provider
df['Email Provider'] = df.Email.apply(
lambda x: x.split('@')[-1]
)
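The two column operations above can be tried without any CSV file. The following is a minimal sketch; the sample names and email addresses are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical sample data; the original post does not show its DataFrame.
df = pd.DataFrame({
    'Name': ['Ann Lee', 'bob ray'],
    'Email': ['ann@gmail.com', 'bob@yahoo.com'],
})

# apply calls the given function once for every value in the column.
df['Name'] = df.Name.apply(str.upper)
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])

print(df)
```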
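2. Row operations: lambda
When a lambda's conditional spans several lines, end the line before the if with \ and end the if line itself with \. Remember to pass axis=1.
With a lambda, calling apply on the DataFrame itself rather than on a named column operates on rows.
For example, df.Email.apply is a column operation, while df.apply is a row operation.
Row operations need axis=1.
A simple rule of thumb: a column operation only touches its own column, whereas a row operation usually needs data from several columns.
Example 1: hours up to 40 and hours beyond 40 are paid at different rates; compute each person's total pay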
# import codecademylib  # Codecademy-only module; not needed elsewhere
import pandas as pd

df = pd.read_csv('employees.csv')

total_earned = lambda row: (row.hourly_wage * 40) + ((row.hourly_wage * 1.5) * (row.hours_worked - 40)) \
    if row.hours_worked > 40 \
    else row.hourly_wage * row.hours_worked

df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
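A named function can be easier to read than a lambda with backslash continuations. Here is a rough equivalent of Example 1, assuming two made-up employees in place of employees.csv (the sample values are hypothetical).

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv.
df = pd.DataFrame({
    'hourly_wage': [10.0, 20.0],
    'hours_worked': [30, 45],
})

def total_earned(row):
    # Hours beyond 40 are paid at 1.5 times the hourly wage.
    if row.hours_worked > 40:
        overtime = row.hours_worked - 40
        return row.hourly_wage * 40 + row.hourly_wage * 1.5 * overtime
    return row.hourly_wage * row.hours_worked

# axis=1 passes each row to the function as a Series.
df['total_earned'] = df.apply(total_earned, axis=1)
print(df)
```

The first employee works under 40 hours and earns 10 * 30 = 300; the second earns 20 * 40 + 20 * 1.5 * 5 = 950.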
Example 2: a column operation and a row operation, one after the other
# import codecademylib  # Codecademy-only module; not needed elsewhere
import pandas as pd

orders = pd.read_csv('shoefly.csv')
print(orders.head(5))

# Column operation
source = lambda x: 'animal' \
    if (x == 'leather') \
    else 'vegan'
orders['shoe_source'] = orders.shoe_material.apply(source)
print(orders.head(5))

# Row operation
get_lastname = lambda row: 'Dear Mr. ' + row.last_name \
    if row.gender == 'male' \
    else 'Dear Ms. ' + row.last_name
orders['salutation'] = orders.apply(get_lastname, axis=1)
print(orders.head(5))
Example 3
# import codecademylib  # Codecademy-only module; not needed elsewhere
import pandas as pd
inventory=pd.read_csv('inventory.csv')
print(inventory.head(10))
staten_island=inventory[0:10]
product_request=staten_island.product_description
print(inventory.info())
seed_request=inventory[(inventory.product_type=='seeds')&(inventory.location=='Brooklyn')]
print(seed_request)
inventory['in_stock'] = inventory.quantity.apply(lambda x: x != 0)
#print(inventory.head(10))
inventory['total_value']=inventory.apply(lambda row:row.quantity*row.price,axis=1)
#print(inventory.head(10))
combine_lambda = lambda row: \
    '{} - {}'.format(row.product_type,
                     row.product_description)
inventory['full_description']=inventory.apply(combine_lambda,axis=1)
print(inventory.head(10))
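For simple arithmetic and string concatenation like this, pandas can also work on whole columns at once without apply. This is an alternative technique, not what the original example uses; the sketch below assumes two made-up inventory rows in place of inventory.csv.

```python
import pandas as pd

# Hypothetical rows standing in for inventory.csv.
inventory = pd.DataFrame({
    'product_type': ['seeds', 'tools'],
    'product_description': ['daisy', 'rake'],
    'quantity': [4, 0],
    'price': [6.99, 13.99],
})

# Column-wise (vectorized) versions of the three apply calls above.
inventory['in_stock'] = inventory.quantity != 0
inventory['total_value'] = inventory.quantity * inventory.price
inventory['full_description'] = (
    inventory.product_type + ' - ' + inventory.product_description
)
print(inventory)
```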
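3. Aggregates in Pandas
1. apply already lets us operate on each value. This section is about reducing all the values in a column to a single result; the usual form is df.column.command()
Example: cuisine_options_count = restaurants['cuisine'].nunique() counts how many distinct cuisines there are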
Command | Description
---|---
mean | Average of all values in column
std | Standard deviation
median | Median
max | Maximum value in column
min | Minimum value in column
count | Number of values in column
nunique | Number of unique values in column
unique | List of unique values in column
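A few of these commands can be tried on a small table. The restaurants frame and its cuisine column follow the example above; the rating column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'cuisine' column name comes from the text.
restaurants = pd.DataFrame({
    'cuisine': ['Thai', 'Italian', 'Thai', 'Mexican'],
    'rating': [4.0, 3.0, 5.0, 4.0],
})

average_rating = restaurants.rating.mean()
median_rating = restaurants.rating.median()
cuisine_count = restaurants.cuisine.count()              # non-null values
cuisine_options_count = restaurants.cuisine.nunique()    # distinct values
cuisine_options = restaurants.cuisine.unique()           # array of distinct values

print(average_rating, median_rating, cuisine_count, cuisine_options_count)
print(cuisine_options)
```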
2. df.groupby('column1').column2.measurement().reset_index()
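column1 is the column whose equal values are grouped together, column2 is the column the function is applied to, and measurement() is the aggregate you want to apply. Note: without reset_index(), the result is a Series.
Example 1:
Get the highest price for each shoe type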
orders = pd.read_csv('orders.csv')
pricey_shoes=orders.groupby('shoe_type').price.max()
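The previous approach returns a Series whose index holds the group labels instead of the default integer index. To turn it into a DataFrame, call reset_index(), which is normally done right after groupby()
Example 2: the result is now a DataFrame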
pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(pricey_shoes)
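The difference between the two results is easiest to see side by side. A minimal sketch, assuming three made-up orders in place of orders.csv:

```python
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_type': ['sandals', 'sandals', 'clogs'],
    'price': [20, 35, 50],
})

# Without reset_index: a Series indexed by shoe_type.
as_series = orders.groupby('shoe_type').price.max()
# With reset_index: a DataFrame with shoe_type as a regular column.
as_frame = as_series.reset_index()

print(type(as_series).__name__, list(as_series.index))
print(type(as_frame).__name__, list(as_frame.columns))
```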
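When a simple built-in function is not enough, bring back apply with a lambda function
Example 3: return the 25th-percentile price for each shoe color
# import codecademylib  # Codecademy-only module; not needed elsewhere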
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
print(orders)
cheap_shoes=orders.groupby('shoe_color').price.apply(lambda x:np.percentile(x,25))
print(cheap_shoes)
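To check what the percentile call returns, here is the same pattern on a tiny made-up table (the colors and prices are hypothetical), with reset_index() added so the result is a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'shoe_color': ['black', 'black', 'black', 'red', 'red'],
    'price': [10, 20, 30, 40, 60],
})

# The lambda receives all prices of one color at a time.
cheap_shoes = (
    orders.groupby('shoe_color').price
    .apply(lambda x: np.percentile(x, 25))
    .reset_index()
)
print(cheap_shoes)
```

With the default linear interpolation, black gives 15.0 (halfway between 10 and 20) and red gives 45.0. The built-in groupby quantile(0.25) should give the same values under its default settings.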
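Sometimes you want to group by more than one column
Example 4: count the orders for each combination of shoe type and shoe color
# import codecademylib  # Codecademy-only module; not needed elsewhere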
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
shoe_counts=orders.groupby(['shoe_type','shoe_color']).id.count().reset_index()
print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)
#shoe_counts.columns = ['shoe_type', 'shoe_color','count']
print(shoe_counts)
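The same steps can be chained into one expression, which avoids inplace=True. A small sketch with four made-up orders (the data is hypothetical; the column names follow the example above):

```python
import pandas as pd

# Hypothetical rows standing in for orders.csv.
orders = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'shoe_type': ['sandals', 'sandals', 'clogs', 'sandals'],
    'shoe_color': ['red', 'red', 'black', 'navy'],
})

# Group by both columns, count ids per group, then rename the count column.
shoe_counts = (
    orders.groupby(['shoe_type', 'shoe_color']).id.count()
    .reset_index()
    .rename(columns={'id': 'count'})
)
print(shoe_counts)
```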
3. Reshaping a table: pivot. As with groupby, call reset_index() afterwards
Example:
# import codecademylib  # Codecademy-only module; not needed elsewhere
import numpy as np
import pandas as pd
orders = pd.read_csv('orders.csv')
shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
print(shoe_counts)
shoe_counts.rename(columns={'id': 'count'}, inplace=True)
shoe_counts_pivot=shoe_counts.pivot(columns='shoe_color',index='shoe_type',values='count').reset_index()
print(shoe_counts_pivot)
index | shoe_type | shoe_color | count |
---|---|---|---|
0 | ballet flats | black | 2 |
1 | ballet flats | brown | 11 |
2 | ballet flats | navy | 17 |
3 | ballet flats | red | 13 |
4 | ballet flats | white | 7 |
5 | sandals | black | 3 |
6 | sandals | brown | 10 |
7 | sandals | navy | 13 |
8 | sandals | red | 14 |
9 | sandals | white | 10 |
10 | stilettos | black | 8 |
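To see what pivot does to a long table like the one above, here is a minimal sketch that rebuilds four of its rows by hand (ballet flats and sandals, black and brown) rather than reading orders.csv.

```python
import pandas as pd

# Four rows copied from the table above.
shoe_counts = pd.DataFrame({
    'shoe_type': ['ballet flats', 'ballet flats', 'sandals', 'sandals'],
    'shoe_color': ['black', 'brown', 'black', 'brown'],
    'count': [2, 11, 3, 10],
})

# One row per shoe_type, one column per shoe_color, counts as cell values.
shoe_counts_pivot = shoe_counts.pivot(
    index='shoe_type', columns='shoe_color', values='count'
).reset_index()
print(shoe_counts_pivot)
```

Each shoe type becomes a single row, and each color becomes its own column holding that combination's count.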