Pandas Digest: Applying Operations Over pandas Dataframes
阿新 • Published: 2018-12-16
Original article: https://chrisalbon.com/python/data_wrangling/pandas_apply_operations_to_dataframes/
Applying Operations Over pandas Dataframes
20 Dec 2017
Import Modules
import pandas as pd
import numpy as np
Create a dataframe
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3],
        'coverage': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df
| | coverage | name | reports | year |
|---|---|---|---|---|
| Cochice | 25 | Jason | 4 | 2012 |
| Pima | 94 | Molly | 24 | 2012 |
| Santa Cruz | 57 | Tina | 31 | 2013 |
| Maricopa | 62 | Jake | 2 | 2014 |
| Yuma | 70 | Amy | 3 | 2014 |
Create a capitalization lambda function
capitalizer = lambda x: x.upper()
Apply the capitalizer function over the column ‘name’
apply() applies a function along an axis of a dataframe; called on a single column (a series), as here, it applies the function to each value
df['name'].apply(capitalizer)
Cochice JASON
Pima MOLLY
Santa Cruz TINA
Maricopa JAKE
Yuma AMY
Name: name, dtype: object
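To illustrate the "along an axis" part, here is a minimal sketch using only the numeric columns of the example data. The names df_num, column_max, and row_sum are illustrative and do not appear in the original article.

```python
import pandas as pd

# Numeric columns from the example dataframe (df_num is an illustrative name)
df_num = pd.DataFrame(
    {'reports': [4, 24, 31, 2, 3], 'coverage': [25, 94, 57, 62, 70]},
    index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
)

# axis=0 (the default): the function receives each column as a Series
column_max = df_num.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
row_sum = df_num.apply(lambda row: row['reports'] + row['coverage'], axis=1)

print(column_max)
print(row_sum)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.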
Map the capitalizer lambda function over each element in the series ‘name’
map() applies an operation over each element of a series
df['name'].map(capitalizer)
Cochice JASON
Pima MOLLY
Santa Cruz TINA
Maricopa JAKE
Yuma AMY
Name: name, dtype: object
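For an element-wise function on a single series, map() and apply() should give the same values, and pandas also offers the vectorized .str accessor for string columns. The sketch below compares the three; the series names and the variables mapped, applied, and vectorized are illustrative, not from the original article.

```python
import pandas as pd

# Rebuild the 'name' column as a standalone Series (names is an illustrative name)
names = pd.Series(
    ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
    index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
    name='name',
)

capitalizer = lambda x: x.upper()

mapped = names.map(capitalizer)      # element-wise via map()
applied = names.apply(capitalizer)   # element-wise via apply()
vectorized = names.str.upper()       # built-in string method, no lambda needed

print(mapped.tolist())
```

For simple string operations the .str accessor is usually the more idiomatic choice, since it avoids writing a Python function at all.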
Apply a square root function to every single cell in the whole data frame
applymap() applies a function to every single element in the entire dataframe.
# Drop the string variable so that applymap() can run
df = df.drop('name', axis=1)
# Return the square root of every cell in the dataframe
df.applymap(np.sqrt)
| | coverage | reports | year |
|---|---|---|---|
| Cochice | 5.000000 | 2.000000 | 44.855323 |
| Pima | 9.695360 | 4.898979 | 44.855323 |
| Santa Cruz | 7.549834 | 5.567764 | 44.866469 |
| Maricopa | 7.874008 | 1.414214 | 44.877611 |
| Yuma | 8.366600 | 1.732051 | 44.877611 |
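Two variations on the step above may be worth knowing. First, select_dtypes() can pick out the numeric columns without dropping 'name' by hand, and NumPy functions such as np.sqrt accept a whole dataframe directly. Second, pandas 2.1 deprecated applymap() in favor of DataFrame.map(), so the sketch below chooses whichever method is available. The names df_full, numeric, roots, and elementwise are illustrative.

```python
import numpy as np
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3],
        'coverage': [25, 94, 57, 62, 70]}
df_full = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# Keep only the numeric columns instead of dropping 'name' explicitly
numeric = df_full.select_dtypes(include='number')

# NumPy ufuncs operate on the whole dataframe at once
roots = np.sqrt(numeric)

# Element-wise route: DataFrame.map on pandas >= 2.1, applymap on older versions
elementwise = numeric.map if hasattr(numeric, 'map') else numeric.applymap
roots_elementwise = elementwise(np.sqrt)

print(roots)
```

Both routes should produce the same values; the vectorized call is typically faster on large dataframes because it avoids calling a Python function once per cell.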
Applying A Function Over A Dataframe
Create a function that multiplies all non-strings by 100
# create a function called times100
def times100(x):
    # that, if x is a string,
    if type(x) is str:
        # just returns it untouched
        return x
    # but, if not, return it multiplied by 100
    elif x:
        return 100 * x
    # and leave everything else (0, None, empty values) unchanged
    else:
        return x
Apply times100 to every cell in the dataframe
df.applymap(times100)
| | coverage | reports | year |
|---|---|---|---|
| Cochice | 2500 | 400 | 201200 |
| Pima | 9400 | 2400 | 201200 |
| Santa Cruz | 5700 | 3100 | 201300 |
| Maricopa | 6200 | 200 | 201400 |
| Yuma | 7000 | 300 | 201400 |
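Because 'name' was dropped earlier, the output above never exercises the string branch of the function. The sketch below runs a simplified variant of times100 over a small dataframe that still contains a string column. The dataframe df_mixed and the variable names are illustrative, and the function here uses isinstance() rather than the type check in the article.

```python
import pandas as pd

def times100(x):
    # Return strings untouched; multiply everything else by 100
    if isinstance(x, str):
        return x
    return 100 * x

# A small mixed-type dataframe (illustrative subset of the example data)
df_mixed = pd.DataFrame(
    {'name': ['Jason', 'Molly'], 'reports': [4, 24]},
    index=['Cochice', 'Pima'],
)

# DataFrame.map on pandas >= 2.1, applymap on older versions
elementwise = df_mixed.map if hasattr(df_mixed, 'map') else df_mixed.applymap
result = elementwise(times100)

print(result)
```

The 'name' column passes through unchanged while 'reports' is scaled, which is the behavior the string check is meant to guarantee.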