Python: Handling Strings Correctly (Pitfalls I Ran Into Myself)
阿新 • Published: 2019-11-30
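Anyone who has worked with user-submitted survey data knows how messy it can be. Getting a uniformly formatted set of strings that is ready for analysis takes several steps: stripping whitespace, removing punctuation, and standardizing capitalization. One way to do this is with the built-in string methods and the re module for regular expressions: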
The usual approach
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 'south carolina##', 'West virginia?']

import re

def clean_strings(strings):
    # Typical processing steps applied to each value
    result = []
    for value in strings:
        value = value.strip()               # trim surrounding whitespace
        value = re.sub('[!#?]', '', value)  # remove the punctuation characters ! # ?
        value = value.title()               # capitalize the first letter of each word
        result.append(value)
    return result

In [173]: clean_strings(states)
Out[173]: ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South Carolina', 'West Virginia']
The recommended approach
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]  # functions are objects too

def clean_strings(strings, ops):
    result = []
    for value in strings:
        # Apply each operation in order to the current value
        for function in ops:
            value = function(value)
        result.append(value)
    return result
In [175]: clean_strings(states, clean_ops)
Out[175]:
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
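The benefit of passing the operations as a list is that the pipeline can change without editing clean_strings itself. A minimal sketch of that idea, assuming a hypothetical extra step named collapse_spaces (not part of the original post) that squeezes runs of whitespace into a single space:

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

def collapse_spaces(value):
    # Hypothetical extra step (not in the original post):
    # squeeze any run of whitespace into a single space.
    return re.sub(r'\s+', ' ', value)

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

# Adding a step only means editing the list of operations.
extended_ops = [str.strip, remove_punctuation, collapse_spaces, str.title]

messy = ['  south   carolina## ', 'West  virginia?']
cleaned = clean_strings(messy, extended_ops)
print(cleaned)
```

The order of the list matters: here the whitespace is collapsed after the punctuation is removed, so any gaps left behind by deleted characters are cleaned up as well.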
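# Alternatively, use map to apply a single function to every element
# (only the punctuation is removed; whitespace and letter case are left as they were)
In [176]: for x in map(remove_punctuation, states):
   .....:     print(x)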
Alabama
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia
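Because map applies only one function per pass, the output above still has its original capitalization. To run the whole pipeline through map, the steps can first be folded into a single function. A sketch using functools.reduce from the standard library; the helper name compose_ops is an assumption made for illustration, not something from the original post:

```python
import re
from functools import reduce

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def compose_ops(ops):
    # Hypothetical helper: build one function that applies ops left to right.
    def apply_all(value):
        return reduce(lambda acc, func: func(acc), ops, value)
    return apply_all

clean_one = compose_ops(clean_ops)
cleaned = list(map(clean_one, states))
print(cleaned)
```

This should give the same result as clean_strings(states, clean_ops); which form to use is mostly a matter of taste, since the explicit loop is arguably easier to read and debug.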