Using pandas to convert a list of dictionaries into separate columns
阿新 • Published: 2019-02-03
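I came across this in a Stack Overflow post. The table0.csv dataset is shown below; the goal is to split the message column so that each key gets its own column: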
name | status | number | message |
matt | active | 12345 | [job: , money: none, wife: none] |
james | active | 23456 | [group: band, wife: yes, money: 10000] |
adam | inactive | 34567 | [job: none, money: none, wife: , kids: one, group: jail] |
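To try the steps below without the CSV file, the sample table can be rebuilt in memory. This is a minimal sketch, assuming the message column is stored as plain strings exactly as shown in the table; the variable name df matches the code that follows.

```python
import pandas as pd

# Rebuild the sample table in memory (values copied from the table above).
# Assumption: each message is a plain string, not a real Python list.
df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': [
        '[job: , money: none, wife: none]',
        '[group: band, wife: yes, money: 10000]',
        '[job: none, money: none, wife: , kids: one, group: jail]',
    ],
})
print(df)
```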
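Method 1:
First, use replace with regular expressions (\s+ matches one or more whitespace characters) to rewrite each bracketed string as a dict literal, then parse it with ast.literal_eval.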
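import ast
import pandas as pd
# df is assumed to hold the table0.csv data shown above.
# Rewrite each "[k: v, ...]" string as a dict literal; an empty value becomes "none".
df.message = df.message.replace([r':\s+,', r'\[', r'\]', r':\s+', r',\s+'], ['":"none","', '{"', '"}', '":"', '","'], regex=True)
# Parse each dict literal into a real dict.
df.message = df.message.apply(ast.literal_eval)
# Expand the dicts into columns, one per key.
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)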
   kids  money group   job  money  wife
0   NaN   none   NaN  none    NaN  none
1   NaN    NaN  band   NaN  10000   yes
2   one    NaN  jail  none   none  none
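Here is the problem: there are two money columns. This is not caused by money sitting in a different position in row 1. The first pattern, :\s+, followed by a comma, does not consume the space after that comma, so a key that follows an empty value keeps a leading space: row 0 yields ' money' and row 2 yields ' kids', which are different keys from 'money' and 'kids'.
This has to be fixed by hand (for example, by extending the first pattern to :\s+,\s*); I won't go into it here.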
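The origin of the second money column can be checked on single strings with the standard library alone. The sketch below applies the same pattern and replacement pairs one after another with re.sub, which is assumed to mimic what the list-based replace does; the helper name to_dict is hypothetical.

```python
import ast
import re

# The same (pattern, replacement) pairs as in Method 1, applied in order.
PAIRS = [
    (r':\s+,', '":"none","'),
    (r'\[', '{"'),
    (r'\]', '"}'),
    (r':\s+', '":"'),
    (r',\s+', '","'),
]

def to_dict(text):
    """Rewrite a bracketed 'key: value' string as a dict literal and parse it."""
    for pattern, repl in PAIRS:
        text = re.sub(pattern, repl, text)
    return ast.literal_eval(text)

row0 = '[job: , money: none, wife: none]'
row2 = '[job: none, money: none, wife: , kids: one, group: jail]'

# repr() makes any leading space in a key visible.
print([repr(key) for key in to_dict(row0)])
print([repr(key) for key in to_dict(row2)])
```

Any key that follows an empty value shows up with a leading space, so pandas treats it as a column of its own.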
Carrying on as usual gives the following:
df=pd.concat([df,df1],axis=1)
print(df)
    name    status  number  kids  money group   job  money  wife
0   matt    active   12345   NaN   none   NaN  none    NaN  none
1  james    active   23456   NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567   one    NaN  jail  none   none  none
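If the duplicate money column comes from keys that kept a leading space (an assumption, though consistent with the output above, where one money column is filled only for row 0), Method 1 can be rerun with a first pattern that also consumes the whitespace after the comma. This is a minimal sketch, not the original answer's code: the sample data is rebuilt inline, the trailing \s* is the assumed fix, and a loop of Series.str.replace calls is swapped in for the single list-based replace so that the order of substitutions is explicit.

```python
import ast
import pandas as pd

# Sample data rebuilt inline so the sketch is self-contained.
df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': [
        '[job: , money: none, wife: none]',
        '[group: band, wife: yes, money: 10000]',
        '[job: none, money: none, wife: , kids: one, group: jail]',
    ],
})

# Same pairs as Method 1, except the first pattern ends in \s* (the assumed fix).
pairs = [
    (r':\s+,\s*', '":"none","'),
    (r'\[', '{"'),
    (r'\]', '"}'),
    (r':\s+', '":"'),
    (r',\s+', '","'),
]

cleaned = df['message']
for pattern, repl in pairs:
    cleaned = cleaned.str.replace(pattern, repl, regex=True)

# Parse each dict literal, expand the dicts into columns, and join them back.
parsed = cleaned.apply(ast.literal_eval)
df1 = pd.DataFrame(parsed.tolist(), index=df.index)
out = pd.concat([df.drop(columns='message'), df1], axis=1)
print(out)
```

Every value is still a string here (including '10000'), because the replacements wrap each value in quotes before parsing.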
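Method 2:
Use the yaml package. YAML typing applies to the parsed values: yes becomes True, an empty value becomes None, and 10000 becomes an integer.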
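import yaml
# Turn the brackets into braces so each string is a YAML flow mapping, then parse it.
# safe_load parses plain YAML and needs no Loader argument.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)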
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None
df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
Source: https://stackoverflow.com/questions/43032182/pandas-list-of-dictionary-to-separate-columns