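Converting between DataFrames and Python data structures
Sometimes we don't want to save a DataFrame to a file or load it into a database; instead we want to convert it into another format such as a dict, a list, or JSON. Likewise, a DataFrame doesn't have to be read from a file or a database; it can also be built from data that already exists in memory. Let's look at how both directions work.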
I. Converting a DataFrame to Python data formats
1. Converting to JSON
A DataFrame can be converted to JSON with the df.to_json() method.
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json())
    # {"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}
The result is valid JSON, but it is not quite what we want: the index has been included as well.
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    # To leave the index out, the obvious thing to try is index=False
    try:
        print(df.to_json(index=False))
    except ValueError as e:
        print(e)
    # 'index=False' is only valid when 'orient' is 'split' or 'table'
    # It fails: with index=False, orient must be set to 'split' or 'table'
    # (message from the pandas version used here; the exact wording can differ between releases)
So what is this orient parameter?
orient accepts the following values: split, records, index, columns, values, table.
Let's go through them one by one and see how the output changes with each value.
orient='split'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json(orient="split"))
    """
    {
        "columns":["name","age"],
        "index":[0,1,2,3],
        "data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
    }
    """
    print(df.to_json(orient="split", index=False))
    """
    {
        "columns":["name","age"],
        "data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
    }
    """
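The result has three key-value pairs: the column names, the index, and the data. With index=False the index entry is dropped.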
orient='records'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json(orient="records"))
    """
    [{"name":"mashiro","age":17},
     {"name":"satori","age":17},
     {"name":"koishi","age":16},
     {"name":"nagisa","age":21}]
    """
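This format is used quite often: each row is combined with the column names into a dict, and the dicts are stored in a list. The output has nothing to do with the index, so index=False is unnecessary here, and the pandas version used in this post rejects it.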
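Because the records layout carries the column names in every row, it can be read back without extra information. The following is a minimal sketch of that round trip, assuming pd.read_json is used for the reverse direction (the original text does not cover it); the JSON text is wrapped in StringIO so that it is treated as file-like input.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Serialize to a JSON array of row objects
text = df.to_json(orient="records")

# Read it back; orient tells read_json which layout to expect
restored = pd.read_json(StringIO(text), orient="records")
print(restored)
```

The index is not stored in this layout, so the restored frame simply gets a fresh default index.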
orient='index'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json(orient="index"))
    """
    {
        "0":{"name":"mashiro","age":17},
        "1":{"name":"satori","age":17},
        "2":{"name":"koishi","age":16},
        "3":{"name":"nagisa","age":21}
    }
    """
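This is similar to records, except that each row dict becomes a value in an outer dict whose keys are the corresponding index labels. Since the index is the key, index=False cannot be used here either.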
orient='columns'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json(orient="columns"))
    """
    {"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}
    """
This is identical to the output we got without specifying orient; for a DataFrame, orient defaults to columns.
orient='values'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    print(df.to_json(orient="values"))
    """
    [["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
    """
    # With orient="values", only the data is returned
    # This is similar to what to_numpy gives
    print(df.to_numpy())
    """
    [['mashiro' 17]
     ['satori' 17]
     ['koishi' 16]
     ['nagisa' 21]]
    """
orient='table'
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    # Returned in the form of a two-dimensional database table, with a schema
    print(df.to_json(orient="table"))
    """
    {
        "schema": {
            "fields": [{"name": "index", "type": "integer"},
                       {"name": "name", "type": "string"},
                       {"name": "age", "type": "integer"}],
            "primaryKey": ["index"],
            "pandas_version": "0.20.0"
        },
        "data": [{"index": 0, "name": "mashiro", "age": 17},
                 {"index": 1, "name": "satori", "age": 17},
                 {"index": 2, "name": "koishi", "age": 16},
                 {"index": 3, "name": "nagisa", "age": 21}]
    }
    """
    print(df.to_json(orient="table", index=False))
    """
    {
        "schema": {
            "fields": [{"name": "name", "type": "string"},
                       {"name": "age", "type": "integer"}],
            "pandas_version": "0.20.0"
        },
        "data": [{"name": "mashiro", "age": 17},
                 {"name": "satori", "age": 17},
                 {"name": "koishi", "age": 16},
                 {"name": "nagisa", "age": 21}]
    }
    """
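To compare the six layouts side by side, the sketch below parses each result with json.loads and records whether the top level is a JSON object (a Python dict) or a JSON array (a Python list). The shapes dictionary is only an illustrative name, and a shortened two-row frame is used to keep the output small.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori"], "age": [17, 16]})

# Parse each orient's output back into Python objects and note the top-level type
shapes = {}
for orient in ["split", "records", "index", "columns", "values", "table"]:
    parsed = json.loads(df.to_json(orient=orient))
    shapes[orient] = type(parsed).__name__
    print(f"{orient:8} -> {shapes[orient]}")
```

Only records and values produce a top-level array; the other four produce an object keyed by column names, index labels, or fixed field names.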
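2. Converting to dict
A DataFrame can also be converted to a dict. to_dict has an orient parameter too, and some of its values behave like their to_json counterparts. That is no coincidence: a JSON object maps closely onto a Python dict, although JSON itself derives from JavaScript's object literal syntax rather than from Python.
The orient parameter of to_dict accepts the following values: dict, list, series, split, records, index. The default is dict.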
orient='dict'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    pprint(df.to_dict(orient="dict"))
    """
    {'age': {0: 17, 1: 17, 2: 16, 3: 21},
     'name': {0: 'mashiro', 1: 'satori', 2: 'koishi', 3: 'nagisa'}}
    """
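One difference from to_json is worth a quick check: JSON object keys are always strings, while to_dict keeps the index labels as they are. The sketch below compares the two; as_dict and as_json are just illustrative variable names.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

as_dict = df.to_dict()               # default orient="dict"
as_json = json.loads(df.to_json())   # default orient="columns"

print(list(as_dict["age"]))  # [0, 1, 2, 3]          integer index labels
print(list(as_json["age"]))  # ['0', '1', '2', '3']  JSON keys are strings
```

This matters when looking rows up by label after a JSON round trip, since as_json["age"][0] would raise a KeyError.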
orient='list'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    pprint(df.to_dict(orient="list"))
    """
    {'age': [17, 17, 16, 21],
     'name': ['mashiro', 'satori', 'koishi', 'nagisa']}
    """
orient='series'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    # This structure is rarely used: each key maps to a Series
    pprint(df.to_dict(orient="series"))
    """
    {'age': 0    17
    1    17
    2    16
    3    21
    Name: age, dtype: int64,
     'name': 0    mashiro
    1     satori
    2     koishi
    3     nagisa
    Name: name, dtype: object}
    """
orient='split'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    pprint(df.to_dict(orient="split"))
    """
    {'columns': ['name', 'age'],
     'data': [['mashiro', 17], ['satori', 17], ['koishi', 16], ['nagisa', 21]],
     'index': [0, 1, 2, 3]}
    """
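The three keys of the split layout happen to match the data, index, and columns parameters of the DataFrame constructor. As a small sketch (not something the original text relies on), the dict can therefore be unpacked straight back into a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

parts = df.to_dict(orient="split")   # keys: 'index', 'columns', 'data'

# Unpack the keys as keyword arguments: DataFrame(data=..., index=..., columns=...)
rebuilt = pd.DataFrame(**parts)
print(rebuilt)
```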
orient='records'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    pprint(df.to_dict(orient="records"))
    """
    [{'age': 17, 'name': 'mashiro'},
     {'age': 17, 'name': 'satori'},
     {'age': 16, 'name': 'koishi'},
     {'age': 21, 'name': 'nagisa'}]
    """
orient='index'
    from pprint import pprint
    import pandas as pd

    df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                       "age": [17, 17, 16, 21]})
    pprint(df.to_dict(orient="index"))
    """
    {0: {'age': 17, 'name': 'mashiro'},
     1: {'age': 17, 'name': 'satori'},
     2: {'age': 16, 'name': 'koishi'},
     3: {'age': 21, 'name': 'nagisa'}}
    """
II. Converting Python data formats to a DataFrame
1. Converting a dict to a DataFrame
    import pandas as pd

    data = {0: {'age': 17, 'name': 'mashiro'},
            1: {'age': 17, 'name': 'satori'},
            2: {'age': 16, 'name': 'koishi'},
            3: {'age': 21, 'name': 'nagisa'}}
    df = pd.DataFrame.from_dict(data)
    # Clearly not the layout we expected: the outer keys became columns
    print(df)
    """
                0       1       2       3
    age        17      17      16      21
    name  mashiro  satori  koishi  nagisa
    """
    df = pd.DataFrame.from_dict(data, orient="index")
    print(df)
    """
       age     name
    0   17  mashiro
    1   17   satori
    2   16   koishi
    3   21   nagisa
    """
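So df.to_dict and pd.DataFrame.from_dict do opposite jobs. However, the orient parameter of from_dict offers far fewer choices: in the version used here it is either index or columns, with columns as the default.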
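To make the two orient choices concrete, here is a short sketch with made-up sample data. It also uses the columns parameter of from_dict, which is not covered above and is only accepted together with orient="index", to name the columns when each value is a plain list.

```python
import pandas as pd

# orient="columns" (the default): outer keys become column names
by_column = {"name": ["mashiro", "satori"], "age": [17, 17]}
df_cols = pd.DataFrame.from_dict(by_column)

# orient="index": outer keys become index labels;
# list values carry no column names, so they are supplied via columns=
by_row = {"a": ["mashiro", 17], "b": ["satori", 17]}
df_rows = pd.DataFrame.from_dict(by_row, orient="index", columns=["name", "age"])

print(df_cols)
print(df_rows)
```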
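2. Converting a list to a DataFrame (from_records)
from_records is meant for data whose outermost container is a list, such as a list of row dicts.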
    import pandas as pd

    data = [{'age': 17, 'name': 'mashiro'},
            {'age': 17, 'name': 'satori'},
            {'age': 16, 'name': 'koishi'},
            {'age': 21, 'name': 'nagisa'}]
    df = pd.DataFrame.from_records(data)
    print(df)
    """
       age     name
    0   17  mashiro
    1   17   satori
    2   16   koishi
    3   21   nagisa
    """
This kind of data is exactly what to_dict(orient="records") produces.
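The sketch below checks that round trip, and also shows from_records with a list of tuples, where the column names have to be passed through columns. The tuple case and the sample rows are additions for illustration, not part of the original walkthrough.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Round trip: DataFrame -> list of row dicts -> DataFrame
records = df.to_dict(orient="records")
restored = pd.DataFrame.from_records(records)
print(restored)

# Tuples carry no field names, so the columns are named explicitly
rows = [("mashiro", 17), ("satori", 17)]
from_tuples = pd.DataFrame.from_records(rows, columns=["name", "age"])
print(from_tuples)
```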