
pandas 21: Reading CSV files with read_csv (12. Iteration and chunks) (detailed, tcy)

Examples: iteration (2018/12/26)

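# To iterate over a large file without reading all of it into memory, pass chunksize to read the text file chunk by chunk.
# With chunksize set, read_csv or read_table returns an iterable TextFileReader object instead of a DataFrame.
# Passing iterator=True also returns a TextFileReader object.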
Contents:
Part 1: Reading and writing CSV text files

    pandas read_csv (1. Overview of text reading and writing) https://mp.csdn.net/postedit/85289371
    pandas read_csv (2. read_csv parameters) https://mp.csdn.net/postedit/85289928
    pandas read_csv (3. dtypes: specifying column data types) https://mp.csdn.net/postedit/85290575
    pandas read_csv (4. Writing text data with to_csv) https://mp.csdn.net/postedit/85290962
    pandas read_csv (5. Text read/write examples) https://mp.csdn.net/postedit/85291123
    pandas read_csv (6. Naming and using columns) https://mp.csdn.net/postedit/85291430
    pandas read_csv (7. Indexes) https://mp.csdn.net/postedit/85291658
    pandas read_csv (8. Dialects and delimiters) https://mp.csdn.net/postedit/85291994
    pandas read_csv (9. Float conversion and NA values) https://mp.csdn.net/postedit/85292391
    pandas read_csv (10. Comments and empty lines) https://mp.csdn.net/postedit/85292609
    pandas read_csv (11. Date and time handling) https://mp.csdn.net/postedit/85292925
    pandas read_csv (12. Iteration and chunks) https://mp.csdn.net/postedit/85293639
    pandas read_csv (13. Reading fixed-width data with read_fwf) https://mp.csdn.net/postedit/85294010

Part 2:
    Brief notes on reading and writing HDF files with pandas https://mp.csdn.net/postedit/85294299
    Brief notes on reading and writing Excel files with pandas https://mp.csdn.net/postedit/85294545

Part 3:
    Using Python's csv module (tcy) https://mp.csdn.net/postedit/85228189
    Seven ways to fix pandas read_csv errors https://mp.csdn.net/postedit/85228808
    Using pandas to_string https://mp.csdn.net/postedit/85294935

Example 1: read a given number of rows with nrows

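import pandas as pd
from io import StringIO

# Whitespace-separated sample; the unlabeled first column becomes the index.
data=' a b c key\n' \
     '0 0 1 2 k1\n' \
     '1 3 4 5 k1\n' \
     '2 6 7 8 k2\n' \
     '3 9 10 11 k3\n' \
     '4 12 13 14 k3\n' \
     '5 15 16 17 k3'

pd.read_csv(StringIO(data), sep=r'\s+', nrows=2, engine='python')  # read only the first 2 data rows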

  a b c key
0 0 1 2 k1
1 3 4 5 k1  
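
nrows always counts from the top of the data. To look at a window further down the file, one option is to combine it with skiprows. The following is only a sketch: read_window and csv_text are hypothetical names, and the sample is a comma-separated copy of the values above so that the default parser settings apply.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: same values as the data above, comma-separated.
csv_text = (
    "a,b,c,key\n"
    "0,1,2,k1\n"
    "3,4,5,k1\n"
    "6,7,8,k2\n"
    "9,10,11,k3\n"
    "12,13,14,k3\n"
    "15,16,17,k3\n"
)


def read_window(source, start, nrows):
    """Read `nrows` data rows beginning at data row `start` (0-based)."""
    # Line 0 is the header, so data row k sits on line k + 1.
    # Skipping lines 1..start keeps the header and drops the first `start` rows.
    return pd.read_csv(source, skiprows=range(1, start + 1), nrows=nrows)


window = read_window(StringIO(csv_text), start=2, nrows=2)
print(window["a"].tolist())  # [6, 9]
```

Note that the returned frame gets a fresh index starting at 0, so the original row positions are not preserved.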

Example 2: read the file chunk by chunk with chunksize (number of rows per chunk)

chunker = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', chunksize=2)

for i in chunker:
    print(i)

   a  b  c key
0  0  1  2 k1
1  3  4  5 k1
   a  b  c key
2  6  7  8 k2
3  9 10 11 k3
   a  b  c key
4 12 13 14 k3
5 15 16 17 k3
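
Printing each chunk is mostly useful for checking the mechanics. In practice the loop body usually reduces each chunk to a small result, so that only one chunk is in memory at a time. A minimal sketch of that pattern, assuming a comma-separated copy of the sample data (csv_text and count_keys are made-up names for illustration):

```python
from collections import Counter
from io import StringIO

import pandas as pd

# Hypothetical sample: same values as the data above, comma-separated.
csv_text = (
    "a,b,c,key\n"
    "0,1,2,k1\n"
    "3,4,5,k1\n"
    "6,7,8,k2\n"
    "9,10,11,k3\n"
    "12,13,14,k3\n"
    "15,16,17,k3\n"
)


def count_keys(source, chunksize=2):
    """Count the values in the 'key' column, one chunk at a time."""
    counts = Counter()
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Each chunk is an ordinary DataFrame with at most `chunksize` rows.
        counts.update(chunk["key"].value_counts().to_dict())
    return dict(counts)


print(count_keys(StringIO(csv_text)))
```

For a real file, source would be a path, and chunksize would be chosen so that one chunk fits comfortably in memory.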

# Example 2.2: get_chunk(n) reads the next n rows, regardless of chunksize
chunker = pd.read_csv(StringIO(data), sep=r'\s+', engine='python', chunksize=2)
chunker.get_chunk(3)

  a b c key
0 0 1 2 k1
1 3 4 5 k1
2 6 7 8 k2

chunker.get_chunk(3)

   a  b  c key
3  9 10 11 k3
4 12 13 14 k3
5 15 16 17 k3

chunker.get_chunk(3)  # raises StopIteration: no rows are left

# Example 3: iterate over the file with iterator=True

reader = pd.read_table(StringIO(data), sep=r'\s+', engine='python', iterator=True)
reader.get_chunk(2)  # get the next 2 rows

  a b c key
0 0 1 2 k1
1 3 4 5 k1

for i in reader:
    print(i)

   a  b  c key
2  6  7  8 k2
3  9 10 11 k3
4 12 13 14 k3
5 15 16 17 k3
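
Since no chunksize was given, the for loop above returns all remaining rows as a single DataFrame. To keep pulling fixed-size pieces with get_chunk instead, the StopIteration raised at the end of the data has to be caught. A rough sketch of that loop, again assuming a comma-separated copy of the sample (csv_text and read_in_steps are hypothetical names):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: same values as the data above, comma-separated.
csv_text = (
    "a,b,c,key\n"
    "0,1,2,k1\n"
    "3,4,5,k1\n"
    "6,7,8,k2\n"
    "9,10,11,k3\n"
    "12,13,14,k3\n"
    "15,16,17,k3\n"
)


def read_in_steps(source, step):
    """Call get_chunk(step) until the reader is exhausted; return chunk sizes."""
    reader = pd.read_csv(source, iterator=True)
    sizes = []
    while True:
        try:
            chunk = reader.get_chunk(step)
        except StopIteration:
            break  # no rows left
        sizes.append(len(chunk))
    reader.close()
    return sizes


print(read_in_steps(StringIO(csv_text), 4))  # [4, 2]
```

The last chunk may be shorter than step, as the second value in the output shows.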