Pandas DataFrame: Selecting and Filtering Data

阿新 • Published: 2018-10-31

A row filter that accepts a callable would allow chaining operations like:

pd.read_csv('imdb.txt')
  .sort_values(by='year')
  .filter(lambda x: x['year'] > 1990)   # <--- this is missing in Pandas
  .to_csv('filtered.csv')

For current alternatives see:

http://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining

Today you can write:

df = pd.read_csv('imdb.txt').sort_values(by='year')
df[df['year'] > 1990].to_csv('filtered.csv')

However, pandas could potentially support something like this (proposed syntax, not valid Python):

pd.read_csv('imdb.txt')
  .sort_values(by='year')
  .[lambda x: x['year'] > 1990]
  .to_csv('filtered.csv')

or

(
    pd.read_csv('imdb.txt')
      .sort_values(by='year')
      .loc[lambda x: x['year'] > 1990]   # the callable receives the sorted DataFrame
      .to_csv('filtered.csv')
)
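Of the forms above, the .loc version runs in current pandas, because .loc accepts a callable that receives the DataFrame being indexed. The following is a minimal sketch rather than the original author's code: the `movies` frame is a made-up stand-in for imdb.txt, and `query` is shown only as one possible alternative.

```python
import pandas as pd

# Hypothetical stand-in for the contents of imdb.txt (values are invented).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [1985, 1994, 2001, 1990],
})

# Chain sort and filter: the lambda is called with the sorted frame.
filtered = (
    movies
    .sort_values(by="year")
    .loc[lambda d: d["year"] > 1990]
)

# The same filter written as a query string.
same = movies.sort_values(by="year").query("year > 1990")

print(filtered)
```

Wrapping the chain in parentheses is what lets each method call sit on its own line.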

Source: https://yangjin795.github.io/pandas_df_selection.html

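Pandas, the Python Data Analysis Library, is a Python library built on NumPy and designed for data analysis. It provides many tools and methods that make working with large amounts of data in Python efficient and convenient.

This article covers the methods and tools Pandas offers for selecting and filtering data in a DataFrame. The raw data used throughout is: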

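import numpy as np
import pandas as pd

dates = pd.date_range('2017-04-01', periods=6)  # six daily timestamps used as the row index
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))  # random values; yours will differ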

    Out[9]: 
                     A         B         C         D
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926
    2017-04-03  0.480507  1.215048  1.313314 -0.072320
    2017-04-04  1.700309  0.287588 -0.012103  0.525291
    2017-04-05  0.526615 -0.417645  0.405853 -0.835213
    2017-04-06  1.143858 -0.326720  1.425379  0.531037

Selection

Selecting with []

Select one or several columns:

df['A']
Out:
    2017-04-01    0.522241
    2017-04-02    2.104572
    2017-04-03    0.480507
    2017-04-04    1.700309
    2017-04-05    0.526615
    2017-04-06    1.143858

df[['A', 'B']]
Out:
                       A         B
    2017-04-01  0.522241  0.495106
    2017-04-02  2.104572 -0.977768
    2017-04-03  0.480507  1.215048
    2017-04-04  1.700309  0.287588
    2017-04-05  0.526615 -0.417645
    2017-04-06  1.143858 -0.326720

Select one or several rows:

df['2017-04-01':'2017-04-01']
Out:
                       A         B         C         D
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003

df['2017-04-01':'2017-04-03']
Out:
                       A         B         C         D
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926
    2017-04-03  0.480507  1.215048  1.313314 -0.072320
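The [] operator behaves differently depending on what it is given: a column label (or list of labels) selects columns, while a slice selects rows. A small sketch of that distinction, rebuilding the sample frame from the values printed in this article:

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

one_col = df["A"]           # single label -> Series
one_col_frame = df[["A"]]   # list of labels -> DataFrame
by_label = df["2017-04-01":"2017-04-03"]  # label slice: both ends included
by_position = df[0:2]       # integer slice: positional, end excluded

print(type(one_col).__name__, type(one_col_frame).__name__)
print(len(by_label), len(by_position))
```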

loc: selecting data by row label

df.loc['2017-04-01', 'A']

df.loc['2017-04-01']
Out:
    A    0.522241
    B    0.495106
    C   -0.268194
    D   -0.035003

df.loc['2017-04-01':'2017-04-03']
Out:
                       A         B         C         D
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926
    2017-04-03  0.480507  1.215048  1.313314 -0.072320

df.loc['2017-04-01':'2017-04-04', ['A', 'B']]
Out:
                       A         B
    2017-04-01  0.522241  0.495106
    2017-04-02  2.104572 -0.977768
    2017-04-03  0.480507  1.215048
    2017-04-04  1.700309  0.287588

df.loc[:, ['A', 'B']]
Out:
                       A         B
    2017-04-01  0.522241  0.495106
    2017-04-02  2.104572 -0.977768
    2017-04-03  0.480507  1.215048
    2017-04-04  1.700309  0.287588
    2017-04-05  0.526615 -0.417645
    2017-04-06  1.143858 -0.326720
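The loc calls above return different kinds of objects depending on how many labels are given on each axis. A hedged sketch using the same sample values (the variable names are only illustrative):

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

cell = df.loc["2017-04-01", "A"]   # row label + column label -> scalar
row = df.loc["2017-04-01"]         # one row label -> Series indexed by column
block = df.loc["2017-04-01":"2017-04-04", ["A", "B"]]  # label slice includes 04-04

print(cell)
print(row.index.tolist())
print(block.shape)
```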

iloc: selecting data by row number (integer position)

df.iloc[2]
Out:
    A    0.480507
    B    1.215048
    C    1.313314
    D   -0.072320

df.iloc[1:3]
Out:
                       A         B         C         D
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926
    2017-04-03  0.480507  1.215048  1.313314 -0.072320

df.iloc[1,1]

df.iloc[1:3,1]

df.iloc[1:3,1:2]

df.iloc[[1,3],[2,3]]
Out:
                       C         D
    2017-04-02 -0.139632 -0.735926
    2017-04-04 -0.012103  0.525291

df.iloc[[1,3],:]

df.iloc[:,[2,3]]
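Several of the iloc calls above are listed without output. The sketch below, built on the article's sample values, shows what each form returns; note that iloc slices exclude the end position, unlike label slices with loc.

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

scalar = df.iloc[1, 1]         # single cell -> scalar
series = df.iloc[1:3, 1]       # rows 1 and 2 of column 1 -> Series
frame = df.iloc[1:3, 1:2]      # slices on both axes -> DataFrame with one column
picked = df.iloc[[1, 3], [2, 3]]  # explicit position lists -> 2x2 DataFrame

print(scalar)
print(series.shape, frame.shape, picked.shape)
```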

iat: getting the value of a single cell

df.iat[1,2]
Out:
    -0.13963224781812655
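iat is positional; its label-based counterpart is at. A brief sketch with the sample values, assuming the same date index as above:

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

by_position = df.iat[1, 2]            # row position 1, column position 2
by_label = df.at[df.index[1], "C"]    # same cell addressed by its labels

print(by_position, by_label)
```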

Filtering

Filtering with []

Put a boolean expression inside []; every row for which it evaluates to True is selected.

df[df.A>1]
Out:
                       A         B         C         D
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926
    2017-04-04  1.700309  0.287588 -0.012103  0.525291
    2017-04-06  1.143858 -0.326720  1.425379  0.531037

df[df>1]
Out:
                       A         B         C   D
    2017-04-01       NaN       NaN       NaN NaN
    2017-04-02  2.104572       NaN       NaN NaN
    2017-04-03       NaN  1.215048  1.313314 NaN
    2017-04-04  1.700309       NaN       NaN NaN
    2017-04-05       NaN       NaN       NaN NaN
    2017-04-06  1.143858       NaN  1.425379 NaN

df[df.A+df.B>1.5]
Out:
                       A         B         C         D      
    2017-04-03  0.480507  1.215048  1.313314 -0.072320  
    2017-04-04  1.700309  0.287588 -0.012103  0.525291
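Boolean conditions can also be combined. A short sketch using the sample values: element-wise `&`, `|` and `~` are used instead of `and`, `or` and `not`, and each comparison needs its own parentheses because of operator precedence.

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

both = df[(df["A"] > 1) & (df["D"] > 0)]     # A > 1 AND D > 0
either = df[(df["A"] > 1) | (df["B"] > 1)]   # A > 1 OR B > 1
neither = df[~((df["A"] > 1) | (df["B"] > 1))]  # negation of the OR mask

print(len(both), len(either), len(neither))
```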

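A more complex example: restrict the rows to the index range '2017-04-01' through '2017-04-04', and keep only the columns whose sum is greater than 1 (df.sum() sums each column; use df.sum(axis=1) to sum each row instead):

df.loc['2017-04-01':'2017-04-04', df.sum() > 1]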
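Note that df.sum() with no arguments sums down each column, so the expression above picks columns, not rows. The sketch below contrasts it with a per-row sum (axis=1) over the same date window; the variable names are illustrative only.

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

# Columns whose sum over all six rows exceeds 1, limited to the date window.
by_cols = df.loc["2017-04-01":"2017-04-04", df.sum() > 1]

# Rows in the date window whose own sum across A..D exceeds 1.
window = df.loc["2017-04-01":"2017-04-04"]
by_rows = window[window.sum(axis=1) > 1]

print(by_cols.columns.tolist())
print(by_rows.index.strftime("%Y-%m-%d").tolist())
```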

You can also combine [] with the apply method to build more complex filters, using any function that returns a boolean as the filter condition:

df[df.apply(lambda x: x['B'] > x['C'], axis=1)]
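For a simple comparison like this one, a vectorized mask gives the same rows without calling a Python function per row; apply is more useful when the condition cannot be expressed column-wise. A sketch comparing the two on the sample values:

```python
import pandas as pd

# Sample values copied from the table shown earlier in the article.
data = [
    [0.522241, 0.495106, -0.268194, -0.035003],
    [2.104572, -0.977768, -0.139632, -0.735926],
    [0.480507, 1.215048, 1.313314, -0.072320],
    [1.700309, 0.287588, -0.012103, 0.525291],
    [0.526615, -0.417645, 0.405853, -0.835213],
    [1.143858, -0.326720, 1.425379, 0.531037],
]
df = pd.DataFrame(data, index=pd.date_range("2017-04-01", periods=6),
                  columns=list("ABCD"))

# Row-wise predicate: the lambda receives each row as a Series.
via_apply = df[df.apply(lambda row: row["B"] > row["C"], axis=1)]

# Equivalent column-wise comparison.
via_mask = df[df["B"] > df["C"]]

print(via_apply.index.strftime("%Y-%m-%d").tolist())
```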

Using isin

df['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
                       A         B         C         D      E
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003    one
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926    one
    2017-04-03  0.480507  1.215048  1.313314 -0.072320    two
    2017-04-04  1.700309  0.287588 -0.012103  0.525291  three
    2017-04-05  0.526615 -0.417645  0.405853 -0.835213   four
    2017-04-06  1.143858 -0.326720  1.425379  0.531037  three

df[df.E.isin(['one'])]
Out:
                       A         B         C         D    E
    2017-04-01  0.522241  0.495106 -0.268194 -0.035003  one
    2017-04-02  2.104572 -0.977768 -0.139632 -0.735926  one
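isin accepts several values at once, and the resulting mask can be negated with `~` to exclude them. A minimal sketch on a reduced version of the sample frame (only columns A and E are kept for brevity):

```python
import pandas as pd

# Columns A and E from the article's sample data.
df = pd.DataFrame({
    "A": [0.522241, 2.104572, 0.480507, 1.700309, 0.526615, 1.143858],
    "E": ["one", "one", "two", "three", "four", "three"],
}, index=pd.date_range("2017-04-01", periods=6))

one_or_three = df[df["E"].isin(["one", "three"])]  # rows whose E is in the list
not_one = df[~df["E"].isin(["one"])]               # rows whose E is NOT 'one'

print(one_or_three["E"].tolist())
print(not_one["E"].tolist())
```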
