Check Out Your Pandas Cheat Sheet

阿新 • Published: 2019-02-06

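Introduction

Pandas, NumPy, and Scikit-Learn are among the most popular Python libraries for data science and analysis.

NumPy handles lower-level scientific computing. Pandas is built on top of NumPy and is designed for practical data analysis in Python. Below is a cheat sheet I found, put together by Kara Tan, that covers the most common and most useful Pandas features. Let's jump right in!

Cheat Sheet (Let's Take Off)

Importing Data

Any kind of data analysis starts with getting some data. Pandas gives you many options for importing data into Python: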

pd.read_csv(filename) # From a CSV file
pd.read_table(filename) # From a delimited text file (like TSV)
pd.read_excel(filename) # From an Excel file
pd.read_sql(query, connection_object) # Reads from a SQL table/database
pd.read_json(json_string) # Reads from a JSON formatted string, URL or file.
pd.read_html(url) # Parses an html URL, string or file and extracts tables to a list of dataframes
pd.read_clipboard() # Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) # From a dict, keys for columns names, values for data as lists
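
As a rough illustration of the import calls above, here is a minimal sketch that reads delimited text and builds a DataFrame from a dict. The sample data and variable names are made up for the example, and `io.StringIO` stands in for a real file on disk so the snippet runs without any files.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "name,score\nann,0.9\nbob,0.4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same data, tab-separated; sep="\t" tells read_csv the delimiter.
tsv_text = csv_text.replace(",", "\t")
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# From a dict: keys become column names, list values become column data.
df_dict = pd.DataFrame({"name": ["ann", "bob"], "score": [0.9, 0.4]})

print(df_csv)
print(df_dict)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.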

Exploring Data

Once the data is loaded into a Pandas DataFrame, you can use these methods to get a sense of what it looks like:

df.shape # Tuple of (number of rows, number of columns); an attribute, not a method
df.head(n) # Returns first n rows of the DataFrame
df.tail(n) # Returns last n rows of the DataFrame
df.info() # Index, Datatype and Memory information
df.describe() # Summary statistics for numerical columns
s.value_counts(dropna=False) # Views unique values and counts
df.apply(pd.Series.value_counts) # Unique values and counts for all columns
df.mean() # Returns the mean of all columns (pass numeric_only=True if some columns are non-numeric)
df.corr() # Returns the correlation between columns in a DataFrame (pass numeric_only=True if some columns are non-numeric)
df.count() # Returns the number of non-null values in each DataFrame column
df.max() # Returns the highest value in each column
df.min() # Returns the lowest value in each column
df.median() # Returns the median of each column
df.std() # Returns the standard deviation of each column
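
To make a few of these concrete, here is a small sketch on a made-up DataFrame (the column names and values are assumptions for illustration only). It shows that `shape` is an attribute and that the counting and averaging methods skip missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "city": ["Taipei", "Tokyo", "Taipei", None],
    "temp": [30.0, 25.0, np.nan, 20.0],
})

n_rows, n_cols = df.shape            # attribute, no parentheses
first_two = df.head(2)               # first two rows
city_counts = df["city"].value_counts(dropna=False)  # includes the missing entry
non_null = df.count()                # non-null values per column
mean_temp = df["temp"].mean()        # NaN is skipped by default
numeric_summary = df.describe()      # numeric columns only by default

print(n_rows, n_cols)
print(city_counts)
print(numeric_summary)
```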

Selection

You will often need to select a single element or a subset of the data to inspect it or analyze it further. These methods come in handy:

df[col] # Returns column with label col as Series
df[[col1, col2]] # Returns Columns as a new DataFrame
s.iloc[0] # Selection by position (selects first element)
s.loc[0] # Selection by index label (selects element whose label is 0)
df.iloc[0,:] # First row
df.iloc[0,0] # First element of first column
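
The difference between position-based and label-based selection is easiest to see with a non-numeric index. The following is a minimal sketch on invented data; the labels "x", "y", "z" are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels.
df = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

col_a = df["a"]            # single column, returned as a Series
sub = df[["a", "b"]]       # list of columns, returned as a DataFrame

s = df["a"]
by_pos = s.iloc[0]         # position 0, regardless of label
by_label = s.loc["x"]      # label "x", regardless of position

first_row = df.iloc[0, :]  # first row as a Series
first_cell = df.iloc[0, 0] # first row, first column

print(by_pos, by_label, first_cell)
```

Here `s.iloc[0]` and `s.loc["x"]` return the same element, but only because "x" happens to be the first label.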

Data Cleaning

If you are working with real-world data, you will probably need to clean it. Here are some useful methods:

df.columns = ['a','b','c'] # Renames columns
pd.isnull(df) # Checks for null values, returns a Boolean array (also available as df.isnull())
pd.notnull(df) # Opposite of pd.isnull()
df.dropna() # Drops all rows that contain null values
df.dropna(axis=1) # Drops all columns that contain null values
df.dropna(axis=1,thresh=n) # Drops all columns that have fewer than n non-null values
df.fillna(x) # Replaces all null values with x
s.fillna(s.mean()) # Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) # Converts the datatype of the series to float
s.replace(1,'one') # Replaces all values equal to 1 with 'one'
s.replace([1,3],['one','three']) # Replaces all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) # Mass renaming of columns (this lambda assumes numeric column labels)
df.rename(columns={'old_name': 'new_name'}) # Selective renaming
df.set_index('column_one') # Changes the index
df.rename(index=lambda x: x + 1) # Mass renaming of index
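
The cleaning methods above can be sketched on a small made-up DataFrame as follows (column names and values are assumptions for the example). Note that these calls return new objects and leave the original `df` unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has one gap, "b" has two, "c" holds numbers as text.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["1", "2", "3"],
})

null_mask = df.isnull()                     # True where a value is missing
rows_complete = df.dropna()                 # keeps rows with no nulls
cols_complete = df.dropna(axis=1)           # keeps columns with no nulls
cols_thresh = df.dropna(axis=1, thresh=2)   # keeps columns with >= 2 non-null values

filled = df["a"].fillna(df["a"].mean())     # fill gaps with the column mean
c_float = df["c"].astype(float)             # text to float
renamed = df.rename(columns={"a": "alpha"}) # selective rename
indexed = df.set_index("c")                 # use column "c" as the index

print(rows_complete)
print(cols_thresh)
print(filled)
```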

Filtering, Sorting and Grouping

Methods for filtering, sorting, and grouping data:

df[df[col] > 0.5] # Rows where the col column is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] # Rows where 0.5 < col < 0.7
df.sort_values(col1) # Sorts values by col1 in ascending order
df.sort_values(col2,ascending=False) # Sorts values by col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]) # Sorts values by col1 in ascending order then col2 in descending order
df.groupby(col) # Returns a groupby object for values from one column
df.groupby([col1,col2]) # Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() # Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)
df.pivot_table(index=col1, values=[col2,col3], aggfunc='mean') # Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) # Finds the average across all columns for every unique column 1 group
df.apply(np.mean) # Applies a function across each column
df.apply(np.max, axis=1) # Applies a function across each row
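
As a small worked example of the filtering, sorting, and grouping calls above, consider the sketch below. The "team", "score", and "games" columns are invented for illustration.

```python
import pandas as pd

# Hypothetical data: two teams with a score and a game count per row.
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [0.6, 0.9, 0.3, 0.65],
    "games": [3, 5, 4, 2],
})

# Boolean masks are combined with & and each condition needs parentheses.
in_range = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

by_score_desc = df.sort_values("score", ascending=False)
team_then_score = df.sort_values(["team", "score"], ascending=[True, False])

# Mean score per team, then a pivot table averaging two columns per team.
mean_score = df.groupby("team")["score"].mean()
pivot = df.pivot_table(index="team", values=["score", "games"], aggfunc="mean")

print(in_range)
print(mean_score)
print(pivot)
```

Passing the aggregation as the string "mean" avoids depending on a bare `mean` name being defined.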

Joining and Combining

Methods for combining two DataFrames:

pd.concat([df1, df2]) # Adds the rows in df2 to the end of df1 (columns should be identical); replaces df1.append(df2), which was removed in pandas 2.0
pd.concat([df1, df2],axis=1) # Adds the columns in df2 to the end of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') # SQL-style join of the columns in df1 with the columns in df2, where the values in df1's col1 match the index of df2
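
Here is a minimal sketch of the three ways of combining DataFrames, using invented frames. The names `more_rows`, `extra_cols`, and `lookup` are hypothetical; `pd.concat` is used for row-wise stacking since it works across pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})

# Stack rows: both frames share the same columns.
more_rows = pd.DataFrame({"key": ["d", "e"], "x": [4, 5]})
stacked = pd.concat([df1, more_rows], ignore_index=True)

# Place columns side by side: both frames share the same row index.
extra_cols = pd.DataFrame({"z": [0.1, 0.2, 0.3]})
side_by_side = pd.concat([df1, extra_cols], axis=1)

# join matches df1's "key" column against the index of the other frame.
lookup = pd.DataFrame({"y": [10, 20]}, index=["a", "b"])
joined = df1.join(lookup, on="key", how="inner")

print(stacked)
print(side_by_side)
print(joined)
```

With `how='inner'`, the row with key "c" is dropped because `lookup` has no matching index entry.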

Writing Data

Finally, once your analysis has produced results, there are several ways to export the data:

df.to_csv(filename) # Writes to a CSV file
df.to_excel(filename) # Writes to an Excel file
df.to_sql(table_name, connection_object) # Writes to a SQL table
df.to_json(filename) # Writes to a file in JSON format
df.to_html(filename) # Saves as an HTML table
df.to_clipboard() # Writes to the clipboard
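
For a quick check of what the export methods produce, the sketch below writes to strings instead of files. When no path is given, `to_csv`, `to_json`, and `to_html` return the output as text. The data is invented, and `index=False` and `orient="records"` are choices made for this example rather than requirements.

```python
import io

import pandas as pd

# Hypothetical results to export.
df = pd.DataFrame({"name": ["ann", "bob"], "score": [0.9, 0.4]})

csv_text = df.to_csv(index=False)          # no path given, so a string is returned
json_text = df.to_json(orient="records")   # one JSON object per row
html_text = df.to_html(index=False)        # an HTML <table> as text

# Reading the CSV text back gives the same values.
round_trip = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(json_text)
```

Passing a filename as the first argument writes the same content to disk instead.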

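Conclusion

Although I have studied Pandas and even written a five-part series of Pandas study notes, when it comes time to use it I easily forget how to implement the feature I need and how to write the code. This cheat sheet has helped me a great deal!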
