Python pandas Quick Start

Based on the official "10 minutes to pandas" tutorial.
The main pandas data structures:

Dimensions	Name	Description
1	Series	1D labeled homogeneously-typed array
2	DataFrame	General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
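3	Panel	General 3D labeled, also size-mutable array

Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25; current versions provide only Series and DataFrame.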

1. Imports

import pandas as pd   # data analysis; built on top of numpy
import numpy as np    # numerical computing; built around ndarray
import matplotlib.pyplot as plt      # plotting

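2. Creating objects

Series: a dictionary-like object

>>> s = pd.Series([1,3,5,np.nan,6,8])   # by default the index is the integers from 0; np.nan marks a missing value that is excluded from computations
>>> s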
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

>>> s = pd.Series(data=[1,2,3,4],index = ['a','b','c','d'])  # pass the values and the index labels explicitly
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> s.index    # the index labels (keys)
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> s.values    # the values as a numpy array
array([1, 2, 3, 4], dtype=int64)
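
The dictionary analogy can be sketched as follows. The labels and values are made up for illustration; the point is that a Series can be built from a dict, looked up by label, and that NaN is skipped in reductions by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: building a Series from a dict uses the keys as the index.
prices = pd.Series({'a': 1.0, 'b': 2.0, 'c': np.nan, 'd': 4.0})

# Label-based access works like a dict lookup.
first = prices['a']

# Membership tests check the index (the keys), not the values.
has_b = 'b' in prices

# NaN is skipped by default in reductions such as sum() and mean().
total = prices.sum()
average = prices.mean()

print(first, has_b, total, average)
```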

DataFrame: a tabular object

In [10]: df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),   # a Series; its values fill the column
                     'D' : np.array([3] * 4,dtype='int32'),  # a numpy array
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })


In [11]: df2
Out[11]:          # by default the row index is the integers from 0; the dict keys become the column labels
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

In [6]: dates = pd.date_range('20130101', periods=6)

In [7]: dates
Out[7]: 
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [8]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))    # np.random.randn(6,4) returns a 6x4 array of samples from the standard normal distribution

In [9]: df
Out[9]:          # dates is the row index; columns gives the column labels
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
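
Because the values are random, the numbers above will differ on every run. A minimal sketch of a repeatable version, assuming NumPy's `default_rng` generator is available (the seed 0 is an arbitrary choice and is not part of the original tutorial):

```python
import numpy as np
import pandas as pd

# A seeded generator (the seed 0 is arbitrary) makes the sample repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

# Re-creating the generator with the same seed reproduces the same frame.
again = pd.DataFrame(np.random.default_rng(0).standard_normal((6, 4)),
                     index=dates, columns=list('ABCD'))
print(df.equals(again))
```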


In [12]: df2.dtypes    # the data type of each column
Out[12]: 
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
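
The dict-based constructor can be sketched on a reduced version of df2. Scalar entries are broadcast to the length set by the array-like entries, and each column keeps its own dtype. The variable name `frame` is only illustrative.

```python
import numpy as np
import pandas as pd

# Scalars ('A', 'F') are broadcast to the length set by the array-like columns.
frame = pd.DataFrame({
    'A': 1.0,
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'F': 'foo',
})

print(frame.shape)
print(frame.dtypes)
```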

3. Viewing data

View the first and last rows:

In [14]: df.head()    # defaults to 5 rows
Out[14]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

In [15]: df.tail(3)     # the default is also 5 rows
Out[15]:  
                   A         B         C         D
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

View the row index, the column labels, and the underlying data:

In [16]: df.index
Out[16]: 
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [17]: df.columns
Out[17]: Index(['A', 'B', 'C', 'D'], dtype='object')

In [18]: df.values
Out[18]: 
array([[ 0.4691, -0.2829, -1.5091, -1.1356],
       [ 1.2121, -0.1732,  0.1192, -1.0442],
       [-0.8618, -2.1046, -0.4949,  1.0718],
       [ 0.7216, -0.7068, -1.0396,  0.2719],
       [-0.425 ,  0.567 ,  0.2762, -1.0874],
       [-0.6737,  0.1136, -1.4784,  0.525 ]])

Summary statistics for each column (count, mean, standard deviation, minimum, quartiles, maximum):

In [19]: df.describe()
Out[19]: 
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean   0.073711 -0.431125 -0.687758 -0.233103
std    0.843157  0.922818  0.779887  0.973118
min   -0.861849 -2.104569 -1.509059 -1.135632
25%   -0.611510 -0.600794 -1.368714 -1.076610
50%    0.022070 -0.228039 -0.767252 -0.386188
75%    0.658444  0.041933 -0.034326  0.461706
max    1.212112  0.567020  0.276232  1.071804
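
To see how the rows of describe() relate to the individual reductions, here is a small sketch on made-up numbers that are easy to check by hand:

```python
import pandas as pd

# Hypothetical small frame so the statistics are easy to verify by hand.
frame = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0], 'y': [10.0, 20.0, 30.0, 40.0]})

summary = frame.describe()

# Each row of the summary matches the corresponding reduction on the frame.
print(summary.loc['mean', 'x'])    # mean of x
print(summary.loc['50%', 'y'])     # median of y
print(summary.loc['count', 'x'])   # number of non-missing values in x
```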

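For a concise overview of column types and non-null counts, use info(). The train_df below is a separate DataFrame that is not defined in this post (its columns match the Kaggle Titanic training set):

train_df.info()
print('_'*40)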
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
________________________________________

Summary statistics for the object (string) columns of train_df:

train_df.describe(include=['O'])

                               Name   Sex    Ticket Cabin Embarked
count                           891   891       891   204      889
unique                          891     2       681   147        3
top     Chronopoulos, Mr. Apostolos  male  CA. 2343    G6        S
freq                              1   577         7     4      644
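
A minimal sketch of the same call on made-up data. The column names and values are hypothetical, and dtype=object is set explicitly here as an assumption so that include=['O'] selects both columns regardless of how a given pandas version stores strings.

```python
import pandas as pd

# Hypothetical data; dtype=object is explicit so include=['O'] selects both columns.
people = pd.DataFrame({
    'sex': pd.Series(['male', 'female', 'male', 'male'], dtype=object),
    'port': pd.Series(['S', 'C', 'S', None], dtype=object),
})

# count = non-missing values, unique = distinct values,
# top = most frequent value, freq = how often the top value occurs.
summary = people.describe(include=['O'])
print(summary)
```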

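Mean of each column or each row:

In [61]: df.mean()    # column means; this output was taken after df was modified in section 5 (D set to 5, F added)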
Out[61]: 
A   -0.004474
B   -0.383981
C   -0.687758
D    5.000000
F    3.000000
dtype: float64

In [62]: df.mean(1)    # axis=1: the mean of each row
Out[62]: 
2013-01-01    0.872735
2013-01-02    1.431621
2013-01-03    0.707731
2013-01-04    1.395042
2013-01-05    1.883656
2013-01-06    1.592306
Freq: D, dtype: float64
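
The two axes can be compared on a small made-up frame. This sketch also shows that NaN is skipped when averaging:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [4.0, 6.0, 8.0]})

col_means = frame.mean()          # axis=0 (default): one value per column
row_means = frame.mean(axis=1)    # axis=1: one value per row

# NaN is skipped: column 'a' averages 1.0 and 2.0 only.
print(col_means['a'], col_means['b'])
print(row_means.tolist())
```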

Transpose:

In [20]: df.T
Out[20]: 
   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988

Sort by row or column labels:

In [21]: df.sort_index(axis=1, ascending=False)    # sorts by labels along an axis: axis=0 is the row index, axis=1 the column labels; to sort by values use sort_values(by=...)
Out[21]: 
                   D         C         B         A
2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
2013-01-02 -1.044236  0.119209 -0.173215  1.212112
2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
2013-01-04  0.271860 -1.039575 -0.706771  0.721555
2013-01-05 -1.087401  0.276232  0.567020 -0.424972
2013-01-06  0.524988 -1.478427  0.113648 -0.673690

Sort by values:

In [22]: df.sort_values(by='B')       # sort the rows by the values in column B
Out[22]: 
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
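
The three kinds of sorting side by side, as a sketch on a tiny made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({'B': [3, 1, 2], 'A': [9, 8, 7]}, index=['z', 'x', 'y'])

by_row_label = frame.sort_index()                        # rows ordered by index label
by_col_label = frame.sort_index(axis=1)                  # columns ordered by column label
by_value = frame.sort_values(by='B', ascending=False)    # rows ordered by column B, largest first

print(list(by_row_label.index), list(by_col_label.columns), list(by_value.index))
```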

4. Selecting data

Select a single column:

In [23]: df['A']  # equivalent to df.A
Out[23]: 
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64

Select a slice of rows:

In [24]: df[0:3]
Out[24]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

In [25]: df['20130102':'20130104']
Out[25]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
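
One detail worth noting in the two slices above is how the end point is treated. A sketch on a frame with the same index (the integer values are placeholders):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_position = df[0:3]                   # rows 0, 1, 2: the stop position is excluded
by_label = df['20130102':'20130104']    # three rows: both end labels are included

print(len(by_position), len(by_label))
print(by_position.index[-1], by_label.index[-1])
```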

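Selection by label:
Uses the row index and the column labels.

In [26]: df.loc[dates[0]]        # selecting one row reduces the dimension: the result is a Series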
Out[26]: 
A    0.469112
B   -0.282863
C   -1.509059
D   -1.135632
Name: 2013-01-01 00:00:00, dtype: float64

In [27]: df.loc[:,['A','B']]  # all rows, a subset of the columns
Out[27]: 
                   A         B
2013-01-01  0.469112 -0.282863
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020
2013-01-06 -0.673690  0.113648

In [28]: df.loc['20130102':'20130104',['A','B']]    # a subset of rows and columns; both end labels of the slice are included
Out[28]: 
                   A         B
2013-01-02  1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04  0.721555 -0.706771

In [29]: df.loc['20130102',['A','B']]   # a single row label reduces the dimension: the result is a Series
Out[29]: 
A    1.212112
B   -0.173215
Name: 2013-01-02 00:00:00, dtype: float64

In [30]: df.loc[dates[0],'A']   # a single element: the result is a scalar
Out[30]: 0.46911229990718628

In [31]: df.at[dates[0],'A']     # the same element via at, which is intended for fast scalar access
Out[31]: 0.46911229990718628
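
The "dimension reduction" in the comments above can be checked directly. A sketch with placeholder values:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

block = df.loc['20130102':'20130104', ['A', 'B']]   # slice + list  -> DataFrame (2-D)
row = df.loc[dates[0], ['A', 'B']]                  # label + list  -> Series (1-D)
cell = df.at[dates[0], 'A']                         # label + label -> scalar

print(type(block).__name__, type(row).__name__, cell)
```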

Selection by position:
Rows and columns also have zero-based integer positions, as in an array.

In [32]: df.iloc[3]
Out[32]: 
A    0.721555
B   -0.706771
C   -1.039575
D    0.271860
Name: 2013-01-04 00:00:00, dtype: float64

In [33]: df.iloc[3:5,0:2]
Out[33]: 
                   A         B
2013-01-04  0.721555 -0.706771
2013-01-05 -0.424972  0.567020

In [34]: df.iloc[[1,2,4],[0,2]]
Out[34]: 
                   A         C
2013-01-02  1.212112  0.119209
2013-01-03 -0.861849 -0.494929
2013-01-05 -0.424972  0.276232

In [35]: df.iloc[1:3,:]
Out[35]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804

In [36]: df.iloc[:,1:3]
Out[36]: 
                   B         C
2013-01-01 -0.282863 -1.509059
2013-01-02 -0.173215  0.119209
2013-01-03 -2.104569 -0.494929
2013-01-04 -0.706771 -1.039575
2013-01-05  0.567020  0.276232
2013-01-06  0.113648 -1.478427

In [37]: df.iloc[1,1]
Out[37]: -0.17321464905330858

In [38]: df.iat[1,1]
Out[38]: -0.17321464905330858
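
Positional and label-based selection can address the same cells. A sketch with placeholder values showing the equivalent calls; note that iloc excludes the stop position while loc includes the end label:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

by_position = df.iloc[3:5, 0:2]                  # stop positions 5 and 2 are excluded
by_label = df.loc[dates[3]:dates[4], 'A':'B']    # end labels are included

print(by_position.equals(by_label))
print(df.iat[1, 1] == df.at[dates[1], 'B'])
```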

Boolean indexing:

In [39]: df[df.A > 0]
Out[39]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-04  0.721555 -0.706771 -1.039575  0.271860

In [40]: df[df > 0]
Out[40]: 
                   A         B         C         D
2013-01-01  0.469112       NaN       NaN       NaN
2013-01-02  1.212112       NaN  0.119209       NaN
2013-01-03       NaN       NaN       NaN  1.071804
2013-01-04  0.721555       NaN       NaN  0.271860
2013-01-05       NaN  0.567020  0.276232       NaN
2013-01-06       NaN  0.113648       NaN  0.524988

In [41]: df2 = df.copy()

In [42]: df2['E'] = ['one', 'one','two','three','four','three']

In [43]: df2
Out[43]: 
                   A         B         C         D      E
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three

In [44]: df2[df2['E'].isin(['two','four'])]
Out[44]: 
                   A         B         C         D     E
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four
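
Conditions can also be combined. A sketch on made-up data, using the element-wise operators & (and), | (or) and ~ (not):

```python
import pandas as pd

frame = pd.DataFrame({'A': [0.5, -1.0, 2.0, -0.3],
                      'E': ['one', 'two', 'three', 'two']})

# Each condition needs its own parentheses because & and | bind tighter than comparisons.
both = frame[(frame['A'] < 0) & (frame['E'] == 'two')]
either = frame[(frame['A'] > 1) | (frame['E'] == 'one')]
not_two = frame[~frame['E'].isin(['two'])]

print(list(both.index), list(either.index), list(not_two.index))
```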

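5. Modifying data

Combine several columns into one while reading:

from datetime import datetime
from pandas import read_csv
def parse(x):    # x is the combined 'year month day hour' string
    return datetime.strptime(x, '%Y %m %d %H')
# note: nested-list parse_dates and date_parser are deprecated in pandas 2.x
dataset = read_csv('raw.csv',  parse_dates = [['year', 'month', 'day', 'hour']], index_col=0, date_parser=parse)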
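
On recent pandas versions, one alternative is to read the columns as they are and combine them afterwards with pd.to_datetime. The sketch below is not the original author's method. The in-memory CSV stands in for raw.csv, and the `value` column and its numbers are made up.

```python
import io
import pandas as pd

# Stand-in for raw.csv: the date column names come from the call above, the rows are made up.
csv_text = "year,month,day,hour,value\n2010,1,1,0,129.0\n2010,1,1,1,148.0\n"

raw = pd.read_csv(io.StringIO(csv_text))

# pd.to_datetime assembles timestamps from columns named year, month, day, hour.
stamps = pd.to_datetime(raw[['year', 'month', 'day', 'hour']])

# Drop the four source columns and use the combined timestamps as the index.
dataset = raw.drop(columns=['year', 'month', 'day', 'hour']).set_index(stamps)

print(dataset)
```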

Assign a column from a Series:

In [45]: s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))

In [46]: s1
Out[46]: 
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

In [47]: df['F'] = s1     # assign a column from a Series; the values are aligned on the index
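
Alignment on the index is why the F column ends up with a NaN in its first row: s1 starts one day later than df. A smaller sketch of the same effect (the frame and values are made up):

```python
import pandas as pd

dates = pd.date_range('20130101', periods=3)
frame = pd.DataFrame({'A': [10, 20, 30]}, index=dates)

# The Series starts one day later, so only two of its labels exist in the frame.
s1 = pd.Series([1, 2, 3], index=pd.date_range('20130102', periods=3))
frame['F'] = s1

# 2013-01-01 has no match in s1 and becomes NaN; 2013-01-04 is not in the frame and is dropped.
print(frame)
```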

Set a single element:

df.at[dates[0],'A'] = 0    # by label
df.iat[0,1] = 0            # by position

df.loc[:,'D'] = np.array([5] * len(df))   # assign a whole column from a numpy array
In [51]: df
Out[51]: 
                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.509059  5  NaN
2013-01-02  1.212112 -0.173215  0.119209  5  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0
2013-01-05 -0.424972  0.567020  0.276232  5  4.0
2013-01-06 -0.673690  0.113648 -1.478427  5  5.0

In [52]: df2 = df.copy()

In [53]: df2[df2 > 0] = -df2    # negate every positive value; other cells are left unchanged

In [54]: df2
Out[54]: 
                   A         B         C  D    F
2013-01-01  0.000000  0.000000 -1.509059 -5  NaN
2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0
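
A sketch of the masked assignment on made-up values. The comparison with `-frame.abs()` is only an illustration that the result is the negative absolute value of each cell.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'A': [1.0, -2.0, 5.0], 'F': [np.nan, 3.0, -4.0]})

flipped = frame.copy()
flipped[flipped > 0] = -flipped    # only cells where the mask is True are overwritten

# NaN > 0 is False, so missing cells are left as they are.
same = -frame.abs()

print(flipped)
print(flipped.equals(same))
```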

Reindexing:

In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])   # returns a copy with the given row index and columns; entries with no existing data are NaN

In [56]: df1.loc[dates[0]:dates[1],'E'] = 1

In [57]: df1
Out[57]: 
                   A         B         C  D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN
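
A sketch of what reindex does with labels that are kept, dropped, or new (all names below are made up):

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])

# Keep 'x' and 'y', drop 'z', add a new row 'w' and a new column 'E'.
smaller = frame.reindex(index=['x', 'y', 'w'], columns=['A', 'E'])

print(smaller)
print(frame.shape)    # the original frame is unchanged
```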

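6. Handling missing data

pandas primarily uses numpy.nan to represent missing values; by default they are excluded from computations.
Drop rows that contain missing values: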

In [58]: df1.dropna(how='any')
Out[58]: 
                   A         B         C  D    F    E
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0

Fill missing values:

In [59]: df1.fillna(value=5)   # replace every missing value with 5
Out[59]: 
                   A         B         C  D    F    E
2013-01-01  0.000000  0.000000 -1.509059  5  5.0  1.0
2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  5.0
2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  5.0

Find where values are missing:

In [60]: pd.isnull(df1)    # True where the element is missing
Out[60]: 
                A      B      C      D      F      E
2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True
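
The three operations in this section return new objects and leave the original frame untouched. A compact sketch on made-up data, which also counts the missing values per column:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'F': [np.nan, 1.0, 2.0], 'E': [1.0, 1.0, np.nan]})

complete_rows = frame.dropna(how='any')     # keeps only rows with no NaN
filled = frame.fillna(value=5)              # returns a new frame
missing_per_column = frame.isna().sum()     # each True counts as 1

print(len(complete_rows), filled['F'].tolist(), missing_per_column.to_dict())
```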

              
           
              
              
            