pandas frame delete a row: Python for Data Analysis (Chapter 5, Getting Started with pandas) [Part 3]

阿新 • Published: 2020-12-18

Tags: pandas frame delete a row, pandas loc regex matching, pandas reindex, how to reindex a Series in Python

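5.1.3 Index Objects

pandas Index objects hold the axis labels and other metadata (such as the axis name or names). Any array or other sequence of labels you use when constructing a Series or DataFrame is converted internally to an Index: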

import numpy as np
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

print(index)
print(index[1:])

Index(['a', 'b', 'c'], dtype='object')
Index(['b', 'c'], dtype='object')

Index objects are immutable, so they cannot be modified by the user:

index[1] = 'd'  # raises TypeError

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

 in ()
----> 1 index[1] = 'd'  # TypeError
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in __setitem__(self, key, value)
 
   3936
   3937     def __setitem__(self, key, value):
-> 3938         raise TypeError("Index does not support mutable operations")
   3939
   3940     def __getitem__(self, key):
TypeError: Index does not support mutable operations
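Because item assignment is rejected, changing labels means building a new Index. The following is a minimal sketch of one way to do that, using the `delete` and `insert` methods listed later in this section; the name `new_index` is only illustrative.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

# Index methods return new objects; the original Index is left untouched.
new_index = index.delete(1).insert(1, 'd')
print(list(new_index))  # ['a', 'd', 'c']
print(list(index))      # ['a', 'b', 'c']

# To relabel the Series, assign a whole new index instead of editing in place.
obj.index = new_index
print(list(obj.index))  # ['a', 'd', 'c']
```

The old `index` variable still refers to the original labels, which is what makes sharing an Index between objects safe.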

Immutability makes it safer to share Index objects among data structures:

labels = pd.Index(np.arange(3))
print(labels)

print("-" * 50)
obj2 = pd.Series([1.5, -2.5, 0], index=labels)
print(obj2)

print("-" * 50)
print(obj2.index is labels)

Int64Index([0, 1, 2], dtype='int64')
--------------------------------------------------
0    1.5
1   -2.5
2    0.0
dtype: float64
--------------------------------------------------
True

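Some users will not often take advantage of the capabilities provided by indexes, but because some operations yield results containing indexed data, it is important to understand how they work.

In addition to being array-like, an Index also behaves like a fixed-size set:

# frame3 is not defined in this part; it is assumed to have been created earlier in the chapter
print(frame3)
print("-" * 50)
print(frame3.columns)
print("-" * 50)
print('Ohio' in frame3.columns)
print("-" * 50)
print(2002 in frame3.index)
print(2003 in frame3.index)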

state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6
--------------------------------------------------
Index(['Nevada', 'Ohio'], dtype='object', name='state')
--------------------------------------------------
True
--------------------------------------------------
True
False

Unlike Python sets, a pandas Index can contain duplicate labels:

dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])

print(dup_labels)
print(dup_labels.unique())
print('|'.join(dup_labels.unique()))

Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
Index(['foo', 'bar'], dtype='object')
foo|bar

Selecting by a duplicated label returns every row that carries that label.
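A small sketch of that behavior, assuming a Series built on the same duplicated labels (the variable names `s` and `foo_rows` are illustrative, not from the original post):

```python
import pandas as pd

# A Series whose index repeats the labels 'foo' and 'bar'.
s = pd.Series(range(4), index=['foo', 'foo', 'bar', 'bar'])

# Selecting a duplicated label returns a Series with all matching rows,
# not a single scalar value.
foo_rows = s.loc['foo']
print(foo_rows)

# is_unique reports whether any label is repeated.
print(s.index.is_unique)  # False
```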

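Each Index has a number of methods and properties for set logic, which answer other common questions about the data it contains. The table below summarizes some of the commonly used ones.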

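Method	Description
append	Concatenate additional Index objects onto the original, producing a new Index
difference	Compute the set difference of two indexes
intersection	Compute the set intersection of two indexes
union	Compute the set union of two indexes
isin	Compute a boolean array indicating whether each value is contained in the passed collection
delete	Compute a new Index with the element at position i deleted
drop	Compute a new Index by deleting the passed values
insert	Compute a new Index by inserting an element at position i
is_monotonic	Returns True if the index values are monotonically increasing
is_unique	Returns True if the Index has no duplicate values
unique	Compute the array of unique values in the Index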
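The methods in the table can be tried on two small indexes. This is an illustrative sketch only; `left` and `right` are made-up names, and each call returns a new object rather than modifying its input.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

# Set-style operations.
union = left.union(right)                # labels found in either index
intersection = left.intersection(right)  # labels found in both
difference = left.difference(right)      # labels only in left

# Membership test, one boolean per element of left.
mask = left.isin(['a', 'c'])

# Methods that build a modified copy.
appended = left.append(right)   # simple concatenation, duplicates kept
dropped = left.drop('b')        # remove by value
deleted = left.delete(0)        # remove by position
inserted = left.insert(1, 'z')  # insert by position

print(list(union), list(intersection), list(difference))
print(mask.tolist())
print(list(appended), appended.is_unique)
print(list(dropped), list(deleted), list(inserted))
```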

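5.2 Essential Functionality

This section walks through the fundamental mechanics of interacting with the data contained in a Series or DataFrame. Later material covers data analysis and manipulation with pandas in more depth.

5.2.1 Reindexing

reindex is an important method on pandas objects; it creates a new object whose data conforms to a new index. Consider the following example:

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

obj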

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

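Calling reindex on this Series rearranges the data according to the new index, introducing missing values for any index values that were not already present:

obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

obj2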

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

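For ordered data such as time series, you may want to interpolate or fill values when reindexing. The optional method argument allows this; for example, 'ffill' forward-fills the values:

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
obj3

obj3.reindex(range(6), method='ffill')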

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object

In a DataFrame, reindex can alter the row index, the columns, or both. When passed only a sequence, it reindexes the rows in the result:

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])

frame

	Ohio	Texas	California
a	0	1	2
c	3	4	5
d	6	7	8

frame2 = frame.reindex(['a', 'b', 'c', 'd'])

frame2

	Ohio	Texas	California
a	0.0	1.0	2.0
b	NaN	NaN	NaN
c	3.0	4.0	5.0
d	6.0	7.0	8.0

The columns can be reindexed with the columns keyword:

states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)

	Texas	Utah	California
a	1	NaN	2
c	4	NaN	5
d	7	NaN	8
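The text notes that rows and columns can be changed together, but only shows them separately. A minimal sketch of the combined call, passing both the `index` and `columns` keywords (the name `both` is illustrative):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Reindex rows and columns in a single call.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)
print(both)

# The new row 'b' and the new column 'Utah' contain only missing values.
print(both.loc['b'].isna().all(), both['Utah'].isna().all())
```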

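The table below lists the arguments to reindex.

Argument	Description
index	New sequence to use as the index. Can be an Index instance or any other sequence-like Python data structure; an Index is used exactly as is, without copying
method	Interpolation (fill) method; 'ffill' fills forward, while 'bfill' fills backward
fill_value	Substitute value to use when reindexing introduces missing data
limit	When forward filling or backfilling, the maximum size gap (in number of elements) to fill
tolerance	When forward filling or backfilling, the maximum size gap (in absolute numeric distance) to fill for inexact matches
level	Match a simple Index on a level of a MultiIndex; otherwise select a subset
copy	If True, always copy the underlying data even if the new index is equivalent to the old one; if False, do not copy the data when the indexes are equivalent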
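Two of these arguments, `fill_value` and `limit`, are not demonstrated above. The sketch below shows them on small made-up Series; `sparse`, `filled`, and `limited` are illustrative names, and the gap in `sparse` is chosen so that the effect of `limit` is visible.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# fill_value replaces the NaN that would otherwise appear for label 'e'.
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(filled)

# With labels 0 and 4 only, positions 1 to 3 form a gap of three elements.
sparse = pd.Series([1.0, 2.0], index=[0, 4])

# limit=1 forward-fills at most one consecutive missing position per gap,
# so positions 2 and 3 stay NaN.
limited = sparse.reindex(range(6), method='ffill', limit=1)
print(limited)
```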

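As explored in more detail later, you can also do label-based indexing more succinctly with loc, and many users prefer this style:

# Every requested label exists, so nothing is raised
print(frame.loc[['a', 'c', 'd'], ['Ohio', 'Texas']])
print("-" * 50)
# 'b' and 'Utah' are missing: this pandas version emits a FutureWarning
# recommending reindex; pandas 1.0 and later raise KeyError instead
print(frame.loc[['a', 'b', 'c', 'd'], states])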

   Ohio  Texas
a     0      1
c     3      4
d     6      7
--------------------------------------------------
   Texas  Utah  California
a    1.0   NaN         2.0
b    NaN   NaN         NaN
c    4.0   NaN         5.0
d    7.0   NaN         8.0


/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py:1494: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)
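The warning above suggests reindex when missing labels should appear as NaN rows or columns. When the goal is instead to select only the labels that actually exist, one option is to intersect the requested labels with the existing index before calling loc. The following is a sketch under that assumption; `wanted_rows`, `rows`, `cols`, and `subset` are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
wanted_rows = ['a', 'b', 'c', 'd']
states = ['Texas', 'Utah', 'California']

# Keep only the requested labels that are present in the frame.
rows = frame.index.intersection(wanted_rows)
cols = frame.columns.intersection(states)

# loc now receives existing labels only, so no warning or KeyError occurs
# and no NaN rows or columns are introduced.
subset = frame.loc[rows, cols]
print(subset)
```

Unlike reindex, this keeps the original integer dtype because no missing values are introduced.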
