
pandas: marking duplicate rows with duplicated() and removing them with drop_duplicates()


pandas.DataFrame.duplicated

DataFrame.duplicated(subset=None, keep='first')

Returns a boolean Series that marks duplicate rows.

Parameters:

1) subset : column label or sequence of labels, optional

# Restricts the check to specific columns; all columns are used by default.

Only consider certain columns for identifying duplicates, by default use all of the columns.

2) keep : {'first', 'last', False}, default 'first'

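# With the default, duplicates are flagged and the first occurrence is left unflagged (duplicated() only marks rows; it does not delete them).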

Determines which duplicates (if any) to mark.

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

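# keep='last' effectively scans from the end backwards, so the earlier occurrences (smaller index labels) of a duplicated row return True and the last one returns False.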

  • False : Mark all duplicates as True.
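The three keep options are easiest to tell apart side by side. The sketch below reuses the six-row frame from the example that follows, in which each (district, count) pair appears twice, and prints the mask that each option produces.

```python
import pandas as pd

# Same six-row frame as the example below: each (district, count) pair appears twice
data = pd.DataFrame({'district': ['A', 'A', 'B', 'B', 'C', 'C'],
                     'count': [50, 50, 60, 60, 80, 80]})

# keep='first' (default): the second row of each pair is flagged
first = data.duplicated(keep='first').tolist()
# keep='last': the first row of each pair is flagged instead
last = data.duplicated(keep='last').tolist()
# keep=False: every row that has a duplicate anywhere is flagged
every = data.duplicated(keep=False).tolist()

print(first)  # [False, True, False, True, False, True]
print(last)   # [True, False, True, False, True, False]
print(every)  # [True, True, True, True, True, True]
```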

Example:

import pandas as pd
data=pd.DataFrame({'district':['A','A','B','B','C','C'],'count':[50,50,60,60,80,80]})

Duplicate rows are marked True:

data.duplicated()
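Because the result is an ordinary boolean Series aligned with the frame's index, it can be used for more than display. The sketch below counts and selects the duplicate rows. The names mask and n_dups are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'district': ['A', 'A', 'B', 'B', 'C', 'C'],
                     'count': [50, 50, 60, 60, 80, 80]})

mask = data.duplicated()          # boolean Series aligned with data.index
print(mask.tolist())              # [False, True, False, True, False, True]

# True counts as 1, so summing the mask gives the number of redundant rows
n_dups = int(mask.sum())
print(n_dups)                     # 3

# The mask can also select the duplicate rows themselves
dup_labels = data[mask].index.tolist()
print(dup_labels)                 # [1, 3, 5]
```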

Remove the duplicate rows with drop_duplicates():

data.drop_duplicates()
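drop_duplicates() keeps exactly the rows that duplicated() marks False, and it accepts the same subset and keep parameters. The sketch below checks that equivalence and then uses a small hypothetical frame, sales, in which the district repeats but the count differs. On that frame the subset parameter actually changes the result.

```python
import pandas as pd

data = pd.DataFrame({'district': ['A', 'A', 'B', 'B', 'C', 'C'],
                     'count': [50, 50, 60, 60, 80, 80]})

# drop_duplicates() is equivalent to filtering with the inverted duplicated() mask
dropped = data.drop_duplicates()
manual = data[~data.duplicated()]
same = dropped.equals(manual)
print(same)  # True

# Hypothetical frame: 'district' repeats, but 'count' differs
sales = pd.DataFrame({'district': ['A', 'A', 'B'], 'count': [50, 70, 60]})

# Comparing all columns, no row is a duplicate...
n_all = len(sales.drop_duplicates())
print(n_all)  # 3

# ...but comparing only 'district' and keeping the last occurrence drops row 0
kept = sales.drop_duplicates(subset=['district'], keep='last').index.tolist()
print(kept)   # [1, 2]
```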

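After the duplicates are removed, the remaining rows keep their original index labels (0, 2, 4 here), so use reset_index(drop=True) to renumber them from 0: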

data.drop_duplicates().reset_index(drop=True)
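In pandas 1.0 and later, drop_duplicates() also takes an ignore_index parameter that renumbers the result in the same call. The sketch below assumes such a pandas version and checks that both approaches give the same frame.

```python
import pandas as pd

data = pd.DataFrame({'district': ['A', 'A', 'B', 'B', 'C', 'C'],
                     'count': [50, 50, 60, 60, 80, 80]})

# Without renumbering, the surviving rows keep their original labels
old_labels = data.drop_duplicates().index.tolist()
print(old_labels)          # [0, 2, 4]

# reset_index(drop=True) renumbers to 0..n-1 and discards the old labels
a = data.drop_duplicates().reset_index(drop=True)

# pandas >= 1.0: ignore_index=True does the same renumbering in one call
b = data.drop_duplicates(ignore_index=True)

print(a.index.tolist())    # [0, 1, 2]
print(a.equals(b))         # True
```

Passing drop=True matters in the first approach. Without it, reset_index() would keep the old labels as a new "index" column.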