Extracting values from a Series: negating a Series is not so simple

阿新 • Published: 2021-01-26

Negating a Series · Not So Simple

This post walks through an interesting bug I ran into today.

An unexpected encounter with a bug

After finishing today's coding, I was testing the code when the following snippet raised an error.

amp_samples = value[value].index.tolist()
not_amp_samples = value[~value].index.tolist()

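value is a Series whose values are True or False, and I wanted to get the index labels of the True entries and of the False entries as two separate lists.

The code is simple: pandas uses ~ to negate a condition, so value[value] and value[~value] should do the filtering.

But it failed.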

IndexError                                Traceback (most recent call last)
<ipython-input-38-e29c78a5ffe7>
----> 1 value[~value]

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

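An index error. And the traceback clearly points at value[~value], not value[value].

Reproducing it

My first thought was that maybe ~ cannot be used to negate a Series.

So I opened ipython to test.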

import pandas as pd
series = pd.Series({'R1': True, 'R2': False, 'R3': False, 'R4': True})
print(series[series])
print(series[~series])

out:

R1    True
R4    True
dtype: bool
R2    False
R3    False
dtype: bool

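That works fine.

My second thought was that earlier processing had left values other than True and False in value. Selecting with value[value] would convert non-bool values automatically, while negation would fail.

So I added a print to the code to see what value contains.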

R1        True
R2       False
R3       False
R4        True
dtype: object

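Looks perfectly normal.

Thinking

A FEW MINUTES LATER.

I started to suspect that the pandas version on the server differed from my local one. So I wrote the test code on the server, and still could not reproduce the error.

15 MINUTES LATER.

Finally, I spotted one tiny difference.

You may have noticed it already: the dtype at the end of the output for value is object, while the dtype in my test code is bool. As we know, object is the general-purpose dtype in pandas. It is most often used to store strings, but it can also hold lists, tuples and other elements.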
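Since the printed values look identical, the difference is easiest to catch by checking the dtype directly. The sketch below is only an illustration: the names flags and clean are made up, and the object-dtype Series is built by hand rather than taken from the real data.

```python
import pandas as pd

# Hypothetical stand-in for `value`: boolean-looking entries stored as object.
flags = pd.Series({'R1': True, 'R2': False, 'R3': False, 'R4': True}, dtype=object)

# The Series from the quick ipython test, which pandas infers as bool.
clean = pd.Series({'R1': True, 'R2': False, 'R3': False, 'R4': True})

# The entries print the same, so compare the dtypes instead of eyeballing.
print(flags.tolist() == clean.tolist())
print(flags.dtype, clean.dtype)

# A cheap guard to run before using ~ as a logical negation.
flags_is_bool = flags.dtype == bool
clean_is_bool = clean.dtype == bool
print(flags_is_bool, clean_is_bool)
```

A check like this would have flagged the problem before the mask was ever negated.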

print(pd.Series({'R1': 'sssimon yang'}))
print(pd.Series({'R1': (1, 2, 3)}))
print(pd.Series({'R1': [1, 2, 3]}))

out:

R1    sssimon yang
dtype: object
R1    (1, 2, 3)
dtype: object
R1    [1, 2, 3]
dtype: object

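I did not yet know whether object versus bool affects negation, but why was value of type object in my code in the first place?

The code just before the error looked like this.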

# samples_map.values() looks something like ['R1', 'R2', 'R3', 'R4']
df['count'] = np.sum(patient_cnv[samples_map.values()], axis=1)
for index, value in df.iterrows():
    value = value[list(samples_map.values())]

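As you can see, the idea was to compute the total count of positive values in each row, then iterate over the rows.

Let's use an example to see whether adding a number changes the dtype.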

series = pd.Series({'R1': True, 'R2': False, 'R3': False, 'R4': True})
print(series)
series['count'] = series.sum()
print(series)

out:

R1     True
R2    False
R3    False
R4     True
dtype: bool
R1       1
R2       0
R3       0
R4       1
count    2
dtype: int64

Huh, it went straight from bool to int. Better to simulate this with a DataFrame.

samples = ['R1', 'R2', 'R3', 'R4']
df = pd.DataFrame([[True, False, False, True]], columns=samples)
print(df)
print(df.iloc[0])
df['count'] = df[samples].sum(axis=1)
print(df)
print(df.iloc[0])

out:

     R1     R2     R3    R4
0  True  False  False  True
R1     True
R2    False
R3    False
R4     True
Name: 0, dtype: bool
     R1     R2     R3    R4  count
0  True  False  False  True      2
R1        True
R2       False
R3       False
R4        True
count        2
Name: 0, dtype: object

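The row really did change from bool to object. The columns of a DataFrame may well hold values of different types, so presumably the internals fall back to object when a row is extracted.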
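To make the dtype change concrete, here is a small sketch using the same toy DataFrame as above. The variable names mixed_row and bool_row are mine. It compares a row taken from the full frame with a row taken after selecting only the boolean columns.

```python
import pandas as pd

samples = ['R1', 'R2', 'R3', 'R4']
df = pd.DataFrame([[True, False, False, True]], columns=samples)
df['count'] = df[samples].sum(axis=1)

# Row taken across bool and int64 columns: pandas reports dtype object.
mixed_row = df.iloc[0]

# Select the boolean columns first, then take the row: dtype stays bool.
bool_row = df[samples].iloc[0]

print(mixed_row.dtype, bool_row.dtype)
print(bool_row[~bool_row].index.tolist())
```

Selecting the columns before extracting the row keeps the bool dtype, so ~ behaves as a logical negation without any cast.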

Let's quickly try negating it.

series = df.iloc[0]
series = series[samples]
print(series)
print(series[series])
print(series[~series])

out:

R1     True
R2    False
R3    False
R4     True
Name: 0, dtype: object
R1    True
R4    True
Name: 0, dtype: object
R2    False
R3    False
Name: 0, dtype: object

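Huh, no error. That shouldn't be.

Closing in

Looking at the original code again, there is an iterrows in it. Maybe the reproduction was not close enough?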

for index, series in df.iterrows():
    series = series[samples]
    print(series)
    print(series[series])
    print(series[~series])

out:

R1     True
R2    False
R3    False
R4     True
Name: 0, dtype: object
R1    True
R4    True
Name: 0, dtype: object
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-96-ba90254b282a>
      3     print(series)
      4     print(series[series])
----> 5     print(series[~series])

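Unbelievable. The rows from iterrows and from iloc both have dtype object, yet one can be negated and the other cannot?

A blind spot in my knowledge.

Still, now that the error is reproduced, let's see what this mysterious negation actually returns.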

index, series = list(df.iterrows())[0]
series = series[samples]
print(series)
print(~series)

out:

R1     True
R2    False
R3    False
R4     True
Name: 0, dtype: object
R1    -2
R2    -1
R3    -1
R4    -2
Name: 0, dtype: object

-2 and -1: so this is a bitwise NOT, with True treated as 1 and False as 0.

Now the iloc version.

series = df.iloc[0]
series = series[samples]
print(series)
print(~series)

out:

R1     True
R2    False
R3    False
R4     True
Name: 0, dtype: object
R1    False
R2     True
R3     True
R4    False
Name: 0, dtype: object

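A normal logical negation.

Analysis

In plain Python, ~ is a bitwise NOT, and ~1 evaluates to -2. So this result looks like a clash between how plain Python and pandas treat ~: the iloc result still gets pandas' logical negation, while the iterrows() result exposes the bitwise NOT.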
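One plausible explanation, which I have not verified against the pandas source, is that the two object-dtype rows hold different kinds of elements: plain Python bool in one case and numpy.bool_ in the other. For an object array, ~ is applied element by element, so the element type decides the result. The sketch below shows this with NumPy alone. The arrays py_bools and np_bools are built by hand for illustration and are not taken from pandas. Note that Python 3.12 and later emit a DeprecationWarning for ~ on a plain bool.

```python
import numpy as np

# Object array holding plain Python bools.
py_bools = np.empty(2, dtype=object)
py_bools[0], py_bools[1] = True, False

# Object array holding NumPy boolean scalars.
np_bools = np.empty(2, dtype=object)
np_bools[0], np_bools[1] = np.True_, np.False_

# ~ dispatches to each element's own __invert__.
inverted_py = ~py_bools   # Python bool behaves like int: bitwise NOT
inverted_np = ~np_bools   # numpy.bool_ implements logical NOT

print(inverted_py)
print(inverted_np)
```

Both arrays report dtype object, so the dtype alone cannot tell them apart, which would match what the iterrows and iloc rows showed.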

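In principle the two should not behave differently, so I consider this a bug in pandas and will see whether I can file an issue.

The fix, of course, is simple: cast to bool explicitly.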

value = value.astype(bool)
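Put together with the original filtering code, the fix can be sketched like this. The Series here is a hand-built, object-dtype stand-in for one row of the real data, so the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for one row produced inside the iterrows() loop.
value = pd.Series([True, False, False, True],
                  index=['R1', 'R2', 'R3', 'R4'], dtype=object)

# Cast to bool so that ~ is a logical negation, not a bitwise one.
value = value.astype(bool)

amp_samples = value[value].index.tolist()
not_amp_samples = value[~value].index.tolist()

print(amp_samples)
print(not_amp_samples)
```

After the cast, the mask and its negation split the index into two complementary lists.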

About me

I am SSSimon Yang. Follow me, and let's read the world through code.
