DataFrame: finding multiple rows by value (not by index)
阿新 • Published: 2018-03-02
In many situations we want to look up rows by the values stored in a DataFrame rather than by its index.
First, let's create a DataFrame:
>>> import pandas as pd
>>> col = ["id", "name", "sex", "age"]
>>> name = {1: "chen", 2: "wang", 3: "hu", 4: "lee", 5: "liu"}
>>> ids = range(1, 6)  # named ids so the built-in id() is not shadowed
>>> sex = {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}
>>> age = {1: 20, 2: 18, 3: 21, 4: 20, 5: 18}
>>> data = {"id": ids, "name": name, "sex": sex, "age": age}
>>> data
{'sex': {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}, 'age': {1: 20, 2: 18, 3: 21, 4: 20, 5: 18}, 'name': {1: 'chen', 2: 'wang', 3: 'hu', 4: 'lee', 5: 'liu'}, 'id': range(1, 6)}
>>> df = pd.DataFrame(data, columns=col, index=ids)
>>> df
   id  name  sex  age
1   1  chen    1   20
2   2  wang    0   18
3   3    hu    1   21
4   4   lee    1   20
5   5   liu    0   18
>>> df = df.set_index("id")  # move the "id" column into the index
>>> df
    name  sex  age
id
1   chen    1   20
2   wang    0   18
3     hu    1   21
4    lee    1   20
5    liu    0   18
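You can build the same table more directly from a dict of equal-length lists. This removes the need for per-column dicts and for the `index=` argument. Below is a minimal sketch that should produce the same `df` as the session above.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values in row order.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["chen", "wang", "hu", "lee", "liu"],
        "sex": [1, 0, 1, 1, 0],
        "age": [20, 18, 21, 20, 18],
    }
)

# Move the "id" column into the index, as in the session above.
df = df.set_index("id")
print(df)
```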
Selecting everyone aged 20 or over is easy with a Boolean condition:
>>> df[df["age"] >= 20]
    name  sex  age
id
1   chen    1   20
3     hu    1   21
4    lee    1   20
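The comparison `df["age"] >= 20` produces a Boolean Series aligned on the index. Indexing with it keeps the rows where the value is True. Naming the mask and passing it to `.loc` makes this step explicit, and `.loc` can also pick columns at the same time. The sketch below uses illustrative variable names (`mask`, `adults`).

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")

# The comparison returns one True/False per row, indexed like df.
mask = df["age"] >= 20
print(mask.tolist())  # [True, False, True, True, False]

# .loc[row_mask, columns] filters rows and selects columns together.
adults = df.loc[mask, ["name", "age"]]
print(adults)
```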
Selecting all the female students (sex == 0) is just as easy:
>>> df[df["sex"] == 0]
    name  sex  age
id
2   wang    0   18
5    liu    0   18
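`DataFrame.query` is another way to write the same filter. The condition is a string in which column names act as variables. The sketch below should give the same rows as the Boolean indexing above. `females` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")

# The expression string is evaluated with df's column names in scope.
females = df.query("sex == 0")
print(females)
```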
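You can also use `where`, but it is less convenient and rarely used this way. It keeps every row, fills the non-matching rows with NaN, and therefore turns the integer columns into floats: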
>>> df.where(df["sex"] == 0)
    name  sex   age
id
1    NaN  NaN   NaN
2   wang  0.0  18.0
3    NaN  NaN   NaN
4    NaN  NaN   NaN
5    liu  0.0  18.0
>>> df.where(df["age"] >= 20)
    name  sex   age
id
1   chen  1.0  20.0
2    NaN  NaN   NaN
3     hu  1.0  21.0
4    lee  1.0  20.0
5    NaN  NaN   NaN
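If you do start from `where`, you can recover the matching rows by dropping rows that are entirely NaN. The columns stay float, though. The sketch below compares this with plain Boolean indexing. The names `masked`, `rows` and `direct` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")

# where keeps df's shape and puts NaN in every cell of a non-matching row.
masked = df.where(df["sex"] == 0)
print(masked.shape)         # (5, 3)
print(masked["age"].dtype)  # float64, because NaN forces ints to float

# Drop the rows in which every column is NaN to keep only the matches.
rows = masked.dropna(how="all")
print(rows.index.tolist())  # [2, 5]

# Plain Boolean indexing returns the same rows and keeps the integer dtype.
direct = df[df["sex"] == 0]
print(direct["age"].dtype)
```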
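Selecting by a list of names works differently. `==` compares element by element, so comparing the 5-row column with a 3-item list fails. Use the `.isin()` method instead: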
>>> select_name = ["chen", "lee", "liu"]
>>> df[df["name"] == select_name]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 855, in wrapper
    res = na_op(values, other)
  File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 759, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
  File "E:\Python3\lib\site-packages\pandas\core\ops.py", line 737, in _comp_method_OBJECT_ARRAY
    result = lib.vec_compare(x, y, op)
  File "pandas\lib.pyx", line 868, in pandas.lib.vec_compare (pandas\lib.c:15418)
ValueError: Arrays were different lengths: 5 vs 3
# the comparison raises an error (the exact traceback depends on the pandas version)
>>> df[df["name"].isin(select_name)]
    name  sex  age
id
1   chen    1   20
4    lee    1   20
5    liu    0   18
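`Series.isin` returns a Boolean Series that is True wherever the value appears in the given list, so the list can be any length. Putting `~` in front of the mask inverts it, which selects everyone *not* in the list. The sketch below uses illustrative names (`in_list`, `selected`, `excluded`).

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")
select_name = ["chen", "lee", "liu"]

# True for each row whose name is one of the listed values.
in_list = df["name"].isin(select_name)
print(in_list.tolist())  # [True, False, False, True, True]

selected = df[in_list]   # names in the list
excluded = df[~in_list]  # ~ negates the mask: names not in the list
print(selected)
print(excluded)
```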
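To select rows whose name is in the list and who are also male (sex == 1), combine the two masks with `&`. Put the comparison in parentheses, because `&` binds more tightly than `==`: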
>>> df[df["name"].isin(select_name) & (df["sex"] == 1)]
    name  sex  age
id
1   chen    1   20
4    lee    1   20
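Without the parentheses, Python parses `mask & df["sex"] == 1` as `(mask & df["sex"]) == 1`, which is not the condition you meant. `query` can express the same filter, and inside its string `@select_name` refers to the Python variable of that name. The sketch below should return the same rows both ways. `males_in_list` and `via_query` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")
select_name = ["chen", "lee", "liu"]

# Each condition sits in its own parentheses before being combined with &.
males_in_list = df[df["name"].isin(select_name) & (df["sex"] == 1)]
print(males_in_list)

# Same filter as a query string; "@select_name" is the Python list above.
via_query = df.query("name in @select_name and sex == 1")
print(via_query)
```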
What if we use `DataFrame.isin` with a dict here?
>>> df.isin({"name": select_name, "sex": [1]})
     name    sex    age
id
1    True   True  False
2   False  False  False
3   False   True  False
4    True   True  False
5    True  False  False
>>> df[df.isin({"name": select_name, "sex": [1]})]  # the value must be the list [1], not the scalar 1
    name  sex  age
id
1   chen  1.0  NaN
2    NaN  NaN  NaN
3    NaN  1.0  NaN
4    lee  1.0  NaN
5    liu  NaN  NaN
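That doesn't really give what we want. `DataFrame.isin` with a dict checks each column separately and returns a Boolean DataFrame, and columns not in the dict, such as `age`, come back all False. Indexing with a Boolean DataFrame then masks individual cells, like `where`, instead of selecting whole rows.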
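One way to turn that cell-by-cell result into a row filter is to keep only the columns that were tested and require every one of them to be True with `.all(axis=1)`. Below is a minimal sketch. The names `conditions`, `cell_mask` and `row_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5],
     "name": ["chen", "wang", "hu", "lee", "liu"],
     "sex": [1, 0, 1, 1, 0],
     "age": [20, 18, 21, 20, 18]}
).set_index("id")
select_name = ["chen", "lee", "liu"]

# Allowed values per column; every value must be given as a list.
conditions = {"name": select_name, "sex": [1]}
cell_mask = df.isin(conditions)

# Keep only the tested columns ("age" would otherwise be all False),
# then require every tested column to match in the same row.
row_mask = cell_mask[list(conditions)].all(axis=1)
print(row_mask.tolist())  # [True, False, False, True, False]

result = df[row_mask]
print(result)
```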