Kaggle-pandas(2)

阿新 • Published: 2020-08-03

Indexing-selecting-assigning

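Tutorial

Introduction
Selecting specific values of a pandas DataFrame or Series to work on is an implicit step in almost any data operation you will run, so one of the first things to learn when working with data in Python is how to select the data points relevant to you quickly and effectively.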

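If we have a Python dictionary, we can access its values using the indexing ([]) operator. We can do the same with the columns of a DataFrame, for example reviews['country'].

In Python, we can access the property of an object by accessing it as an attribute. For example, a book object might have a title property, which we can access by calling book.title. Columns in a pandas DataFrame work in much the same way.

Hence, to access the country property of reviews, we can use: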

reviews.country

0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object

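These are the two ways of selecting a specific Series out of a DataFrame. Neither is more or less syntactically valid than the other, but the indexing operator [] has the advantage that it can handle column names containing reserved characters (for example, if we had a column named country providence, reviews.country providence would not work).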
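A minimal sketch of the two access styles. The `toy` DataFrame and its values are made up for illustration and are not the wine-review data:

```python
import pandas as pd

# Hypothetical stand-in for the reviews table.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "France"],
    "country providence": ["Sicily", "Douro", "Alsace"],
})

# Attribute access and the indexing operator return the same Series.
by_attribute = toy.country
by_index = toy["country"]
same = by_attribute.equals(by_index)

# A column name containing a space is only reachable through [].
spaced = toy["country providence"]
```

Writing `toy.country providence` would be a Python syntax error, which is why the bracket form is the safer default for arbitrary column names.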

Indexing in pandas

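The indexing operator and attribute selection are nice because they work just as they do in the rest of the Python ecosystem. For a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, loc and iloc. For more advanced operations, these are the ones you should use.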

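Index-based selection

pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data. iloc follows this paradigm.
To select the first row of data in a DataFrame, we can use the following: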

reviews.iloc[0]

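pandas offers two indexing accessors:

loc: selects rows by the label stored in the row index (for example, the row whose index label is "A").

iloc: selects rows by integer position (for example, the second row).

Both loc and iloc take the row selector first and the column selector second, which differs from the convention in plain Python.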
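The difference between the two accessors is easiest to see when the index is not numeric. The sketch below assumes a small made-up table (`toy`) with the index labels "A", "B", and "C":

```python
import pandas as pd

# Hypothetical table whose index holds the labels "A", "B", "C".
toy = pd.DataFrame(
    {"country": ["Italy", "Portugal", "France"], "points": [87, 87, 90]},
    index=["A", "B", "C"],
)

row_by_label = toy.loc["B"]      # row whose index label is "B"
row_by_position = toy.iloc[1]    # second row, counting from 0

# Row selector first, column selector second, for both accessors.
cell_by_label = toy.loc["B", "country"]
cell_by_position = toy.iloc[1, 0]
```

Here both row selections return the same record, and both cell selections return "Portugal".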

To get the first column of the table:

reviews.iloc[:, 0]
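A few more positional selections, sketched on a made-up three-row table (`toy` is an assumption for illustration, not the reviews data):

```python
import pandas as pd

# Hypothetical stand-in for the reviews table.
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "France"],
    "points": [87, 87, 90],
})

first_row = toy.iloc[0]          # Series holding the first record
first_column = toy.iloc[:, 0]    # Series holding the first column
first_two = toy.iloc[:2, 0]      # first two entries of the first column
last_two = toy.iloc[-2:]         # negative positions count from the end
```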

Manipulating the index

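Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable. We can manipulate the index in any way we see fit.
The set_index() method can be used to do the job. This is useful if you can come up with an index for the dataset that is better than the current one.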
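A minimal sketch of set_index(), assuming a made-up table in which the `title` column is a better row identifier than the default integer index:

```python
import pandas as pd

# Hypothetical table; the titles are invented for illustration.
toy = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [87, 90, 95],
})

# set_index returns a new DataFrame; the original is left unchanged.
by_title = toy.set_index("title")

# Labels from the new index now drive loc.
points_b = by_title.loc["Wine B", "points"]
```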

Exercises

Select the description column from reviews and assign the result to the variable desc.

# Your code here
desc = reviews["description"]

# Check your answer
q1.check()

Follow-up question: what type of object is desc? If you're not sure, you can check by calling Python's type function: type(desc).

type(desc)
#q1.hint()
#q1.solution()

Output:

pandas.core.series.Series

As the output shows, desc is a Series.

Select the first value from the description column of reviews, assigning it to the variable first_description.

first_description = reviews["description"][0]

# Check your answer
q2.check()
first_description

Select the first row of data (the first record) from reviews, assigning it to the variable first_row.

first_row = reviews.loc[0,:]

# Check your answer
q3.check()
first_row

Select the first 10 values from the description column in reviews, assigning the result to the variable first_descriptions.

Hint: format your output as a pandas Series.

first_descriptions = reviews["description"][:10]

# Check your answer
q4.check()
first_descriptions

Select the records with index labels 1, 2, 3, 5, and 8, assigning the result to the variable sample_reviews.

In other words, generate the following DataFrame:

tmp=[1,2,3,5,8]
sample_reviews = reviews.loc[tmp]

# Check your answer
q5.check()
sample_reviews

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100. In other words, generate the following DataFrame:

row=[0,1,10,100]
col=["country", "province", "region_1", "region_2"]
df = reviews.loc[row,col]

# Check your answer
q6.check()
df

Create a variable df containing the country and variety columns of the first 100 records.

Hint: you may use loc or iloc. When working on the answer to this question and several of the ones that follow, keep in mind the following "gotcha" described in the tutorial:

iloc uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. loc, meanwhile, indexes inclusively. (That is, iloc follows Python's default convention: closed on the left, open on the right.)

This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000. In this case df.iloc[0:1000] will return 1000 entries, while df.loc[0:1000] returns 1001 of them! To get 1000 elements using loc, you will need to go one lower and ask for df.loc[0:999]. (Unlike plain Python slicing, loc is closed on both ends.)
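The off-by-one behaviour can be checked directly. This sketch assumes a made-up frame (`toy`) whose index labels run from 0 to 1000:

```python
import pandas as pd

# Hypothetical frame with index labels 0, 1, ..., 1000.
toy = pd.DataFrame({"value": range(1001)})

n_iloc = len(toy.iloc[0:1000])      # end position excluded
n_loc = len(toy.loc[0:1000])        # end label included
n_loc_fixed = len(toy.loc[0:999])   # go one lower to get 1000 rows
```

The three counts come out as 1000, 1001, and 1000 respectively.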

col=["country","variety"]
df = reviews.loc[0:99,col]

# Check your answer
q7.check()
df

Create a DataFrame italian_wines containing reviews of wines made in Italy. Hint: reviews.country equals what?

italian_wines = reviews[reviews.country=="Italy"]

# Check your answer
q8.check()

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand.

top_oceania_wines = reviews[reviews.country.isin(["Australia","New Zealand"])&(reviews.points>=95)]
# reviews.loc[reviews.country.isin(['Italy', 'France'])]

# Check your answer
q9.check()
top_oceania_wines
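The last two answers rely on boolean masks. As a rough sketch of how such masks combine, using a made-up four-row table (`toy` and its values are assumptions, not the reviews data):

```python
import pandas as pd

# Hypothetical stand-in for the reviews table.
toy = pd.DataFrame({
    "country": ["Italy", "Australia", "New Zealand", "Australia"],
    "points": [96, 95, 90, 97],
})

# Each condition yields a boolean Series aligned with the rows.
from_oceania = toy.country.isin(["Australia", "New Zealand"])
high_score = toy.points >= 95

# Combine masks with & (and) or | (or). When writing the comparisons
# inline, parenthesize each one, because & binds more tightly than >=.
top = toy[from_oceania & high_score]
```

Only the two Australian rows with 95 and 97 points survive the combined mask.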
