22. Python data processing: converting dummy variables

阿新 • Published: 2018-12-07

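Dummy variables, also called indicator variables or discrete feature encoding, can be used to represent the possible effects of categorical variables and other non-quantitative factors.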

01 Discrete features whose values have a meaningful order

Example: size (L, XL, XXL)

02 Discrete features whose values have no meaningful order

Example: color

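1 Function for discrete features whose values have a meaningful order

pandas.Series.map(dict)

   Maps each value of the Series through a dictionary; use it for discrete features whose values have a meaningful order.
dict: the dictionary that defines the mapping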
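
A minimal sketch of this mapping, using the size example from above. The sample values and the integer codes (1, 2, 3) are assumptions chosen for illustration; any increasing numbers that preserve the order would work as well.

```python
import pandas

# Hypothetical sample data: clothing sizes with a natural order.
sizes = pandas.Series(['L', 'XXL', 'XL', 'L'])

# Assumed codes: a larger size gets a larger number.
size_dict = {'L': 1, 'XL': 2, 'XXL': 3}

# Look up every value of the Series in the dictionary.
size_codes = sizes.map(size_dict)
print(size_codes.tolist())  # [1, 3, 2, 1]
```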


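2 Function for discrete features whose values have no meaningful order

## Function

pandas.get_dummies(data,prefix=None,prefix_sep='_',dummy_na=False,columns=None,drop_first=False)

01 data: the DataFrame to process
02 prefix: prefix for the new column names; useful when several columns contain the same discrete values

03 prefix_sep: separator between the prefix and the discrete value; defaults to an underscore, and the default is usually fine

04 dummy_na: whether to treat NA as a discrete value of its own; by default NA is not encoded

05 columns: names of the columns to encode; if omitted, all columns with object, string, or category dtype are encoded

06 drop_first: whether to drop the first category of each encoded feature; used in modeling to avoid collinearity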
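
The parameters above can be sketched on a small made-up frame. The `Color` column and its values are assumptions for illustration only; the calls use the `columns`, `prefix`, `prefix_sep`, `drop_first`, and `dummy_na` parameters described above.

```python
import pandas

# Hypothetical sample data: color has no meaningful order.
df = pandas.DataFrame({'Color': ['red', 'green', 'blue', 'green']})

# One indicator column per distinct value, named <prefix><prefix_sep><value>.
dummies = pandas.get_dummies(df, columns=['Color'], prefix='Color', prefix_sep='_')
print(list(dummies.columns))  # ['Color_blue', 'Color_green', 'Color_red']

# drop_first=True drops the first category (here 'blue') to avoid collinearity.
reduced = pandas.get_dummies(df, columns=['Color'], drop_first=True)
print(list(reduced.columns))  # ['Color_green', 'Color_red']

# dummy_na=True adds one extra column that flags missing values.
with_na = pandas.get_dummies(pandas.Series(['red', None, 'blue']), dummy_na=True)
print(len(with_na.columns))  # 3
```

With `drop_first=True`, a row whose remaining indicators are all zero belongs to the dropped category, so no information is lost.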

3 Example:

import pandas

data=pandas.read_csv(
'D:\\DATA\\pycase\\4.18虛擬變數\\data.csv',
engine='python',
sep=',',
encoding='utf8'
)

## View the deduplicated education levels (these have a meaningful order)

data['Education Level'].drop_duplicates()

"""
Post-Doc (postdoctoral)
Doctorate (doctoral degree)
Master's Degree
Bachelor's Degree
Associate's Degree
some college (professional college)
Trade School (vocational school)
High School
Grade School (elementary school)
"""

# Define the dictionary that maps each education level to a rank

educationLevelDict={
'Post-Doc':9,
'Doctorate':8,
'Master\'s Degree':7,
'Bachelor\'s Degree':6,
'Associate\'s Degree':5,
'some college':4,
'Trade School':3,
'High School':2,
'Grade School':1
}

# Add the mapped (ordinal) column

data['Education Level Map']=data[
'Education Level'
].map(educationLevelDict)
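
One thing worth checking after a mapping like this: `Series.map` returns NaN for any value that is not a key in the dictionary, so misspelled or unexpected labels silently become missing values. A small sketch with made-up data (the `levels` series and the shortened `level_dict` are assumptions, not the article's data set):

```python
import pandas

# Hypothetical sample; 'Unknown' is deliberately absent from the dictionary.
levels = pandas.Series(['Doctorate', 'High School', 'Unknown'])
level_dict = {'Doctorate': 8, 'High School': 2}

mapped = levels.map(level_dict)

# Values without a dictionary entry come back as NaN; list them for review.
unmapped = levels[mapped.isna()].tolist()
print(unmapped)  # ['Unknown']
```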

## View the deduplicated gender values (no meaningful order)

data['Gender'].drop_duplicates()

# Convert the feature that cannot be ordered into dummy variables

dumies=pandas.get_dummies(
data,
columns=['Gender'],
prefix=['Gender'],
prefix_sep="_",
dummy_na=False,
drop_first=False
)

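# get_dummies replaces 'Gender' with one 'Gender_<value>' column per category,
# so dumies['Gender'] does not exist; append the new indicator columns instead.
data=data.join(dumies.filter(regex='^Gender_'))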
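
Because `data.csv` is not included with the article, the following sketch reproduces this step on a small made-up frame. The `Name` column and the gender values ('Male', 'Female') are assumptions; the point is that `get_dummies` with `columns=['Gender']` replaces the original column with one indicator column per category.

```python
import pandas

# Hypothetical stand-in for data.csv.
data = pandas.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Gender': ['Male', 'Female', 'Male'],
})

dummies = pandas.get_dummies(
    data, columns=['Gender'], prefix=['Gender'], prefix_sep='_'
)

# The 'Gender' column is gone; 'Gender_Female' and 'Gender_Male' take its place.
print(list(dummies.columns))  # ['Name', 'Gender_Female', 'Gender_Male']

# Keep the original column and append the indicators next to it.
result = data.join(dummies.filter(regex='^Gender_'))
print(list(result.columns))  # ['Name', 'Gender', 'Gender_Female', 'Gender_Male']
```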
