Importing and Processing Data in RStudio
阿新 • Published: 2019-02-19
- Set the working directory you normally use:
- To import data, first check the working directory, then put the data file inside it
> getwd()
> setwd("/Users/yuki/desktop/machine learning")
> credit=read.table("german.data.txt",header=F,sep=" ",stringsAsFactors=F)
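### How to decide the header setting: look at the first few rows of the data first
> credit=read.table("german.data.txt",nrows=3,header=F,sep=" ",stringsAsFactors=F)
> credit
## Note: this preview overwrites credit with only 3 rows, so run the full read.table call afterwards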
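The header check can also be done on the raw text before any parsing. The following is a minimal sketch, assuming a small made-up file written to a temporary location (the rows are illustrative, not the real contents of german.data.txt); `tmp`, `first_lines` and `preview` are names introduced here only for the example.

```r
# Hypothetical sample file, written to a temp location so the sketch runs anywhere
tmp <- tempfile(fileext = ".txt")
writeLines(c("A11 6 A34 1169 1",
             "A12 48 A32 5951 2",
             "A14 12 A34 2096 1"), tmp)

# Look at the raw first lines: if line 1 already holds data values
# rather than column names, header = FALSE is the right setting
first_lines <- readLines(tmp, n = 2)
print(first_lines)

# Parse only the first rows to confirm the separator and column count
preview <- read.table(tmp, nrows = 2, header = FALSE, sep = " ",
                      stringsAsFactors = FALSE)
print(preview)  # with header = FALSE the columns are auto-named V1, V2, ...
```

Storing the preview under its own name leaves any fully loaded `credit` data frame untouched.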
- Inspect the data types
str(credit)
dim(credit)
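When the data frame has many columns, a per-column class summary can be easier to scan than the full `str()` output. A small sketch, assuming a toy data frame `df` (hypothetical, standing in for `credit`):

```r
# Toy stand-in for the imported data (values are made up)
df <- data.frame(V1 = c("A11", "A12"), V2 = c(6L, 48L),
                 stringsAsFactors = FALSE)

# One class name per column, returned as a named character vector
col_classes <- sapply(df, class)
print(col_classes)

# Names of the columns that were read in as character
char_cols <- names(df)[col_classes == "character"]
print(char_cols)
```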
- Rename the columns
colnames(credit) = c('a','b', ...)
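As a concrete illustration of the renaming step, here is a sketch on a toy data frame. The column names used (`checking_status`, `duration_months`, `loan_status`) are hypothetical and chosen only for the example; they are not the names used by the original post.

```r
# Toy data frame with the default V1, V2, ... names that read.table assigns
df <- data.frame(V1 = c("A11", "A12"), V2 = c(6L, 48L), V3 = c(1L, 2L),
                 stringsAsFactors = FALSE)

# Hypothetical descriptive names; the vector length must equal ncol(df)
new_names <- c("checking_status", "duration_months", "loan_status")
stopifnot(length(new_names) == ncol(df))
colnames(df) <- new_names

str(df)
```

Checking the length first avoids the error R raises when the number of names does not match the number of columns.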
- A mapping variable from abbreviations to their real meanings
mapping = list('A11'='... < 0 DM',
'A12'='0 <= ... < 200 DM',
'A13'='... >= 200 DM / salary assignments for at least 1 year',
'A14'='no checking account',
...
)
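This looks a little complicated. First we create `mapping`, a variable that maps each abbreviation to its real meaning. Then, for every column whose type is character (which is why stringsAsFactors=F was used when reading the data), we replace the column's values according to `mapping` and convert the result to a factor.
for(i in 1:(dim(credit))[2]) {
  if(class(credit[,i])=='character') {
    credit[,i] = as.factor(as.character(mapping[credit[,i]]))
  }
}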
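To see what the loop does, it can be run on a tiny data frame. This is a sketch under stated assumptions: `credit_toy` is a made-up stand-in for `credit`, and `mapping` contains only three of the entries listed above.

```r
# Toy data: one coded character column and one integer column
credit_toy <- data.frame(V1 = c("A11", "A14", "A12"),
                         V2 = c(6L, 12L, 48L),
                         stringsAsFactors = FALSE)

# Subset of the abbreviation-to-meaning mapping
mapping <- list('A11' = '... < 0 DM',
                'A12' = '0 <= ... < 200 DM',
                'A14' = 'no checking account')

for (i in seq_len(ncol(credit_toy))) {
  if (is.character(credit_toy[, i])) {
    # mapping[codes] looks up each code by name and returns a list;
    # as.character() flattens it back to a character vector
    credit_toy[, i] <- as.factor(as.character(mapping[credit_toy[, i]]))
  }
}

str(credit_toy)  # V1 is now a factor of descriptions, V2 is still integer
```

A code that has no entry in `mapping` does not raise an error here; the lookup yields `NULL`, which `as.character()` turns into the string "NULL", so it is worth checking the resulting factor levels for that value.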
- Give descriptive names to the coded (1/2) target variable and turn it into a factor
> str(credit$V21)
 int [1:1000] 1 2 1 1 2 1 1 1 1 2 ...
> credit$V21 = ifelse(credit$V21==1,'GoodLoan','BadLoan') ## (1=GOOD, 2=BAD)
> str(credit$V21)
 chr [1:1000] "GoodLoan" "BadLoan" "GoodLoan" "GoodLoan" ...
## This is a character vector; convert it to a factor to make it easier to work with
> credit$V21=as.factor(credit$V21)
> str(credit$V21)
 Factor w/ 2 levels "BadLoan","GoodLoan": 2 1 2 2 1 2 2 2 2 1 ...
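The two steps above (`ifelse` followed by `as.factor`) can also be combined into a single `factor()` call. This is an alternative sketch, not the approach used in the post, and `v21` is a made-up vector standing in for `credit$V21`.

```r
# Made-up codes in the same 1 = good, 2 = bad encoding
v21 <- c(1L, 2L, 1L, 1L, 2L)

# levels lists the codes as stored; labels gives the name for each code, in the same order
loan <- factor(v21, levels = c(1, 2), labels = c("GoodLoan", "BadLoan"))

str(loan)
table(loan)
```

One difference to be aware of: here the level order follows `levels` (GoodLoan first), whereas `as.factor()` on a character vector orders the levels alphabetically (BadLoan first). This matters for models that treat the first level as the reference.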