Importing and Processing Data in RStudio

  • Set a frequently used working directory:

 

  • Import the data: check the working directory first, then put the data file inside it
> getwd() 

> setwd("/Users/yuki/desktop/machine learning")
> credit=read.table("german.data.txt",header=F,sep=" ",stringsAsFactors=F)

### How to choose the header setting: look at the first few rows of the data first
> preview=read.table("german.data.txt",nrows=3,header=F,sep=" ",stringsAsFactors=F)   ## separate name, so the full credit data is not overwritten
> preview
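The preview step can be sketched end to end on a small stand-in file. This is only an illustration: the temporary file and its three rows are made up here and are not the real contents of german.data.txt, and the names `tmp`, `preview` and `sample_data` are hypothetical.

```r
# Write a tiny space-separated file with no header row (made-up sample rows).
tmp <- tempfile(fileext = ".txt")
writeLines(c("A11 6 A34 1169 1",
             "A12 48 A32 5951 2",
             "A14 12 A34 2096 1"), tmp)

# Look at the raw first lines: if they hold data rather than column names,
# header = FALSE is the right setting.
print(readLines(tmp, n = 2))

# Read only the first two rows to check how the columns are parsed.
preview <- read.table(tmp, nrows = 2, header = FALSE, sep = " ",
                      stringsAsFactors = FALSE)
print(preview)

# Then read the whole file with the settings confirmed above.
sample_data <- read.table(tmp, header = FALSE, sep = " ",
                          stringsAsFactors = FALSE)
str(sample_data)
```

Keeping the preview in its own variable means the full data frame does not have to be read again afterwards.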


  • Inspect the data types and dimensions
    str(credit)
    dim(credit)

  • Rename the columns
    colnames(credit) = c('a','b', ...)
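As a small self-contained sketch of the renaming step, the toy data frame below stands in for `credit`. The column names used here (`checking_status`, `duration_months`, `loan_status`) are hypothetical and are not taken from the original post.

```r
# Toy data frame with the default V1..V3 names that read.table assigns.
df <- data.frame(V1 = c("A11", "A12"), V2 = c(6, 48), V3 = c(1, 2),
                 stringsAsFactors = FALSE)

# Replace all names at once; the vector length must equal ncol(df).
colnames(df) <- c("checking_status", "duration_months", "loan_status")

# Rename a single column by position, leaving the others untouched.
colnames(df)[3] <- "Good.Loan"
print(colnames(df))
```

Assigning by position is convenient when only one or two of many columns need a new name.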

  • A mapping variable from the abbreviations to their real meanings
mapping = list('A11'='... < 0 DM',
 'A12'='0 <= ... < 200 DM',
 'A13'='... >= 200 DM / salary assignments for at least 1 year',
 'A14'='no checking account',
   ...
)
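# Loop over every column; only character columns hold the abbreviated codes.
for(i in seq_len(ncol(credit))) {
  if(is.character(credit[,i])) {
      # Look up each code in mapping, then store the result as a factor.
      credit[,i] = as.factor(as.character(mapping[credit[,i]]))
  }
}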
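This looks a little complicated. First we create `mapping`, a variable that maps each abbreviation to its real meaning. Then, for every column of character type (this is why we passed stringsAsFactors=F when reading the data), we replace each value in the column with the entry it maps to. The mapping rule is `mapping`.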
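The same lookup can also be sketched with a named character vector instead of a list. This is an alternative illustration, not the post's own code: the toy data frame `toy` is made up, and only the four codes listed above are included because the rest of the mapping is elided in the original.

```r
# Only the four codes shown in the original mapping; the remainder is elided there.
mapping <- c(A11 = "... < 0 DM",
             A12 = "0 <= ... < 200 DM",
             A13 = "... >= 200 DM / salary assignments for at least 1 year",
             A14 = "no checking account")

# Toy data (hypothetical values): one coded column, one numeric column.
toy <- data.frame(V1 = c("A11", "A14", "A12", "A11"), V2 = c(6, 12, 48, 24),
                  stringsAsFactors = FALSE)

# Indexing a named vector by a character vector returns one label per code.
for (i in seq_len(ncol(toy))) {
  if (is.character(toy[[i]])) {
    toy[[i]] <- factor(unname(mapping[toy[[i]]]))
  }
}
str(toy)
```

With a named vector, a code that is missing from `mapping` becomes NA, which is usually easier to spot than a silently converted placeholder string.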

  • Give readable labels to the binary target variable and convert it to a factor
> str(credit$V21)
 int [1:1000] 1 2 1 1 2 1 1 1 1 2 ...
> credit$V21 = ifelse(credit$V21==1,'GoodLoan','BadLoan')  ## (1=GOOD, 2=BAD)
> str(credit$V21)
 chr [1:1000] "GoodLoan" "BadLoan" "GoodLoan" "GoodLoan" ...     ##  this is character type; convert it to a factor for easier handling
> credit$V21=as.factor(credit$V21)
> str(credit$V21)
 Factor w/ 2 levels "BadLoan","GoodLoan": 2 1 2 2 1 2 2 2 2 1 ...
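The two steps (relabel, then convert) can also be done in one call to `factor()`. The sketch below uses a short made-up vector `v21` in place of the real column and assumes the same coding as above (1 = good, 2 = bad).

```r
# Toy target column coded as in the source: 1 = good, 2 = bad.
v21 <- c(1L, 2L, 1L, 1L, 2L)

# factor() maps each value in levels to the label at the same position.
loan <- factor(v21, levels = c(1, 2), labels = c("GoodLoan", "BadLoan"))
str(loan)
print(table(loan))
```

One difference worth knowing: here the level order follows `levels` (GoodLoan first), whereas `as.factor()` on a character vector sorts the levels alphabetically (BadLoan first). This matters for models that treat the first level as the reference class.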