
Understanding factors in R

Reposted and adapted, with thanks to the original author.

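In R, a factor represents a category code or a level, that is, one of a fixed set of discrete values. For example, a count of people takes the values 1, 2, 3, 4, ..., so when it is used as a factor its levels are 1, 2, 3, 4, .... The levels high, medium, and low used to describe a covariate also form a factor, because each value is a discrete label. A numeric vector, by contrast, holds quantities such as 1, 1.1, 1.2, ... that can take part in arithmetic; a factor cannot. Put simply: a factor value is a discrete point, while a numeric value lies on a continuous scale. If numbers in your data should be treated as categories, convert the vector to a factor after importing the data. From then on R no longer treats the values as numbers, only as "symbols".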
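To make the contrast concrete, here is a minimal sketch of how a numeric vector and a factor built from it behave differently. The variable names and the sample values are hypothetical and chosen only for illustration.

```r
# Hypothetical data: household sizes recorded as numbers.
sizes <- c(1, 2, 2, 3, 1)
mean(sizes)              # arithmetic works on a numeric vector

# The same values treated as categories.
size_f <- factor(sizes)
is.numeric(size_f)       # FALSE: a factor is not treated as a number

# Arithmetic on a factor is not meaningful: R returns NA for every
# element and issues a warning (suppressed here to keep the output clean).
result <- suppressWarnings(size_f + 1)
all(is.na(result))       # TRUE

# Text labels such as low/medium/high are naturally stored as a factor.
level_f <- factor(c("high", "low", "medium", "low"))
table(level_f)           # counts per category
```

The point of the sketch is only that a factor keeps the labels but gives up arithmetic; counting per category with table() is the typical operation instead.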

The examples below illustrate this.

> data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)
> data
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
> fdata <- factor(data)
> fdata
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
> class(fdata)
[1] "factor"
> class(data)
[1] "numeric"
# factor() converts the original numeric vector to type factor. A factor carries levels: the set of distinct values that occur in it (no duplicates). The levels are the factor's elements with duplicates removed and converted to character, so every level is a character string.
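The relationship between a factor's elements and its levels can be checked directly. The following is a small sketch under the assumption that the factor is created with default arguments; the second vector, shifted, is a hypothetical example added to show a common pitfall when converting a factor back to numbers.

```r
data <- c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1)
fdata <- factor(data)

# With default arguments, the levels are the sorted unique values,
# converted to character.
identical(levels(fdata), as.character(sort(unique(data))))  # TRUE

# Internally each element is stored as an integer index into the levels.
codes <- as.integer(fdata)

# Pitfall: as.numeric() on a factor returns those integer codes,
# not the original values. Go through as.character() to recover them.
shifted <- factor(c(10, 20, 20, 30))
as.numeric(shifted)                 # 1 2 2 3
as.numeric(as.character(shifted))   # 10 20 20 30
```

For the data vector above the codes happen to coincide with the original values, which is why the shifted example is needed to see the difference.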
> levels(fdata)
[1] "1" "2" "3"
# When creating a factor, the labels argument can be used to name the levels. Continuing the session above:
> rdata <- factor(data,labels=c("I","II","III"))
> rdata
[1] I   II  II  III I   II  III III I   II  III III I
Levels: I II III
> rdata <- factor(data,labels=c("e","ee","eee"))
> rdata
[1] e   ee  ee  eee e   ee  eee eee e   ee  eee eee e
Levels: e ee eee
# Factors can also fix the order of the data.
> mons <- c("March","April","January","November","January",
+           "September","October","September","November","August",
+           "January","November","November","February","May","August",
+           "July","December","August","August","September","November",
+           "February","April")
> mons <- factor(mons)
> mons
 [1] March     April     January   November  January
 [6] September October   September November  August
[11] January   November  November  February  May
[16] August    July      December  August    August
[21] September November  February  April
11 Levels: April August December February ... September
> table(mons)
mons
    April    August  December  February   January
        2         4         1         2         3
     July     March       May  November   October
        1         1         1         5         1
September
        3
# By default the levels are sorted alphabetically. Months clearly have a natural order, so we can specify that order for the factor:
> mons <- factor(mons,levels=c("January","February","March","April","May","June","July","August","September","October","November","December"),ordered=TRUE)
> table(mons)
mons
  January  February     March     April       May
        3         2         1         2         1
     June      July    August September   October
        0         1         4         3         1
 November  December
        5         1
# June is listed as a level but never occurs in the data, so its count is 0.
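Beyond changing how table() lays out its counts, an ordered factor also supports sorting and comparisons along the specified order. The sketch below assumes a short hypothetical vector obs and uses R's built-in constant month.name for the twelve month names instead of typing them out.

```r
# Hypothetical observations, in no particular order.
obs <- c("March", "November", "January", "August")

# month.name is a base R constant holding the English month names in order.
m <- factor(obs, levels = month.name, ordered = TRUE)

# Sorting follows the level order, not the alphabet.
as.character(sort(m))     # "January" "March" "August" "November"

# Comparisons are defined for ordered factors.
m < "June"                # TRUE FALSE TRUE FALSE
as.character(max(m))      # "November"

# Unused levels (such as June here) can be dropped when they are not wanted.
nlevels(m)                # 12
nlevels(droplevels(m))    # 4
```

Whether to keep unused levels depends on the analysis: keeping them makes zero counts visible in a table, while dropping them avoids empty categories in summaries.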