R Language: Randomly Dividing a Data Set into Equal Parts for K-Fold Cross-Validation
阿新 • Published: 2017-06-02
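Today, while reading Professor Wu Xizhi's (吳喜之) book 《復雜數據統計方法》 (Statistical Methods for Complex Data), I ran into the problem of splitting a data set into several subsets by a factor and then randomly dividing each subset evenly into n parts. Professor Wu's method is fairly easy to follow, but I still found it a little cumbersome, so I wrote a function of my own. From now on, whenever this problem comes up, I only need to run the function.
Here I use the iris data set that ships with R: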
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
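The structure of iris is shown above. Species is a factor with three levels, so the data can be split into three subsets by Species. To run five-fold cross-validation on each subset, each subset has to be divided evenly into five parts. The R code is as follows: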
fiveDivide <- function(col, data, n = 5) {
  # col:  name (string) of a factor column; the data frame is grouped by it
  # data: a data.frame
  # n:    number of parts each group is divided into (default 5)
  # Returns a list with one element per factor level; each element is
  # itself a list of n data frames.
  # sample(x) produces a random permutation of 1..x, which is then cut
  # into n consecutive parts.
  lvls <- levels(data[, col])
  group_num <- length(lvls)
  lst1 <- list()  # the original data split by factor level (group_num parts)
  lst2 <- list()  # one group cut into n data frames
  lst3 <- list()  # the result: one lst2 per group
  # First split the original data set into group_num subsets by factor level.
  for (i in 1:group_num) {
    lst1[[i]] <- data[data[, col] == lvls[i], ]
  }
  # Then divide each subset randomly into n parts, using sample().
  for (k in 1:group_num) {
    od <- sample(nrow(lst1[[k]]))
    newdata <- lst1[[k]][od, ]
    len <- length(od)
    cutpoint <- floor(len / n)
    for (j in 1:n) {
      if (j < n) {
        lst2[[j]] <- newdata[(cutpoint * (j - 1) + 1):(cutpoint * j), ]
      } else {
        # the last part takes all remaining rows, so no row is dropped
        # when len is not a multiple of n
        lst2[[j]] <- newdata[(cutpoint * (j - 1) + 1):len, ]
      }
    }
    lst3[[k]] <- lst2
  }
  return(lst3)
}
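For comparison, the same idea can be sketched more compactly with base R's split(), sample() and rep_len(). This is not the function from the post: the name stratifiedFolds is hypothetical, and unlike fiveDivide it spreads any leftover rows across the parts (sizes differ by at most one) instead of putting them all in the last part. It also assumes a group may have fewer than n rows, in which case some parts are simply empty.

```r
# Hypothetical helper (not from the original post): randomly divide each
# factor-level group of a data frame into n parts.
stratifiedFolds <- function(col, data, n = 5) {
  # one data frame per factor level; drop = TRUE skips unused levels
  groups <- split(data, data[[col]], drop = TRUE)
  lapply(groups, function(g) {
    # rep_len() cycles through 1..n, so part sizes differ by at most one row;
    # sample() shuffles the labels to make the assignment random
    fold_id <- sample(rep_len(seq_len(n), nrow(g)))
    # fixing the factor levels keeps all n parts, even if some are empty
    unname(split(g, factor(fold_id, levels = seq_len(n))))
  })
}

set.seed(1)  # fix the seed so the random split is reproducible
folds <- stratifiedFolds("Species", iris, 5)
# rows per part: one column per species, one row per part
sapply(folds, function(g) sapply(g, nrow))
```

With iris every species has 50 rows, so each of the 15 parts holds exactly 10 rows, the same sizes fiveDivide produces.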
Applying fiveDivide to iris:
> rep=fiveDivide("Species",iris,5)
> str(rep)
List of 3
 $ :List of 5
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.8 5.2 4.8 4.7 5.5 5.1 4.8 4.4 4.8 4.9
  .. ..$ Sepal.Width : num [1:10] 3 3.5 3.4 3.2 3.5 3.7 3.1 3 3.4 3
  .. ..$ Petal.Length: num [1:10] 1.4 1.5 1.6 1.6 1.3 1.5 1.6 1.3 1.9 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5 4.7 4.8 5.2 5.1 5.1 4.9 5.4 5 5.5
  .. ..$ Sepal.Width : num [1:10] 3.5 3.2 3 3.4 3.5 3.8 3.1 3.4 3.5 4.2
  .. ..$ Petal.Length: num [1:10] 1.3 1.3 1.4 1.4 1.4 1.5 1.5 1.7 1.6 1.4
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.1 0.2 0.2 0.3 0.1 0.2 0.6 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.4 4.3 4.9 5.4 4.4 4.6 5.1 5 5.1 5.1
  .. ..$ Sepal.Width : num [1:10] 3.9 3 3.6 3.9 3.2 3.6 3.4 3.4 3.8 3.8
  .. ..$ Petal.Length: num [1:10] 1.3 1.1 1.4 1.7 1.3 1 1.5 1.6 1.9 1.6
  .. ..$ Petal.Width : num [1:10] 0.4 0.1 0.1 0.4 0.2 0.2 0.2 0.4 0.4 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.4 4.5 5.3 5 5 5.1 5.4 5.2 5.1 5.4
  .. ..$ Sepal.Width : num [1:10] 2.9 2.3 3.7 3.3 3.4 3.3 3.7 4.1 3.5 3.4
  .. ..$ Petal.Length: num [1:10] 1.4 1.3 1.5 1.4 1.5 1.7 1.5 1.5 1.4 1.5
  .. ..$ Petal.Width : num [1:10] 0.2 0.3 0.2 0.2 0.2 0.5 0.2 0.1 0.3 0.4
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 4.6 5.8 5 5 5 4.6 5.7 4.9 5.7 4.6
  .. ..$ Sepal.Width : num [1:10] 3.4 4 3.6 3.2 3 3.2 4.4 3.1 3.8 3.1
  .. ..$ Petal.Length: num [1:10] 1.4 1.2 1.4 1.2 1.6 1.4 1.5 1.5 1.7 1.5
  .. ..$ Petal.Width : num [1:10] 0.3 0.2 0.2 0.2 0.2 0.2 0.4 0.2 0.3 0.2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1
 $ :List of 5
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.2 6 5.8 6.3 5.5 5.8 5.8 6.1 6.2 5.6
  .. ..$ Sepal.Width : num [1:10] 2.9 3.4 2.7 3.3 2.6 2.6 2.7 3 2.2 3
  .. ..$ Petal.Length: num [1:10] 4.3 4.5 3.9 4.7 4.4 4 4.1 4.6 4.5 4.1
  .. ..$ Petal.Width : num [1:10] 1.3 1.6 1.2 1.6 1.2 1.2 1 1.4 1.5 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.4 5.6 5.7 6.6 6 6.4 5.9 6.9 6.7 5.5
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.8 3 2.2 2.9 3 3.1 3.1 2.5
  .. ..$ Petal.Length: num [1:10] 4.5 3.9 4.5 4.4 4 4.3 4.2 4.9 4.4 4
  .. ..$ Petal.Width : num [1:10] 1.5 1.1 1.3 1.4 1 1.3 1.5 1.5 1.4 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.5 5.2 6.8 6 5.7 5 6.3 5.7 5.5 5.6
  .. ..$ Sepal.Width : num [1:10] 2.8 2.7 2.8 2.9 2.9 2.3 2.5 2.8 2.3 3
  .. ..$ Petal.Length: num [1:10] 4.6 3.9 4.8 4.5 4.2 3.3 4.9 4.1 4 4.5
  .. ..$ Petal.Width : num [1:10] 1.5 1.4 1.4 1.5 1.3 1 1.5 1.3 1.3 1.5
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.6 6.7 5 6.7 5.9 6.1 5.7 5.4 6 5.1
  .. ..$ Sepal.Width : num [1:10] 2.9 3 2 3.1 3.2 2.8 2.6 3 2.7 2.5
  .. ..$ Petal.Length: num [1:10] 4.6 5 3.5 4.7 4.8 4 3.5 4.5 5.1 3
  .. ..$ Petal.Width : num [1:10] 1.3 1.7 1 1.5 1.8 1.3 1 1.5 1.6 1.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.6 6.1 6.3 7 4.9 5.7 5.5 5.5 6.1 5.6
  .. ..$ Sepal.Width : num [1:10] 2.7 2.9 2.3 3.2 2.4 3 2.4 2.4 2.8 2.9
  .. ..$ Petal.Length: num [1:10] 4.2 4.7 4.4 4.7 3.3 4.2 3.8 3.7 4.7 3.6
  .. ..$ Petal.Width : num [1:10] 1.3 1.4 1.3 1.4 1 1.2 1.1 1 1.2 1.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2
 $ :List of 5
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.9 6.7 6.1 6.4 6.4 6.7 5.7 6.5 6.4 6.3
  .. ..$ Sepal.Width : num [1:10] 3.2 2.5 2.6 2.8 3.1 3.3 2.5 3 2.7 2.9
  .. ..$ Petal.Length: num [1:10] 5.7 5.8 5.6 5.6 5.5 5.7 5 5.5 5.3 5.6
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 1.4 2.1 1.8 2.1 2 1.8 1.9 1.8
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 7.7 6.5 6.4 7.4 6.3 6.8 6 6.7 6.8
  .. ..$ Sepal.Width : num [1:10] 2.8 2.8 3.2 3.2 2.8 3.3 3 2.2 3.3 3.2
  .. ..$ Petal.Length: num [1:10] 5.1 6.7 5.1 5.3 6.1 6 5.5 5 5.7 5.9
  .. ..$ Petal.Width : num [1:10] 2.4 2 2 2.3 1.9 2.5 2.1 1.5 2.5 2.3
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 5.8 6.2 6 6.1 7.7 5.6 6.3 7.3 7.2 6.9
  .. ..$ Sepal.Width : num [1:10] 2.7 2.8 3 3 2.6 2.8 2.8 2.9 3 3.1
  .. ..$ Petal.Length: num [1:10] 5.1 4.8 4.8 4.9 6.9 4.9 5.1 6.3 5.8 5.4
  .. ..$ Petal.Width : num [1:10] 1.9 1.8 1.8 1.8 2.3 2 1.5 1.8 1.6 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 6.7 7.2 7.2 6.3 6.3 6.5 6.3 7.7 7.9 6.5
  .. ..$ Sepal.Width : num [1:10] 3 3.2 3.6 2.7 2.5 3 3.4 3.8 3.8 3
  .. ..$ Petal.Length: num [1:10] 5.2 6 6.1 4.9 5 5.8 5.6 6.7 6.4 5.2
  .. ..$ Petal.Width : num [1:10] 2.3 1.8 2.5 1.8 1.9 2.2 2.4 2.2 2 2
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
  ..$ :'data.frame':    10 obs. of  5 variables:
  .. ..$ Sepal.Length: num [1:10] 7.7 6.4 6.2 6.9 6.7 7.1 5.8 4.9 5.9 7.6
  .. ..$ Sepal.Width : num [1:10] 3 2.8 3.4 3.1 3.1 3 2.7 2.5 3 3
  .. ..$ Petal.Length: num [1:10] 6.1 5.6 5.4 5.1 5.6 5.9 5.1 4.5 5.1 6.6
  .. ..$ Petal.Width : num [1:10] 2.3 2.2 2.3 2.3 2.4 2.1 1.9 1.7 1.8 2.1
  .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 3 3 3 3 3 3 3 3 3 3
After the split, the data look like this:
> rep
[[1]]
[[1]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
46          4.8         3.0          1.4         0.3  setosa
28          5.2         3.5          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
30          4.7         3.2          1.6         0.2  setosa
37          5.5         3.5          1.3         0.2  setosa
22          5.1         3.7          1.5         0.4  setosa
31          4.8         3.1          1.6         0.2  setosa
39          4.4         3.0          1.3         0.2  setosa
25          4.8         3.4          1.9         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa

[[1]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
41          5.0         3.5          1.3         0.3  setosa
3           4.7         3.2          1.3         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
29          5.2         3.4          1.4         0.2  setosa
1           5.1         3.5          1.4         0.2  setosa
20          5.1         3.8          1.5         0.3  setosa
10          4.9         3.1          1.5         0.1  setosa
21          5.4         3.4          1.7         0.2  setosa
44          5.0         3.5          1.6         0.6  setosa
34          5.5         4.2          1.4         0.2  setosa

[[1]][[3]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
17          5.4         3.9          1.3         0.4  setosa
14          4.3         3.0          1.1         0.1  setosa
38          4.9         3.6          1.4         0.1  setosa
6           5.4         3.9          1.7         0.4  setosa
43          4.4         3.2          1.3         0.2  setosa
23          4.6         3.6          1.0         0.2  setosa
40          5.1         3.4          1.5         0.2  setosa
27          5.0         3.4          1.6         0.4  setosa
45          5.1         3.8          1.9         0.4  setosa
47          5.1         3.8          1.6         0.2  setosa

[[1]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
9           4.4         2.9          1.4         0.2  setosa
42          4.5         2.3          1.3         0.3  setosa
49          5.3         3.7          1.5         0.2  setosa
50          5.0         3.3          1.4         0.2  setosa
8           5.0         3.4          1.5         0.2  setosa
24          5.1         3.3          1.7         0.5  setosa
11          5.4         3.7          1.5         0.2  setosa
33          5.2         4.1          1.5         0.1  setosa
18          5.1         3.5          1.4         0.3  setosa
32          5.4         3.4          1.5         0.4  setosa

[[1]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
7           4.6         3.4          1.4         0.3  setosa
15          5.8         4.0          1.2         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
36          5.0         3.2          1.2         0.2  setosa
26          5.0         3.0          1.6         0.2  setosa
48          4.6         3.2          1.4         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
35          4.9         3.1          1.5         0.2  setosa
19          5.7         3.8          1.7         0.3  setosa
4           4.6         3.1          1.5         0.2  setosa


[[2]]
[[2]][[1]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
98          6.2         2.9          4.3         1.3 versicolor
86          6.0         3.4          4.5         1.6 versicolor
83          5.8         2.7          3.9         1.2 versicolor
57          6.3         3.3          4.7         1.6 versicolor
91          5.5         2.6          4.4         1.2 versicolor
93          5.8         2.6          4.0         1.2 versicolor
68          5.8         2.7          4.1         1.0 versicolor
92          6.1         3.0          4.6         1.4 versicolor
69          6.2         2.2          4.5         1.5 versicolor
89          5.6         3.0          4.1         1.3 versicolor

[[2]][[2]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
52          6.4         3.2          4.5         1.5 versicolor
70          5.6         2.5          3.9         1.1 versicolor
56          5.7         2.8          4.5         1.3 versicolor
76          6.6         3.0          4.4         1.4 versicolor
63          6.0         2.2          4.0         1.0 versicolor
75          6.4         2.9          4.3         1.3 versicolor
62          5.9         3.0          4.2         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
66          6.7         3.1          4.4         1.4 versicolor
90          5.5         2.5          4.0         1.3 versicolor

[[2]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
55           6.5         2.8          4.6         1.5 versicolor
60           5.2         2.7          3.9         1.4 versicolor
77           6.8         2.8          4.8         1.4 versicolor
79           6.0         2.9          4.5         1.5 versicolor
97           5.7         2.9          4.2         1.3 versicolor
94           5.0         2.3          3.3         1.0 versicolor
73           6.3         2.5          4.9         1.5 versicolor
100          5.7         2.8          4.1         1.3 versicolor
54           5.5         2.3          4.0         1.3 versicolor
67           5.6         3.0          4.5         1.5 versicolor

[[2]][[4]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
59          6.6         2.9          4.6         1.3 versicolor
78          6.7         3.0          5.0         1.7 versicolor
61          5.0         2.0          3.5         1.0 versicolor
87          6.7         3.1          4.7         1.5 versicolor
71          5.9         3.2          4.8         1.8 versicolor
72          6.1         2.8          4.0         1.3 versicolor
80          5.7         2.6          3.5         1.0 versicolor
85          5.4         3.0          4.5         1.5 versicolor
84          6.0         2.7          5.1         1.6 versicolor
99          5.1         2.5          3.0         1.1 versicolor

[[2]][[5]]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
95          5.6         2.7          4.2         1.3 versicolor
64          6.1         2.9          4.7         1.4 versicolor
88          6.3         2.3          4.4         1.3 versicolor
51          7.0         3.2          4.7         1.4 versicolor
58          4.9         2.4          3.3         1.0 versicolor
96          5.7         3.0          4.2         1.2 versicolor
81          5.5         2.4          3.8         1.1 versicolor
82          5.5         2.4          3.7         1.0 versicolor
74          6.1         2.8          4.7         1.2 versicolor
65          5.6         2.9          3.6         1.3 versicolor


[[3]]
[[3]][[1]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
121          6.9         3.2          5.7         2.3 virginica
109          6.7         2.5          5.8         1.8 virginica
135          6.1         2.6          5.6         1.4 virginica
129          6.4         2.8          5.6         2.1 virginica
138          6.4         3.1          5.5         1.8 virginica
125          6.7         3.3          5.7         2.1 virginica
114          5.7         2.5          5.0         2.0 virginica
117          6.5         3.0          5.5         1.8 virginica
112          6.4         2.7          5.3         1.9 virginica
104          6.3         2.9          5.6         1.8 virginica

[[3]][[2]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
115          5.8         2.8          5.1         2.4 virginica
123          7.7         2.8          6.7         2.0 virginica
111          6.5         3.2          5.1         2.0 virginica
116          6.4         3.2          5.3         2.3 virginica
131          7.4         2.8          6.1         1.9 virginica
101          6.3         3.3          6.0         2.5 virginica
113          6.8         3.0          5.5         2.1 virginica
120          6.0         2.2          5.0         1.5 virginica
145          6.7         3.3          5.7         2.5 virginica
144          6.8         3.2          5.9         2.3 virginica

[[3]][[3]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica
127          6.2         2.8          4.8         1.8 virginica
139          6.0         3.0          4.8         1.8 virginica
128          6.1         3.0          4.9         1.8 virginica
119          7.7         2.6          6.9         2.3 virginica
122          5.6         2.8          4.9         2.0 virginica
134          6.3         2.8          5.1         1.5 virginica
108          7.3         2.9          6.3         1.8 virginica
130          7.2         3.0          5.8         1.6 virginica
140          6.9         3.1          5.4         2.1 virginica

[[3]][[4]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
146          6.7         3.0          5.2         2.3 virginica
126          7.2         3.2          6.0         1.8 virginica
110          7.2         3.6          6.1         2.5 virginica
124          6.3         2.7          4.9         1.8 virginica
147          6.3         2.5          5.0         1.9 virginica
105          6.5         3.0          5.8         2.2 virginica
137          6.3         3.4          5.6         2.4 virginica
118          7.7         3.8          6.7         2.2 virginica
132          7.9         3.8          6.4         2.0 virginica
148          6.5         3.0          5.2         2.0 virginica

[[3]][[5]]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
136          7.7         3.0          6.1         2.3 virginica
133          6.4         2.8          5.6         2.2 virginica
149          6.2         3.4          5.4         2.3 virginica
142          6.9         3.1          5.1         2.3 virginica
141          6.7         3.1          5.6         2.4 virginica
103          7.1         3.0          5.9         2.1 virginica
102          5.8         2.7          5.1         1.9 virginica
107          4.9         2.5          4.5         1.7 virginica
150          5.9         3.0          5.1         1.8 virginica
106          7.6         3.0          6.6         2.1 virginica
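The post stops at the split itself. One plausible way to use the nested list in cross-validation, sketched here as an assumption rather than as the author's procedure, is to take part j from every group as the test set for round j and combine all the other parts into the training set. The helper name cvSplit is hypothetical, and the folds object below is only a stand-in with the same list-of-lists shape as the result of fiveDivide("Species", iris, 5), built inline so that the example runs on its own.

```r
# Hypothetical helper (not from the original post): build the training and
# test sets for round j from a nested list folds[[group]][[part]].
cvSplit <- function(folds, j) {
  # test set: part j of every group, stacked into one data frame
  test <- do.call(rbind, lapply(folds, function(g) g[[j]]))
  # training set: all remaining parts of every group
  train <- do.call(rbind, lapply(folds, function(g) do.call(rbind, g[-j])))
  list(train = train, test = test)
}

# Stand-in for the nested list shown above: shuffle each species and cut it
# into 5 equal parts (iris has 50 rows per species).
set.seed(2017)
folds <- lapply(split(iris, iris$Species), function(g) {
  unname(split(g[sample(nrow(g)), ], rep(1:5, each = nrow(g) / 5)))
})

for (j in 1:5) {
  s <- cvSplit(folds, j)
  cat("round", j, ": train", nrow(s$train), "rows, test", nrow(s$test), "rows\n")
}
```

In every round the test set has 30 rows (10 per species) and the training set has the other 120, so each observation is used for testing exactly once across the five rounds.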