Datasets in R

阿新 • Published: 2019-01-01

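R stores data mainly in the following structures

arrays, vectors, matrices, data frames, and lists

R can handle the following data types

numeric, character, logical, raw (binary), and complex

Numeric data covers values such as case identifiers and dates

Character data covers nominal and ordinal variables

R handles each data type differently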
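As a quick illustration of the types listed above, the sketch below creates one value of each and asks R how it is stored. The variable names are made up for this example; `typeof()`, `class()` and `as.Date()` are base R functions.

```r
# One value of each basic type (names are illustrative only)
num <- 3.14          # numeric (double)
int <- 2L            # numeric (integer)
chr <- "one"         # character
lgl <- TRUE          # logical
cpx <- 1 + 2i        # complex
rw  <- as.raw(255)   # raw (a single byte)

types <- sapply(list(num, int, chr, lgl, cpx, rw), typeof)
print(types)
# "double" "integer" "character" "logical" "complex" "raw"

# A Date prints like text but is stored as a number of days since 1970-01-01
d <- as.Date("2019-01-01")
print(class(d))     # "Date"
print(unclass(d))   # 17897
```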

1. Vectors (every element of a vector has the same data type)

a <- c(1, 2, 5, 3, 6, -2, 4)                        # numeric type
b <- c("one", "two", "three")                       # character type
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)        # logical (boolean) type

Vector indices start at 1

Here are a few operations on vectors

a[1] accesses the first element; a[c(2, 4)] accesses the 2nd and 4th elements of vector a

a[3]
a[c(1, 3, 5)]
a[2:6]
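To make the indexing rules concrete, here is a small sketch that reuses the vector `a` from above and records what each expression returns. The last two lines illustrate the "same data type" rule: when types are mixed in `c()`, R coerces everything to one type. The names `third`, `picked`, `slice` and `mixed` are only for illustration.

```r
a <- c(1, 2, 5, 3, 6, -2, 4)

third  <- a[3]            # 5
picked <- a[c(1, 3, 5)]   # 1 5 6
slice  <- a[2:6]          # 2 5 3 6 -2

# Mixing types in one vector: everything is coerced to character
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE"
print(class(mixed))   # "character"
```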

2. Matrices (two-dimensional data; all elements have the same data type)

Here are a few simple examples of creating a matrix

y <- matrix(1:20, nrow = 5, ncol = 4)

y is displayed as follows

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

cells <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
dimnames = list(rnames, cnames))

The result is as follows

   C1 C2
R1  1 26
R2 24 68

mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
dimnames = list(rnames, cnames))

The result is as follows

   C1 C2
R1  1 24
R2 26 68

How to access the elements of a matrix

x <- matrix(1:10, nrow = 2)
x
x[2, ]
x[, 2]
x[1, 4]
x[1, c(4, 5)]
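The sketch below repeats the subscripting calls above and notes what each returns. Because `matrix()` fills column by column unless told otherwise, the 2 x 5 matrix has 1 and 2 in its first column, 3 and 4 in its second, and so on. The final line uses `drop = FALSE`, a standard argument of `[` that is not used in the original text, to keep the result as a matrix.

```r
x <- matrix(1:10, nrow = 2)   # 2 rows, 5 columns, filled by column

row2 <- x[2, ]          # whole 2nd row:    2 4 6 8 10
col2 <- x[, 2]          # whole 2nd column: 3 4
elem <- x[1, 4]         # row 1, column 4:  7
pair <- x[1, c(4, 5)]   # row 1, columns 4 and 5: 7 9

# Selecting a single row normally drops to a plain vector;
# drop = FALSE keeps it as a 1 x 5 matrix
kept <- x[2, , drop = FALSE]
print(dim(kept))        # 1 5
```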

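3. Arrays (used for data with three or more dimensions; all elements of an array have the same data type)

The basic form of the function that creates an array

myarray <- array(vector, dimensions, dimnames) # vector: the data; dimensions: a vector giving the maximum index of each dimension; dimnames: an optional list of dimension labels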

dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1,
dim2, dim3))
z
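Elements of an array are selected the same way as in a matrix, with one subscript per dimension. The sketch below rebuilds `z` and pulls out a single cell by position and by label, plus one two-dimensional slice. The names `by_index`, `by_name` and `slice1` are only for illustration.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

by_index <- z[1, 2, 3]            # 15
by_name  <- z["A1", "B2", "C3"]   # same cell, addressed by label: 15

# Leaving a subscript empty keeps that whole dimension:
# the first "layer" is a 2 x 3 matrix holding 1 to 6
slice1 <- z[, , 1]
print(dim(slice1))                # 2 3
```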

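4. Data frames (data.frame; the columns can have different data types)

The general form of the function that creates a data frame

mydata <- data.frame(col1, col2, col3, col4, ..., row.names = col1)

We load the patient data into a data frame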

patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes,
status)
patientdata

The result is as follows

  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor

How to access the columns of a data frame

patientdata[1:2]
patientdata[c("diabetes", "status")]
patientdata$age
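The three forms above do not all return the same kind of object, which matters when the result is passed on to another function. The following sketch, using the same patient data, shows the difference. The names `first_two`, `ages`, `by_name` and `mean_age` are only for illustration.

```r
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)

# Single brackets return a smaller data frame
first_two <- patientdata[1:2]
print(class(first_two))   # "data.frame"

# $ and [[ ]] return the column itself as a vector
ages    <- patientdata$age
by_name <- patientdata[["age"]]
print(ages)               # 25 34 28 52

mean_age <- mean(patientdata$age)
print(mean_age)           # 34.75
```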

You can use attach(data frame) and detach(data frame) to cut down on repeating the data frame name with $

For example

summary(mtcars$mpg)

plot(mtcars$mpg,mtcars$wt)

Another way is

attach(mtcars)

summary(mpg)

plot(mpg,wt)

detach(mtcars)

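Or use with()

with(mtcars, {

print(summary(mpg)) # inside {} results are not auto-printed, so print explicitly

plot(mpg, wt)

})

Everything inside {} operates on the mtcars dataset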
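One point worth knowing about with() is that ordinary assignments made inside the braces are local and disappear when the block ends. The sketch below shows this, and shows one way to keep a result by returning it from with(). The names `local_mean` and `mean_mpg` are made up for this example.

```r
# An assignment inside with() exists only inside the braces
with(mtcars, {
  local_mean <- mean(mpg)
})
still_there <- exists("local_mean")
print(still_there)   # FALSE, assuming no object of that name was defined earlier

# with() returns the value of its last expression, so assign the result outside
mean_mpg <- with(mtcars, mean(mpg))
print(mean_mpg)
```

attach() has a related pitfall: if an object with the same name as a column already exists in the workspace, that object takes precedence over the column, so with() is often the safer choice.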

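In the patient data, diabetes is a nominal variable and status is an ordinal variable, and both are stored as character vectors. factor(col) converts a character vector into a factor, which R stores as integer codes together with a set of level labels

col <- factor(col) # nominal variable: stored as integer codes, with no ordering

col <- factor(col, ordered = TRUE) # ordinal variable: stored as integer codes with an ordering; by default the levels are sorted alphabetically

col <- factor(col, ordered = TRUE, levels = c("", "", "")) # levels sets the ordering explicitly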

patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
diabetes <- factor(diabetes)
status <- factor(status, order = TRUE)
patientdata <- data.frame(patientID, age, diabetes,
status)
str(patientdata)
summary(patientdata)
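To see what factor() actually stores, the sketch below prints the levels and the underlying integer codes for the status variable. It also shows why the levels argument matters: the default alphabetical ordering puts "Excellent" lowest, which is probably not the intended ranking. The names `f_default` and `f_explicit` are only for illustration.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")

# Default: levels are sorted alphabetically
f_default <- factor(status, ordered = TRUE)
print(levels(f_default))       # "Excellent" "Improved" "Poor"
print(as.integer(f_default))   # 3 2 1 3

# Explicit levels give the ranking Poor < Improved < Excellent
f_explicit <- factor(status, ordered = TRUE,
                     levels = c("Poor", "Improved", "Excellent"))
print(as.integer(f_explicit))  # 1 2 3 1

# Ordered factors can be compared: "Poor" ranks below "Excellent"
print(f_explicit[1] < f_explicit[3])   # TRUE
```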

5. Lists (list; the elements of a list can be vectors, matrices, arrays, data frames, or other lists)

mylist<- list(object1,object2,object3)

g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
mylist

mylist[[3]][,1]
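List elements can be reached by position or, when they were given a name, by name. The sketch below rebuilds `mylist` and shows the common access forms, including the difference between single and double brackets. The names `ages`, `same`, `sub` and `first_col` are only for illustration.

```r
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

print(names(mylist))         # "title" "ages" "" ""  (j and k were not named)

ages <- mylist[["ages"]]     # the element itself: 25 26 18 39
same <- mylist$ages          # same element, shorter syntax
sub  <- mylist[2]            # single brackets return a list of length 1

first_col <- mylist[[3]][, 1]   # first column of the matrix j: 1 2 3 4 5
```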

How to work with the five kinds of objects above
