R k-means clustering example
阿新 • Published: 2019-02-10
Mathematical background of k-means
To be added later.
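In the meantime, here is a brief summary of the standard formulation. It uses the same quantities that print(km) reports further down (withinss, betweenss, totss). Write the records as x_1, ..., x_n and the k clusters as C_1, ..., C_k with centers μ_j. k-means looks for the partition that minimizes the total within-cluster sum of squares W:

```latex
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i, \qquad
W = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

T = \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2 = W + B, \qquad
B = \sum_{j=1}^{k} |C_j| \, \lVert \mu_j - \bar{x} \rVert^2
```

Here W corresponds to tot.withinss, B to betweenss and T to totss, so the "between_SS / total_SS" line in the output is B/T. The simplest way to reduce W is Lloyd's algorithm. It alternates two steps until the assignments stop changing: first assign each record to its nearest center, then recompute each center as the mean of its records. R's kmeans() uses the Hartigan-Wong algorithm by default. That algorithm minimizes the same W with a different update rule.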
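Sample data: consumption_data.csv. Columns 2 to 4 hold the variables R, F and M that are used below. These are presumably the RFM customer measures: recency, frequency and monetary value.
Code: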
setwd("/users/XXX/desktop/R/chapter5/示例程式")
myData<-read.csv("consumption_data.csv")[,2:4]  # keep columns 2-4: R, F, M
head(myData)
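The post does not include the CSV file. If you want to run the remaining steps without it, a sketch like the following builds a data frame with the same shape: 940 rows and columns R, F and M. The values are randomly generated and purely hypothetical. Cluster sizes, means and sums of squares will therefore differ from the output shown below.

```r
# Hypothetical stand-in for consumption_data.csv: 940 synthetic records
# with columns R, F and M. The distributions are invented for illustration
# and do not reproduce the author's data.
set.seed(123)
n <- 940
myData <- data.frame(
  R = rpois(n, lambda = 16),                           # hypothetical R values
  F = rpois(n, lambda = 10),                           # hypothetical F values
  M = round(rlnorm(n, meanlog = 6.8, sdlog = 0.5), 1)  # hypothetical M values
)
head(myData)
length(myData$F)  # 940
```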
# Count the number of records
length(myData$F)
# centers=3 sets the number of clusters, k = 3: the records are split into three groups
km<-kmeans(myData,centers=3)
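kmeans() chooses its initial centers at random. As a result, running the line above twice can produce different cluster sizes, and the numbering of the clusters can be permuted. A common remedy is to fix the seed and use the documented nstart argument, which tries several random starts and keeps the solution with the lowest tot.withinss. The data in this sketch are hypothetical: three well-separated groups, used only so the block runs on its own.

```r
# Hypothetical data: three well-separated groups in R, F, M
set.seed(1)
myData <- data.frame(
  R = c(rnorm(100, 5), rnorm(100, 20), rnorm(100, 35)),
  F = c(rnorm(100, 5), rnorm(100, 10), rnorm(100, 15)),
  M = c(rnorm(100, 400, 50), rnorm(100, 1200, 50), rnorm(100, 1900, 50))
)

# Fix the seed so the random initial centers are reproducible, and try
# 25 random starts, keeping the one with the smallest tot.withinss
set.seed(2019)
km <- kmeans(myData, centers = 3, nstart = 25)
km$size
```

With the seed fixed, rerunning the same two lines returns the same cluster vector.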
print(km)
# 940 records in total, split into three clusters of the following sizes
K-means clustering with 3 clusters of sizes 218, 370, 352
# Mean of each variable within each cluster (the cluster centers)
Cluster means:
R F M
1 16.09174 10.711009 1913.3965
2 15.48919 7.316216 429.8898
3 18.47727 11.355114 1198.3034
# The cluster each record is assigned to
Clustering vector:
[1] 2 3 3 2 1 2 2 3 2 3 2 2 1 1 1 1 3 3 2 2 2 1 3 2 2 1 1 1 2 1 3 2 2 3 3 2 2
[38] 3 1 2 2 1 2 3 2 3 3 3 1 1 1 1 1 3 2 3 3 1 3 2 3 3 2 3 1 2 1 3 2 1 2 3 3 2
[75] 1 2 3 2 1 1 2 2 3 2 2 3 3 2 2 3 3 2 2 3 3 1 2 2 1 2 2 3 1 3 2 2 2 3 3 2 3
[112] 3 1 2 2 2 2 2 3 3 2 1 2 2 2 2 2 3 2 3 1 2 2 2 3 3 2 3 1 2 1 3 2 3 2 2 2 2
[149] 1 1 1 3 3 3 2 2 1 2 1 1 3 3 2 2 2 2 2 3 3 3 3 1 3 1 2 1 2 2 2 2 2 2 3 2 3
[186] 2 3 3 3 3 2 3 3 3 3 1 3 1 2 2 2 3 2 3 3 1 3 2 3 3 2 2 3 3 2 2 2 3 3 3 1 3
[223] 3 2 3 1 3 3 2 2 2 2 1 3 1 2 2 3 3 2 2 3 3 2 2 1 2 3 2 2 3 1 1 2 3 3 2 3 1
[260] 3 3 3 3 1 1 2 1 2 2 1 3 2 1 3 1 2 1 3 2 1 3 3 3 3 2 1 2 3 3 2 3 2 2 2 3 1
[297] 2 3 2 1 3 3 2 3 1 3 2 3 3 3 2 3 2 3 2 3 3 3 3 3 3 2 3 1 2 1 2 3 3 3 2 1 3
[334] 1 2 1 3 2 1 1 2 2 3 3 3 2 2 3 2 1 2 2 2 3 2 3 2 3 3 2 3 2 1 3 3 2 2 3 3 2
[371] 2 3 3 3 3 2 2 2 3 1 2 2 2 3 2 3 3 3 3 2 2 2 1 1 2 3 3 1 1 3 1 2 3 2 3 2 3
[408] 3 1 1 2 1 3 3 3 1 3 2 3 3 3 1 2 3 3 2 3 1 3 3 3 2 1 3 3 1 2 3 1 3 1 3 2 3
[445] 2 3 2 2 1 2 3 2 1 3 3 1 2 3 1 2 2 3 2 2 2 3 1 1 2 3 3 3 2 2 3 1 3 3 1 3 1
[482] 1 1 1 2 2 1 3 2 3 2 1 2 3 2 2 1 3 1 1 2 1 3 3 2 3 3 1 2 3 1 1 3 1 3 1 2 3
[519] 2 3 1 2 3 2 1 1 2 1 2 2 3 2 2 1 2 1 2 3 3 3 1 3 1 2 3 1 1 2 2 1 2 2 3 3 3
[556] 1 2 3 1 3 1 3 2 1 1 2 2 1 1 3 1 2 2 1 3 3 3 2 1 2 3 1 2 2 2 2 2 3 2 2 3 3
[593] 2 3 3 2 2 3 2 3 3 2 2 2 1 3 3 2 3 3 2 1 2 2 2 2 2 2 3 3 1 2 3 2 2 2 1 3 2
[630] 3 3 2 1 2 2 3 3 1 1 2 3 1 3 3 2 2 1 2 3 3 3 3 3 2 2 2 2 2 1 3 3 1 2 3 3 3
[667] 1 1 1 3 2 3 3 1 1 1 1 3 1 2 3 2 1 2 1 2 2 3 2 3 3 1 3 3 1 3 1 1 3 3 3 3 1
[704] 2 1 1 2 3 2 2 3 3 2 2 3 2 2 2 3 1 1 1 2 2 2 2 2 1 3 1 2 2 3 3 3 2 2 2 2 2
[741] 2 1 1 2 2 2 2 3 1 2 2 2 1 3 1 2 3 3 1 3 2 3 3 2 3 3 2 3 1 1 2 1 2 3 2 2 2
[778] 3 2 3 2 3 1 1 2 2 1 1 2 1 2 3 3 2 1 2 2 2 3 3 3 2 2 1 1 3 3 2 3 3 2 2 2 2
[815] 2 2 1 2 3 2 2 2 3 2 3 2 2 3 2 2 3 2 3 2 1 2 2 3 3 3 3 1 2 3 1 2 2 1 1 3 2
[852] 2 3 2 2 3 1 1 1 1 3 3 2 2 1 2 3 3 3 2 3 1 2 2 3 1 1 2 1 2 1 3 2 3 3 2 1 3
[889] 3 2 3 1 3 2 3 1 3 2 2 1 2 2 1 3 2 1 1 3 2 3 1 3 2 1 1 1 3 2 2 3 3 3 3 2 1
[926] 3 3 2 1 3 1 1 1 1 1 3 3 3 3 2
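# Within-cluster sum of squares for each cluster, then the share of the total variation explained by the clustering
Within cluster sum of squares by cluster: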
[1] 133181360 18978771 16383138
(between_SS / total_SS = 65.0 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
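Several of the components listed above are linked by the identity totss = tot.withinss + betweenss. The percentage in the "(between_SS / total_SS = ...)" line is simply 100 * betweenss / totss. A small check on hypothetical data:

```r
# Hypothetical data standing in for myData
set.seed(7)
myData <- data.frame(R = rnorm(200, 16, 5), F = rnorm(200, 10, 3),
                     M = rnorm(200, 1000, 500))
km <- kmeans(myData, centers = 3, nstart = 10)

# totss: sum of squared distances of all records to the overall mean
totss_manual <- sum(scale(myData, center = TRUE, scale = FALSE)^2)
all.equal(km$totss, totss_manual)                    # TRUE
all.equal(km$totss, km$tot.withinss + km$betweenss)  # TRUE

# The percentage shown in the "(between_SS / total_SS = ...)" line
round(100 * km$betweenss / km$totss, 1)
```

For the author's run, the three printed withinss values sum to 133181360 + 18978771 + 16383138 = 168543269, which is tot.withinss. The reported ratio is 65.0%, so this is about 35% of totss. That puts totss at roughly 4.8 × 10^8.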
# Share of the total sample in each cluster (as a proportion; multiply by 100 for a percentage)
km$size/sum(km$size)
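km$size/sum(km$size) returns proportions, not percentages. The following sketch applies the same formula to the sizes printed earlier (218, 370 and 352 out of 940), scales by 100 and rounds, which gives the percentages directly. With a live km object, the equivalent call is round(100 * km$size / sum(km$size), 1).

```r
# Cluster sizes from the author's run, as printed by print(km)
sizes <- c(218, 370, 352)

# Share of each cluster as a percentage, rounded to one decimal place
pct <- round(100 * sizes / sum(sizes), 1)
pct  # 23.2 39.4 37.4
```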
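# Split the data by cluster. km$cluster holds one cluster label per record; data.frame() appends it to the original data as a new column named km.cluster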
aaa<-data.frame(myData,km$cluster)
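# Rows are taken from myData using the labels stored in aaa. Subsetting aaa itself would also work, but it would carry the extra km.cluster column along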
Data1<-myData[which(aaa$km.cluster==1),]
Data2<-myData[which(aaa$km.cluster==2),]
Data3<-myData[which(aaa$km.cluster==3),]
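Base R can also do the grouping in a single call. split() returns a list with one data frame per cluster label, equivalent to building Data1, Data2 and Data3 with which(). aggregate() gives the per-cluster means, and these should agree with km$centers. A sketch on hypothetical data:

```r
# Hypothetical data standing in for myData
set.seed(3)
myData <- data.frame(R = rnorm(150, 16, 5), F = rnorm(150, 10, 3),
                     M = rnorm(150, 1000, 500))
km <- kmeans(myData, centers = 3, nstart = 10)

# One data frame per cluster: groups[["1"]] plays the role of Data1, etc.
groups <- split(myData, km$cluster)
sapply(groups, nrow)  # same counts as km$size

# Mean of R, F, M within each cluster; should equal km$centers
profile <- aggregate(myData, by = list(cluster = km$cluster), FUN = mean)
profile
```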
# Density plots of R, F and M within each cluster; par(mfrow=c(3,3)) places the 9 plots in one 3 x 3 figure (one row per cluster)
png("kmean.png")
par(mfrow=c(3,3))
plot(density(Data1[,1]),col="red",main="Cluster 1: R")
plot(density(Data1[,2]),col="red",main="Cluster 1: F")
plot(density(Data1[,3]),col="red",main="Cluster 1: M")
plot(density(Data2[,1]),col="red",main="Cluster 2: R")
plot(density(Data2[,2]),col="red",main="Cluster 2: F")
plot(density(Data2[,3]),col="red",main="Cluster 2: M")
plot(density(Data3[,1]),col="red",main="Cluster 3: R")
plot(density(Data3[,2]),col="red",main="Cluster 3: F")
plot(density(Data3[,3]),col="red",main="Cluster 3: M")
dev.off()
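The three variables are on very different scales. In the cluster means printed above, M ranges from about 430 to 1913, while R and F stay between roughly 7 and 19. kmeans() uses Euclidean distance, so M dominates the distance calculation and drives most of the grouping. A common variation, which the original code does not use, is to standardize the columns with scale() before clustering. The centers can then be reported back on the original scale for interpretation. A sketch on hypothetical data with a similar scale mismatch:

```r
# Hypothetical data with the same kind of scale mismatch as R, F, M
set.seed(11)
myData <- data.frame(R = rpois(300, 16), F = rpois(300, 10),
                     M = round(rlnorm(300, 6.8, 0.6)))

# Standardize each column to mean 0 and standard deviation 1
z <- scale(myData)

set.seed(2019)
km_z <- kmeans(z, centers = 3, nstart = 25)

# Cluster means on the original scale, for interpretation
aggregate(myData, by = list(cluster = km_z$cluster), FUN = mean)

# Each standardized column now has standard deviation 1
round(apply(z, 2, sd), 6)
```

Whether to standardize depends on the goal. If M really should dominate, the unscaled version is the right choice.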