1. 程式人生 > >R語言實戰 - 基本統計分析(1)- 描述性統計分析

R語言實戰 - 基本統計分析(1)- 描述性統計分析

4.3 summary eas 方法 func -- 4.4 1.0 6.5

> vars <- c("mpg", "hp", "wt")
> head(mtcars[vars])
                   mpg  hp    wt
Mazda RX4         21.0 110 2.620
Mazda RX4 Wag     21.0 110 2.875
Datsun 710        22.8  93 2.320
Hornet 4 Drive    21.4 110 3.215
Hornet Sportabout 18.7 175 3.440
Valiant           18.1 105 3.460
> 

1. 方法雲集

> summary(mtcars[vars])
      mpg              hp              wt       
 Min.   :10.40   Min.   : 52.0   Min.   :1.513  
 1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581  
 Median :19.20   Median :123.0   Median :3.325  
 Mean   :20.09   Mean   :146.7   Mean   :3.217  
 3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610  
 Max.   :33.90   Max.   :335.0   Max.   :5.424 
> mystats <- function(x, na.omit=FALSE){
+     if (na.omit)
+         x <- x[!is.na(x)]
+     m <- mean(x)
+     n <- length(x)
+     s <- sd(x)
+     skew <- sum((x-m)^3/s^3)/n
+     kurt <- sum((x-m)^4/s^4)/n-3
+     return(c(n=n, mean=m, stdev=s, skew=skew, kurtosis=kurt))
+ }
> sapply(mtcars[vars], mystats)
               mpg          hp          wt
n        32.000000  32.0000000 32.00000000
mean     20.090625 146.6875000  3.21725000
stdev     6.026948  68.5628685  0.97845744
skew      0.610655   0.7260237  0.42314646
kurtosis -0.372766  -0.1355511 -0.02271075
> 

mpg平均值20.1,標準偏差6.0. 分布呈現右偏(偏度0.6),較正態分布稍平(峰度-0.37)

Hmisc 包安裝失敗

1)通過Hmisc包中的describe()函數計算描述性統計量

2)通過pastecs包中的stat.desc()函數計算描述性統計量

> vars <- c("mpg", "hp", "wt")
> library(pastecs)
> stat.desc(mtcars[vars])
                     mpg           hp          wt
nbr.val       32.0000000   32.0000000  32.0000000
nbr.null       0.0000000    0.0000000   0.0000000
nbr.na         0.0000000    0.0000000   0.0000000
min           10.4000000   52.0000000   1.5130000
max           33.9000000  335.0000000   5.4240000
range         23.5000000  283.0000000   3.9110000
sum          642.9000000 4694.0000000 102.9520000
median        19.2000000  123.0000000   3.3250000
mean          20.0906250  146.6875000   3.2172500
SE.mean        1.0654240   12.1203173   0.1729685
CI.mean.0.95   2.1729465   24.7195501   0.3527715
var           36.3241028 4700.8669355   0.9573790
std.dev        6.0269481   68.5628685   0.9784574
coef.var       0.2999881    0.4674077   0.3041285

psych包中describe()函數計算 非缺失值的數量、平均數、標準差、中位數、截尾均值、絕對中位差、最小值、最大值、值域、偏度、峰度和平均值的標準誤。

3)通過psych包中的describe()函數計算描述性統計量

> library(psych)
> describe(mtcars[vars])
    vars  n   mean    sd median trimmed   mad   min    max  range skew kurtosis    se
mpg    1 32  20.09  6.03  19.20   19.70  5.41 10.40  33.90  23.50 0.61    -0.37  1.07
hp     2 32 146.69 68.56 123.00  141.19 77.10 52.00 335.00 283.00 0.73    -0.14 12.12
wt     3 32   3.22  0.98   3.33    3.15  0.77  1.51   5.42   3.91 0.42    -0.02  0.17

2. 分組計算描述性統計量

> aggregate(mtcars[vars], by=list(am=mtcars$am), mean)
  am      mpg       hp       wt
1  0 17.14737 160.2632 3.768895
2  1 24.39231 126.8462 2.411000
> aggregate(mtcars[vars], by=list(am=mtcars$am), sd)
  am      mpg       hp        wt
1  0 3.833966 53.90820 0.7774001
2  1 6.166504 84.06232 0.6169816
> 

  

使用by()分組計算描述性統計量(失敗)

doBy包安裝失敗

使用psych包中的describe.by()分組計算概述統計量

> library(psych)
> describe.by(mtcars[vars], mtcars$am)

 Descriptive statistics by group 
group: 0
    vars  n   mean    sd median trimmed   mad   min    max  range  skew kurtosis    se
mpg    1 19  17.15  3.83  17.30   17.12  3.11 10.40  24.40  14.00  0.01    -0.80  0.88
hp     2 19 160.26 53.91 175.00  161.06 77.10 62.00 245.00 183.00 -0.01    -1.21 12.37
wt     3 19   3.77  0.78   3.52    3.75  0.45  2.46   5.42   2.96  0.98     0.14  0.18
-------------------------------------------------------------------- 
group: 1
    vars  n   mean    sd median trimmed   mad   min    max  range skew kurtosis    se
mpg    1 13  24.39  6.17  22.80   24.38  6.67 15.00  33.90  18.90 0.05    -1.46  1.71
hp     2 13 126.85 84.06 109.00  114.73 63.75 52.00 335.00 283.00 1.36     0.56 23.31
wt     3 13   2.41  0.62   2.32    2.39  0.68  1.51   3.57   2.06 0.21    -1.17  0.17
Warning message:
describe.by is deprecated.  Please use the describeBy function 
> 

 

通過reshape包分組計算概述統計量

> library(reshape)
> dstats <- function(x)(c(n=length(x), mean=mean(x), sd=sd(x)))
> dfm <- melt(mtcars, measure.vars=("mpg","hp","wt"), id.vars=c("am","cyl"))
Error: unexpected ‘,‘ in "dfm <- melt(mtcars, measure.vars=("mpg","
> dfm <- melt(mtcars, measure.vars=c("mpg","hp","wt"), id.vars=c("am","cyl"))
> cast(dfm, am+cyl+variable~.,dstats)
   am cyl variable  n       mean         sd
1   0   4      mpg  3  22.900000  1.4525839
2   0   4       hp  3  84.666667 19.6553640
3   0   4       wt  3   2.935000  0.4075230
4   0   6      mpg  4  19.125000  1.6317169
5   0   6       hp  4 115.250000  9.1787799
6   0   6       wt  4   3.388750  0.1162164
7   0   8      mpg 12  15.050000  2.7743959
8   0   8       hp 12 194.166667 33.3598379
9   0   8       wt 12   4.104083  0.7683069
10  1   4      mpg  8  28.075000  4.4838599
11  1   4       hp  8  81.875000 22.6554156
12  1   4       wt  8   2.042250  0.4093485
13  1   6      mpg  3  20.566667  0.7505553
14  1   6       hp  3 131.666667 37.5277675
15  1   6       wt  3   2.755000  0.1281601
16  1   8      mpg  2  15.400000  0.5656854
17  1   8       hp  2 299.500000 50.2045815
18  1   8       wt  2   3.370000  0.2828427

  

3. 結果的可視化

R語言實戰 - 基本統計分析(1)- 描述性統計分析