Generating Random Numbers in R

1. Random number generation

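  As a language for statistical analysis, R ships with a comprehensive library of functions for generating random numbers from many statistical distributions. In this post I want to focus on one simple question: how do I generate a random number?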

  The answer depends on what kind of random number you want. Let's work through some examples.

  a) Generate a random number between 5.0 and 7.5

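  If you want a decimal number where every value between a given minimum and maximum (including fractional values) is equally likely, use the runif() function. It generates values from a uniform distribution. Here is how to generate one random number between 5.0 and 7.5: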

    > x1 <- runif(1, 5.0, 7.5) 
    > x1 
    [1] 5.573882

  Of course, when you run this you will get a different number, but it will always lie between 5.0 and 7.5. You will not get exactly 5.0 or 7.5.

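  If you want several random values, do not use a loop. You can generate them all at once by passing the number of values you want as the first argument to runif(). Here is how to generate 10 values between 5.0 and 7.5: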

    > x2 <- runif(10, 5.0, 7.5) 
    > x2 
     [1] 6.871893 6.518833 5.372662 6.808643 5.490282 7.381824
     [7] 7.476985 5.569176 6.182591 5.285149
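
  To check the two claims above (one vectorized call instead of a loop, and values strictly inside the interval), here is a minimal sketch. The seed value and the variable name draws are arbitrary choices for illustration, not part of the original example.

```r
set.seed(42)                        # arbitrary seed so the run is repeatable
draws <- runif(1000, min = 5.0, max = 7.5)

length(draws)                       # 1000 values from a single call, no loop
all(draws > 5.0 & draws < 7.5)      # TRUE: every value lies strictly inside the interval
range(draws)                        # smallest and largest value drawn
```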

  b) Generate a random integer between 1 and 10

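  This looks like the same exercise as the previous one, but now we want only whole numbers, not fractional values. For that we use the sample function: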

    > x3 <- sample(1:10, 1)
    > x3
    [1] 3
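
  As a rough sanity check, the sketch below draws many integers the same way and confirms that only whole numbers from 1 to 10 come back, each about equally often. The seed and the name rolls are illustrative assumptions; replace = TRUE is needed here only because more than 10 values are drawn.

```r
set.seed(1)                              # arbitrary seed for repeatability
rolls <- sample(1:10, 10000, replace = TRUE)

all(rolls == floor(rolls))               # TRUE: only whole numbers
all(rolls %in% 1:10)                     # TRUE: nothing outside 1..10
round(table(rolls) / length(rolls), 2)   # each value appears roughly 10% of the time
```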

  c) Random number generation functions for various distributions

    rnorm(n, mean=0, sd=1)   # normal distribution
    rexp(n, rate=1)   # exponential distribution
    rgamma(n, shape, rate=1, scale=1/rate)   # gamma distribution
    rpois(n, lambda)   # Poisson distribution
    rt(n, df, ncp)   # t distribution
    rf(n, df1, df2, ncp)   # F distribution
    rchisq(n, df, ncp=0)   # chi-squared distribution
    rbinom(n, size, prob)   # binomial distribution
    rweibull(n, shape, scale=1)   # Weibull distribution
    rbeta(n, shape1, shape2)   # beta distribution
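
  A short sketch of how a few of these generators are called in practice. The parameter values, the seed, and the variable names are made up for illustration; the sample means should land near the theoretical means of each distribution.

```r
set.seed(2024)                       # arbitrary seed for repeatability
n <- 10000

normal_draws  <- rnorm(n, mean = 10, sd = 2)
poisson_draws <- rpois(n, lambda = 3)
binom_draws   <- rbinom(n, size = 20, prob = 0.25)

mean(normal_draws)    # close to 10 (the mean argument)
sd(normal_draws)      # close to 2 (the sd argument)
mean(poisson_draws)   # close to 3 (lambda)
mean(binom_draws)     # close to 5 (size * prob)
```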

2. Sampling simulation

  a) Draw 1 number at random from 1 to 10

    > x1 <- sample(1:10, 1)
    > x1
    [1] 3

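  The first argument is the vector of values to draw from (here, the numbers 1 to 10), and the second argument says that one value should be returned.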

  b) Draw several numbers at random from a set of numbers

  If we want several random numbers and repeats are allowed, we add an extra argument, replace=TRUE:

    # sampling with replacement
    > x2 <- sample(1:10, 5, replace=TRUE)
    > x2
    [1] 1 5 9 9 8

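  If we want several random numbers and repeats are not allowed, we set replace=FALSE (this is the default, so the argument can also be omitted):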

    # sampling without replacement
    > x3 <- sample(1:40, 6, replace=FALSE)
    > x3
    [1]  3 40 20 19 28 11
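
  The difference between the two modes can be checked directly. This is a minimal sketch with an arbitrary seed and illustrative variable names: drawing 10 values from 1:10 without replacement must use every value exactly once, while drawing with replacement may repeat some and skip others.

```r
set.seed(7)                            # arbitrary seed for repeatability
with_repl    <- sample(1:10, 10, replace = TRUE)
without_repl <- sample(1:10, 10)       # replace = FALSE is the default

anyDuplicated(without_repl) == 0       # TRUE: no value can repeat
identical(sort(without_repl), 1:10)    # TRUE: every value appears exactly once
length(unique(with_repl))              # often below 10, since repeats are allowed
```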

  c) Shuffle a set of data (random selection, random order)

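  You can use the same idea to draw a random subset of any vector, even one that does not contain numbers. For example, to pick 10 different US states at random: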

    > sample(state.name, 10)
     [1] "Delaware"     "Vermont"      "Rhode Island" "Tennessee"   
     [5] "Arizona"      "Mississippi"  "Virginia"     "Alaska"      
     [9] "Georgia"      "Louisiana"  

    > sample(state.name, 52)
    Error in sample.int(length(x), size, replace, prob) : 
      cannot take a sample larger than the population when 'replace = FALSE'

    > length(state.name)
    [1] 50
    
    > sample(state.name, 50)
     [1] "Oregon"         "Georgia"        "Maine"         
     [4] "Idaho"          "Alaska"         "Tennessee"     
     [7] "Indiana"        "Wyoming"        "Montana"       
    [10] "Utah"           "Florida"        "North Carolina"
    [13] "Nevada"         "Virginia"       "Pennsylvania"  
    [16] "North Dakota"   "Wisconsin"      "Alabama"       
    [19] "Mississippi"    "New Hampshire"  "Delaware"      
    [22] "Arizona"        "Massachusetts"  "Vermont"       
    [25] "Maryland"       "Missouri"       "Michigan"      
    [28] "Connecticut"    "Colorado"       "New York"      
    [31] "Oklahoma"       "California"     "Washington"    
    [34] "South Carolina" "West Virginia"  "Hawaii"        
    [37] "Illinois"       "Arkansas"       "Kentucky"      
    [40] "Iowa"           "Kansas"         "Rhode Island"  
    [43] "New Mexico"     "South Dakota"   "New Jersey"    
    [46] "Nebraska"       "Ohio"           "Texas"         
    [49] "Louisiana"      "Minnesota"     
    
    > sample(state.name, 50)
     [1] "Arkansas"       "California"     "Texas"         
     [4] "Tennessee"      "Montana"        "Massachusetts" 
     [7] "North Dakota"   "Oregon"         "Delaware"      
    [10] "Hawaii"         "South Dakota"   "Connecticut"   
    [13] "South Carolina" "Kansas"         "Washington"    
    [16] "West Virginia"  "Georgia"        "Maine"         
    [19] "Wyoming"        "Illinois"       "Nebraska"      
    [22] "Idaho"          "Maryland"       "Ohio"          
    [25] "Indiana"        "Louisiana"      "Mississippi"   
    [28] "Iowa"           "New York"       "Minnesota"     
    [31] "Rhode Island"   "New Mexico"     "New Jersey"    
    [34] "Pennsylvania"   "Kentucky"       "Utah"          
    [37] "Nevada"         "Arizona"        "Missouri"      
    [40] "North Carolina" "New Hampshire"  "Wisconsin"     
    [43] "Alaska"         "Vermont"        "Alabama"       
    [46] "Florida"        "Oklahoma"       "Michigan"      
    [49] "Colorado"       "Virginia"      
    > 
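
  The transcript above shows two things: asking for all 50 names returns a shuffle, and asking for more than 50 fails unless sampling is with replacement. A minimal sketch of both, assuming an arbitrary seed and illustrative variable names; the tryCatch wrapper and its message string are only there so the script keeps running after the error.

```r
set.seed(99)                                  # arbitrary seed for repeatability
shuffled <- sample(state.name)                # size defaults to length(x): a full shuffle
setequal(shuffled, state.name)                # TRUE: the same 50 names in a new order

# Asking for more items than exist is an error without replacement.
too_many <- tryCatch(sample(state.name, 52),
                     error = function(e) "error: sample larger than population")
too_many

# With replacement the request succeeds, so some names must repeat.
with_repl <- sample(state.name, 52, replace = TRUE)
length(with_repl)                             # 52
```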

  d) Randomly select a subset of a matrix (data frame)

    -----------------------------------------------------   
    (col.name=colnames(mtcars))
    (row.name=rownames(mtcars))
    # sample the column names without replacement
    (sam.col.name=sample(col.name,10,replace=FALSE))
    # sample the row names without replacement
    (sam.row.name=sample(row.name,10,replace=FALSE))
    B=mtcars[sam.row.name,sam.col.name]
    B
    -----------------------------------------------------
    > (col.name=colnames(mtcars))
     [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"  
    [10] "gear" "carb"
    > (row.name=rownames(mtcars))
     [1] "Mazda RX4"           "Mazda RX4 Wag"      
     [3] "Datsun 710"          "Hornet 4 Drive"     
     [5] "Hornet Sportabout"   "Valiant"            
     [7] "Duster 360"          "Merc 240D"          
     [9] "Merc 230"            "Merc 280"           
    [11] "Merc 280C"           "Merc 450SE"         
    [13] "Merc 450SL"          "Merc 450SLC"        
    [15] "Cadillac Fleetwood"  "Lincoln Continental"
    [17] "Chrysler Imperial"   "Fiat 128"           
    [19] "Honda Civic"         "Toyota Corolla"     
    [21] "Toyota Corona"       "Dodge Challenger"   
    [23] "AMC Javelin"         "Camaro Z28"         
    [25] "Pontiac Firebird"    "Fiat X1-9"          
    [27] "Porsche 914-2"       "Lotus Europa"       
    [29] "Ford Pantera L"      "Ferrari Dino"       
    [31] "Maserati Bora"       "Volvo 142E"         
    > # sample the column names without replacement
    > (sam.col.name=sample(col.name,10,replace=FALSE))
     [1] "mpg"  "qsec" "wt"   "disp" "carb" "cyl"  "vs"   "am"   "hp"  
    [10] "gear"
    > # sample the row names without replacement
    > (sam.row.name=sample(row.name,10,replace=FALSE))
     [1] "Merc 230"          "Pontiac Firebird"  "Maserati Bora"    
     [4] "Hornet Sportabout" "Merc 450SE"        "Merc 280"         
     [7] "Duster 360"        "Merc 280C"         "Dodge Challenger" 
    [10] "Camaro Z28"       
    > B=mtcars[sam.row.name,sam.col.name]
    > B
                       mpg  qsec    wt  disp carb cyl vs am  hp gear
    Merc 230          22.8 22.90 3.150 140.8    2   4  1  0  95    4
    Pontiac Firebird  19.2 17.05 3.845 400.0    2   8  0  0 175    3
    Maserati Bora     15.0 14.60 3.570 301.0    8   8  0  1 335    5
    Hornet Sportabout 18.7 17.02 3.440 360.0    2   8  0  0 175    3
    Merc 450SE        16.4 17.40 4.070 275.8    3   8  0  0 180    3
    Merc 280          19.2 18.30 3.440 167.6    4   6  1  0 123    4
    Duster 360        14.3 15.84 3.570 360.0    4   8  0  0 245    3
    Merc 280C         17.8 18.90 3.440 167.6    4   6  1  0 123    4
    Dodge Challenger  15.5 16.87 3.520 318.0    2   8  0  0 150    3
    Camaro Z28        13.3 15.41 3.840 350.0    4   8  0  0 245    3
    > 
    -----------------------------------------------------
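
  The same subset can also be taken by position rather than by name, which works even when a data frame has no meaningful row names. This is an alternative sketch, not the approach used above; the seed, the sizes, and the variable names are illustrative assumptions.

```r
set.seed(123)                                  # arbitrary seed for repeatability
row_idx <- sample(nrow(mtcars), 10)            # 10 distinct row positions out of 32
col_idx <- sample(ncol(mtcars), 5)             # 5 distinct column positions out of 11
subset_df <- mtcars[row_idx, col_idx]

dim(subset_df)                                 # 10 rows, 5 columns
all(rownames(subset_df) %in% rownames(mtcars)) # TRUE: rows come from mtcars
```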

  e) Example of random sampling with replacement

  Draw N numbers at random from e, with replacement, to form a vector:

    e <- c(1:6)
    N <- 100
    s <- sample(x=e, size=N, replace=TRUE)
    table(s)

    > e <-c(1:6)
    > N <-100
    > s <-sample(x=e,size=N,replace=TRUE)
    > table(s)
    s
     1  2  3  4  5  6 
    16 19 14 21 19 11 
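
  With only 100 draws the counts above are visibly uneven. As a rough sketch (arbitrary seed, illustrative variable names, and a larger sample size chosen for this example), the same simulation with many more draws shows each face settling near 1/6:

```r
set.seed(2)                          # arbitrary seed for repeatability
faces <- 1:6
rolls <- sample(x = faces, size = 60000, replace = TRUE)
freq  <- table(rolls)

sum(freq)                            # 60000: every draw is counted once
round(prop.table(freq), 3)           # each face close to 1/6, about 0.167
```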

  f) Using the sample function for random ordering (sampling)

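  The first argument of sample, x, is the vector to sample from; the second, size, is the number of items to draw; the third, replace, says whether to sample with replacement.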

    -----------------------------------------------------
    Vec <- 1:10
    # sampling without replacement
    sample(x = Vec, size = 5)
    sample(x = Vec, size = 5)
    # When size equals the length of x, the result is a random reordering of x.
    sample(x = Vec, size = 10)
    sample(x = Vec, size = 10)
    # sampling with replacement
    sample(x = Vec, size = 10, replace = TRUE)
    sample(x = Vec, size = 10, replace = TRUE)
    -----------------------------------------------------
    > Vec <- 1:10
    > # sampling without replacement
    > sample(x = Vec, size = 5)
    [1] 9 3 1 5 4
    > sample(x = Vec, size = 5)
    [1]  3  1  4  9 10
    > # When size equals the length of x, the result is a random reordering of x.
    > sample(x = Vec, size = 10)
     [1]  1  6  2 10  7  8  3  9  5  4
    > sample(x = Vec, size = 10)
     [1]  6  9 10  7  1  5  8  3  4  2
    > # sampling with replacement
    > sample(x = Vec, size = 10, replace = TRUE)
     [1]  3  6  2  2  8  8  3  6 10  8
    > sample(x = Vec, size = 10, replace = TRUE)
     [1]  7  5 10  5  7  3  1  6  4  7
    -----------------------------------------------------
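
  Two related details, sketched under illustrative names and an arbitrary seed. First, leaving out size gives the same shuffle as size = length(x). Second, when x is a single number of 1 or more, sample treats it as 1:x rather than as a one-element vector, so indexing by position is a safer pattern when the vector length can vary.

```r
set.seed(5)                              # arbitrary seed for repeatability
vec  <- c(3, 7, 11, 15)
perm <- sample(vec)                      # size omitted: a random ordering of vec
identical(sort(perm), vec)               # TRUE: same values, possibly reordered

# Caveat: a single number >= 1 is interpreted as 1:x.
single <- 10
length(sample(single))                   # 10, a permutation of 1:10

# Indexing by position avoids the surprise.
safe_pick <- single[sample.int(length(single), 1)]
safe_pick                                # 10
```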

3. Setting the random seed

    > set.seed(1234)
    > x <- rnorm(10)
    > x
     [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247
     [6]  0.5060559 -0.5747400 -0.5466319 -0.5644520 -0.8900378
    > 
    > # After setting the same seed, the generated random numbers are identical every time
    > set.seed(1234)
    > x <- rnorm(10)
    > x
     [1]