ggplot2: extracting the data computed by a stat

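When plotting with ggplot2 we only need to supply the raw data. ggplot2 has many built-in statistical transformations (stats) that compute the values actually drawn.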

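The most typical example is geom_boxplot: we supply only the raw data, and ggplot2 automatically computes the values used to draw each box, namely the whisker ends (the minimum and maximum, excluding outliers), the median, and the lower and upper quartiles.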

So how do we get at this computed data? We use the box plot as the example.

The plotting code is as follows:

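library(ggplot2)

pdf("a.pdf")
p    <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
# print() draws the plot; in the ggplot2 version used here it also invisibly returns
# the built plot data (ggplot2 3.0.0+ returns the plot itself; use ggplot_build(p) there)
temp <- print(p)
dev.off()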

The resulting figure is shown below:

[Figure: box plots of hwy for each class in the mpg dataset]

The object temp holds the computed data used to draw the box plot.
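Capturing the return value of print() depends on the ggplot2 version: from ggplot2 3.0.0 onward, print() returns the plot object rather than the built data. As a minimal sketch of a version-independent route, the same kind of object can be obtained with ggplot_build(), and the per-layer data frame with layer_data(). The variable names built, box and box2 are only illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()

# ggplot_build() runs the stat computations without drawing to a device
built <- ggplot_build(p)

# built$data holds one data frame per layer; this plot has a single layer
box <- built$data[[1]]

# layer_data() is a shortcut that returns the same data frame for layer 1
box2 <- layer_data(p, 1)
```

With ggplot2 3.0.0 or later, assigning temp <- ggplot_build(p) gives an object that can be inspected the same way as temp in the rest of this post, although the exact columns and internal fields vary between versions.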

Let's look at the structure of temp:

> str(temp)
List of 3
 $ data  :List of 1
  ..$ :'data.frame':	7 obs. of  22 variables:
  .. ..$ ymin      : num [1:7] 23 23 23 21 15 20 14
  .. ..$ lower     : num [1:7] 24 26 26 22 16 24.5 17
  .. ..$ middle    : num [1:7] 25 27 27 23 17 26 17.5
  .. ..$ upper     : num [1:7] 26 29 29 24 18 30.5 19
  .. ..$ ymax      : num [1:7] 26 33 32 24 20 36 22
  .. ..$ outliers  :List of 7
  .. .. ..$ : num(0) 
  .. .. ..$ : num [1:4] 35 37 35 44
  .. .. ..$ : num(0) 
  .. .. ..$ : num 17
  .. .. ..$ : num [1:4] 12 12 12 22
  .. .. ..$ : num [1:2] 44 41
  .. .. ..$ : num [1:8] 12 12 25 24 27 25 26 23
  .. ..$ notchupper: num [1:7] 26.4 27.7 27.7 24 17.6 ...
  .. ..$ notchlower: num [1:7] 23.6 26.3 26.3 22 16.4 ...
  .. ..$ x         : num [1:7] 1 2 3 4 5 6 7
  .. ..$ PANEL     : int [1:7] 1 1 1 1 1 1 1
  .. ..$ group     : int [1:7] 1 2 3 4 5 6 7
  .. ..$ ymin_final: num [1:7] 23 23 23 17 12 20 12
  .. ..$ ymax_final: num [1:7] 26 44 32 24 22 44 27
  .. ..$ xmin      : num [1:7] 0.625 1.625 2.625 3.625 4.625 ...
  .. ..$ xmax      : num [1:7] 1.38 2.38 3.38 4.38 5.38 ...
  .. ..$ weight    : num [1:7] 1 1 1 1 1 1 1
  .. ..$ colour    : chr [1:7] "grey20" "grey20" "grey20" "grey20" ...
  .. ..$ fill      : chr [1:7] "white" "white" "white" "white" ...
  .. ..$ size      : num [1:7] 0.5 0.5 0.5 0.5 0.5 0.5 0.5
  .. ..$ alpha     : logi [1:7] NA NA NA NA NA NA ...
  .. ..$ shape     : num [1:7] 19 19 19 19 19 19 19
  .. ..$ linetype  : chr [1:7] "solid" "solid" "solid" "solid" ...
 $ layout:Classes 'Layout', 'ggproto' <ggproto object: Class Layout>
    facet: <ggproto object: Class FacetNull, Facet>
        compute_layout: function
        draw_back: function
        draw_front: function
        draw_labels: function
        draw_panels: function
        finish_data: function
        init_scales: function
        map: function
        map_data: function
        params: list
        render_back: function
        render_front: function
        render_panels: function
        setup_data: function
        setup_params: function
        shrink: TRUE
        train: function
        train_positions: function
        train_scales: function
        vars: function
        super:  <ggproto object: Class FacetNull, Facet>
    finish_data: function
    get_scales: function
    map: function
    map_position: function
    panel_layout: data.frame
    panel_ranges: list
    panel_scales: list
    render: function
    render_labels: function
    reset_scales: function
    setup: function
    train_position: function
    train_ranges: function
    xlabel: function
    ylabel: function
    super:  <ggproto object: Class Layout> 
 $ plot  :List of 9
  ..$ data       :Classes 'tbl_df', 'tbl' and 'data.frame':	234 obs. of  11 variables:
  .. ..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
  .. ..$ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
  .. ..$ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
  .. ..$ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
  .. ..$ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
  .. ..$ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
  .. ..$ drv         : chr [1:234] "f" "f" "f" "f" ...
  .. ..$ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
  .. ..$ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
  .. ..$ fl          : chr [1:234] "p" "p" "p" "p" ...
  .. ..$ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
  ..$ layers     :List of 1
  .. ..$ :Classes 'LayerInstance', 'Layer', 'ggproto' <ggproto object: Class LayerInstance, Layer>
    aes_params: list
    compute_aesthetics: function
    compute_geom_1: function
    compute_geom_2: function
    compute_position: function
    compute_statistic: function
    data: waiver
    draw_geom: function
    finish_statistics: function
    geom: <ggproto object: Class GeomBoxplot, Geom>
        aesthetics: function
        default_aes: uneval
        draw_group: function
        draw_key: function
        draw_layer: function
        draw_panel: function
        extra_params: na.rm
        handle_na: function
        non_missing_aes: 
        optional_aes: 
        parameters: function
        required_aes: x lower upper middle ymin ymax
        setup_data: function
        use_defaults: function
        super:  <ggproto object: Class Geom>
    geom_params: list
    inherit.aes: TRUE
    layer_data: function
    mapping: NULL
    map_statistic: function
    position: <ggproto object: Class PositionDodge, Position>
        compute_layer: function
        compute_panel: function
        required_aes: x
        setup_data: function
        setup_params: function
        width: NULL
        super:  <ggproto object: Class Position>
    print: function
    show.legend: NA
    stat: <ggproto object: Class StatBoxplot, Stat>
        aesthetics: function
        compute_group: function
        compute_layer: function
        compute_panel: function
        default_aes: uneval
        extra_params: na.rm
        finish_layer: function
        non_missing_aes: weight
        parameters: function
        required_aes: x y
        retransform: TRUE
        setup_data: function
        setup_params: function
        super:  <ggproto object: Class Stat>
    stat_params: list
    subset: NULL
    super:  <ggproto object: Class Layer> 
  ..$ scales     :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
    add: function
    clone: function
    find: function
    get_scales: function
    has_scale: function
    input: function
    n: function
    non_position_scales: function
    scales: list
    super:  <ggproto object: Class ScalesList> 
  ..$ mapping    :List of 2
  .. ..$ x: symbol class
  .. ..$ y: symbol hwy
  ..$ theme      : list()
  ..$ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
    aspect: function
    distance: function
    expand: TRUE
    is_linear: function
    labels: function
    limits: list
    range: function
    render_axis_h: function
    render_axis_v: function
    render_bg: function
    render_fg: function
    train: function
    transform: function
    super:  <ggproto object: Class CoordCartesian, Coord> 
  ..$ facet      :Classes 'FacetNull', 'Facet', 'ggproto' <ggproto object: Class FacetNull, Facet>
    compute_layout: function
    draw_back: function
    draw_front: function
    draw_labels: function
    draw_panels: function
    finish_data: function
    init_scales: function
    map: function
    map_data: function
    params: list
    render_back: function
    render_front: function
    render_panels: function
    setup_data: function
    setup_params: function
    shrink: TRUE
    train: function
    train_positions: function
    train_scales: function
    vars: function
    super:  <ggproto object: Class FacetNull, Facet> 
  ..$ plot_env   :<environment: R_GlobalEnv> 
  ..$ labels     :List of 2
  .. ..$ x: chr "class"
  .. ..$ y: chr "hwy"
  ..- attr(*, "class")= chr [1:2] "gg" "ggplot"

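The output shows that temp is a list with three elements. The first, data, holds the data used for drawing; the second, layout, describes the panel layout; the third, plot, holds the plot's own properties (the original data, layers, scales, mapping, and so on).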

Of these, data is the part we want:

> temp$data[[1]]
  ymin lower middle upper ymax                       outliers notchupper
1   23  24.0   25.0  26.0   26                                  26.41319
2   23  26.0   27.0  29.0   33                 35, 37, 35, 44   27.69140
3   23  26.0   27.0  29.0   32                                  27.74026
4   21  22.0   23.0  24.0   24                             17   23.95278
5   15  16.0   17.0  18.0   20                 12, 12, 12, 22   17.55009
6   20  24.5   26.0  30.5   36                         44, 41   27.60241
7   14  17.0   17.5  19.0   22 12, 12, 25, 24, 27, 25, 26, 23   17.90132
  notchlower x PANEL group ymin_final ymax_final  xmin  xmax weight colour
1   23.58681 1     1     1         23         26 0.625 1.375      1 grey20
2   26.30860 2     1     2         23         44 1.625 2.375      1 grey20
3   26.25974 3     1     3         23         32 2.625 3.375      1 grey20
4   22.04722 4     1     4         17         24 3.625 4.375      1 grey20
5   16.44991 5     1     5         12         22 4.625 5.375      1 grey20
6   24.39759 6     1     6         20         44 5.625 6.375      1 grey20
7   17.09868 7     1     7         12         27 6.625 7.375      1 grey20
   fill size alpha shape linetype
1 white  0.5    NA    19    solid
2 white  0.5    NA    19    solid
3 white  0.5    NA    19    solid
4 white  0.5    NA    19    solid
5 white  0.5    NA    19    solid
6 white  0.5    NA    19    solid
7 white  0.5    NA    19    solid

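temp$data is a list with a single element, because the plot has a single layer. That element is a data frame recording the values for each box: it has 7 rows, one for each of the 7 boxes in the figure, and its columns give ymin, lower, middle, upper, ymax, and related values.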
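To make the meaning of these columns concrete, here is a rough sketch that keeps only the five summary columns and cross-checks the box edges against quantile() computed per class. It assumes the boxes are ordered alphabetically by class, which is the default for a character column, and the names box, classes, stats and q are only illustrative.

```r
library(ggplot2)

p   <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()
box <- layer_data(p, 1)

# Label each row with its class; groups 1..7 are assumed to follow sorted class names
classes <- sort(unique(mpg$class))
stats   <- data.frame(class = classes[box$group],
                      box[, c("ymin", "lower", "middle", "upper", "ymax")])
print(stats)

# Per-class quartiles computed directly from the raw data (one row per class)
q <- t(sapply(split(mpg$hwy, mpg$class), quantile, probs = c(0.25, 0.5, 0.75)))

# lower / middle / upper should agree with the 25%, 50% and 75% quantiles
lower_ok  <- isTRUE(all.equal(unname(q[, "25%"]), as.numeric(box$lower)))
middle_ok <- isTRUE(all.equal(unname(q[, "50%"]), as.numeric(box$middle)))
upper_ok  <- isTRUE(all.equal(unname(q[, "75%"]), as.numeric(box$upper)))

# outliers is a list column: one numeric vector per box
n_outliers <- lengths(box$outliers)
```

Note that ymin and ymax are the whisker ends, so they differ from the raw minimum and maximum whenever a box has outliers.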

The intermediate data of any geom layer can be extracted in the same way.
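As an illustration with a different layer, the sketch below pulls the bin counts computed for a histogram. The choice of geom_histogram, the binwidth of 5 and the names p_hist and bins are assumptions made for this example and are not taken from the original post.

```r
library(ggplot2)

# A histogram layer: the stat bins hwy and counts the observations per bin
p_hist <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 5)

# Computed data for layer 1: one row per bin
bins <- layer_data(p_hist, 1)

# Bin boundaries and the number of observations in each bin
print(bins[, c("xmin", "xmax", "count")])

# Every observation lands in exactly one bin
total <- sum(bins$count)
```

Here total equals the number of rows in mpg, a quick way to confirm that the extracted counts really are the ones behind the bars.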
