1. 程式人生 > >Oracle group by 基本及的拓展 ROLLUP, CUBE, GROUPING 功能and GROUPING 集合

Oracle group by 基本及的拓展 ROLLUP, CUBE, GROUPING 功能and GROUPING 集合

/*

 宣告:本人不是專門學資料庫的,也不是專門的翻譯,只是因為碰到一個問題(SQL CookBook中)找了一下,發現一個英文網站的解釋很清晰,特此翻譯過來,mark.不喜勿磚,謝謝!
原文連結:

*/

環境:
DROP TABLE dimension_tab;
CREATE TABLE dimension_tab (
  fact_1_id   NUMBER NOT NULL,
  fact_2_id   NUMBER NOT NULL,
  fact_3_id   NUMBER NOT NULL, 
  fact_4_id   NUMBER NOT NULL,
  sales_value NUMBER(10,2) NOT NULL
);

INSERT INTO dimension_tab
SELECT TRUNC(DBMS_RANDOM.value(low => 1, high => 3)) AS fact_1_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 6)) AS fact_2_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_3_id,
       TRUNC(DBMS_RANDOM.value(low => 1, high => 11)) AS fact_4_id,
       ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2) AS sales_value
FROM   dual
CONNECT BY level <= 1000;
COMMIT;

1.Group by 基本用法

為了方便理解,先用一個聚合函式,求和.(未使用group by)
SELECT SUM(sales_value) AS sales_value
FROM   dimension_tab;

SALES_VALUE
-----------
   50528.39

1 row selected.

SQL>
SELECT SUM(sales_value) AS sales_value
FROM   dimension_tab;

SALES_VALUE
-----------
   50528.39

1 row selected.

SQL>

將想要分組的列放在group by 之後.結果的行數是我們目標列中包含不同值的個數.
SELECT fact_1_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id
ORDER BY fact_1_id;

 FACT_1_ID   NUM_ROWS SALES_VALUE
---------- ---------- -----------
         1        478    24291.35
         2        522    26237.04

2 rows selected.

SQL>

如果包含兩列,將會產生聚合結果(解釋為笛卡爾積?),返回10列(2*5)
SELECT fact_1_id,
       fact_2_id,
       COUNT(*) AS num_rows,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, fact_2_id
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID   NUM_ROWS SALES_VALUE
---------- ---------- ---------- -----------
         1          1         83     4363.55
         1          2         96     4794.76
         1          3         93     4718.25
         1          4        105     5387.45
         1          5        101     5027.34
         2          1        109     5652.84
         2          2         96     4583.02
         2          3        110     5555.77
         2          4        113     5936.67
         2          5         94     4508.74

10 rows selected.

SQL>

2.ROLLUP

出了正常的分組結果外,使用rollup還會,返回部分列分組,規則:從右向左,知道一個完整的分組.如果,rollup中的列數為n,那麼將會有n+1個分組層次.
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                         50528.39

13 rows selected.

SQL>
以下語句結果:Click Here.當行中包含null的時候,這種並不是一種很好的方法,稍後會討論.
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY ROLLUP (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
也可以在group by的時候使用rolluo做部分分組.結果:Click Here
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, ROLLUP (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

3.CUBE

Cube是rollup的拓展,Cube將會返回所有分組的一個統計,如果,Cube中有n組,那麼將返回2的n次方組合結果.
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE
---------- ---------- -----------
         1          1     4363.55
         1          2     4794.76
         1          3     4718.25
         1          4     5387.45
         1          5     5027.34
         1               24291.35
         2          1     5652.84
         2          2     4583.02
         2          3     5555.77
         2          4     5936.67
         2          5     4508.74
         2               26237.04
                    1    10016.39
                    2     9377.78
                    3    10274.02
                    4    11324.12
                    5     9536.08
                         50528.39

18 rows selected.

SQL>
如果cube中有三組(或n組)時,結果(或需計算):Click Here
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

也有可能只是一部分進行cube分組,結果:Click Here
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value
FROM   dimension_tab
GROUP BY fact_1_id, CUBE (fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

4.Grouping 函式

上面曾經說過,如果某一列存在null值,而我們的rollup或者cube時,也將會出現列為null的時候,該怎樣區分null到底是由資料null還是由於rollup活cube自己生成的null呢.這裡使用grouping來解決這個問題.如果是資料本身的值(null或其他值),grouping(列名)將會在這一行返回0,如果這一行是由於rollup或者cube產生的話,將會返回1.
SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING(fact_1_id) AS f1g, 
       GROUPING(fact_2_id) AS f2g
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1          1     4363.55          0          0
         1          2     4794.76          0          0
         1          3     4718.25          0          0
         1          4     5387.45          0          0
         1          5     5027.34          0          0
         1               24291.35          0          1
         2          1     5652.84          0          0
         2          2     4583.02          0          0
         2          3     5555.77          0          0
         2          4     5936.67          0          0
         2          5     4508.74          0          0
         2               26237.04          0          1
                    1    10016.39          1          0
                    2     9377.78          1          0
                    3    10274.02          1          0
                    4    11324.12          1          0
                    5     9536.08          1          0
                         50528.39          1          1

18 rows selected.

SQL>

可以看出:(都是由rollup或cube所產生的)

  • F1G=0,F2G=0 : 正常的group by結果
  • F1G=0,F2G=1 : 一行 FACT_1_ID 列的統計
  • F1G=1,F2G=0 :  一行FACT_2_ID 列的統計
  • F1G=1,F2G=1 : 由FACT_1_ID 和FACT_2_ID 所產生的一個統計

grouping函式可以用來篩選排序結果.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING(fact_1_id) AS f1g, 
       GROUPING(fact_2_id) AS f2g
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
HAVING GROUPING(fact_1_id) = 1 OR GROUPING(fact_2_id) = 1
ORDER BY GROUPING(fact_1_id), GROUPING(fact_2_id);

 FACT_1_ID  FACT_2_ID SALES_VALUE        F1G        F2G
---------- ---------- ----------- ---------- ----------
         1               24291.35          0          1
         2               26237.04          0          1
                    4    11324.12          1          0
                    3    10274.02          1          0
                    2     9377.78          1          0
                    1    10016.39          1          0
                    5     9536.08          1          0
                         50528.39          1          1

8 rows selected.

SQL>

5.GROUPING_ID函式

grouping_id函式也是提供了一個可以確定是否為統計的行的辨別方式.grouping_id返回的是group by分組的級別(或者說層次)

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1          1     4363.55           0
         1          2     4794.76           0
         1          3     4718.25           0
         1          4     5387.45           0
         1          5     5027.34           0
         1               24291.35           1
         2          1     5652.84           0
         2          2     4583.02           0
         2          3     5555.77           0
         2          4     5936.67           0
         2          5     4508.74           0
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2
                         50528.39           3

18 rows selected.

SQL>

6.GROUP_ID

重複的統計的結果進行劃分.第一次為0,依次出現相同結果,id 1開始遞增.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
       GROUP_ID() AS group_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          1
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          1
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0

20 rows selected.

SQL>

也可對結果進行篩選.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id,
       GROUP_ID() AS group_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id))
HAVING GROUP_ID() = 0
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID   GROUP_ID
---------- ---------- ----------- ----------- ----------
         1          1     4363.55           0          0
         1          2     4794.76           0          0
         1          3     4718.25           0          0
         1          4     5387.45           0          0
         1          5     5027.34           0          0
         1               24291.35           1          0
         2          1     5652.84           0          0
         2          2     4583.02           0          0
         2          3     5555.77           0          0
         2          4     5936.67           0          0
         2          5     4508.74           0          0
         2               26237.04           1          0
                    1    10016.39           2          0
                    2     9377.78           2          0
                    3    10274.02           2          0
                    4    11324.12           2          0
                    5     9536.08           2          0
                         50528.39           3          0

18 rows selected.

SQL>

7.GROUPING 集合

使用cube,特別是在多列時,分組會很多.以下為例,將會有8個組層次.結果:Click Here

SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;
如果我們只需要其中一些分組結果,用grouping sets來篩選,選出"FACT_1_ID, FACT_2_ID" and "FACT_1_ID, FACT_3_ID分組的統計
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id))
ORDER BY fact_1_id, fact_2_id, fact_3_id;

 FACT_1_ID  FACT_2_ID  FACT_3_ID SALES_VALUE GROUPING_ID
---------- ---------- ---------- ----------- -----------
         1          1                4363.55           1
         1          2                4794.76           1
         1          3                4718.25           1
         1          4                5387.45           1
         1          5                5027.34           1
         1                     1      2737.4           2
         1                     2     1854.29           2
         1                     3     2090.96           2
         1                     4     2605.17           2
         1                     5     2590.93           2
         1                     6      2506.9           2
         1                     7     1839.85           2
         1                     8     2953.04           2
         1                     9     2778.75           2
         1                    10     2334.06           2
         2          1                5652.84           1
         2          2                4583.02           1
         2          3                5555.77           1
         2          4                5936.67           1
         2          5                4508.74           1
         2                     1     3512.69           2
         2                     2     2847.94           2
         2                     3      2972.5           2
         2                     4     2534.06           2
         2                     5     3115.99           2
         2                     6     2775.85           2
         2                     7     2208.19           2
         2                     8     2358.55           2
         2                     9     1884.11           2
         2                    10     2027.16           2

30 rows selected.

SQL>

8.組合列

使用rollup的列的組合(統計),

ROLLUP (a, b, c)
(a, b, c)
(a, b)
(a)
()
使用cube的列的組合(統計),
CUBE (a, b, c)
(a, b, c)
(a, b)
(a, c)
(a)
(b, c)
(b)
(c)
()
允許使用括號將列括起來.在使用rollup或cube,grouping sets時,會將其作為一個單獨的整體,不會拆分括號裡面的列再組合(統計).
ROLLUP ((a, b), c)
(a, b, c)
(a, b)
()

Not considered:
(a)

CUBE ((a, b), c)
(a, b, c)
(a, b)
(c)
()

Not considered:
(a, c)
(a)
(b, c)
(b)

-- Regular Cube.
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

-- Cube with composite column.
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id
FROM   dimension_tab
GROUP BY CUBE((fact_1_id, fact_2_id), fact_3_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id;

9.ConcatenatedGroupings(個人理解:組之間的笛卡爾積)

在使用group by時,有GROUPING SETSCUBE 和ROLLUP組合時,不同組會產生笛卡爾積.

SELECT fact_1_id,
       fact_2_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

 FACT_1_ID  FACT_2_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1               24291.35           1
         2               26237.04           1
                    1    10016.39           2
                    2     9377.78           2
                    3    10274.02           2
                    4    11324.12           2
                    5     9536.08           2

7 rows selected.

SQL>
SELECT fact_3_id,
       fact_4_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_3_id, fact_4_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_3_id, fact_4_id;

 FACT_3_ID  FACT_4_ID SALES_VALUE GROUPING_ID
---------- ---------- ----------- -----------
         1                6250.09           1
         2                4702.23           1
         3                5063.46           1
         4                5139.23           1
         5                5706.92           1
         6                5282.75           1
         7                4048.04           1
         8                5311.59           1
         9                4662.86           1
        10                4361.22           1
                    1     4718.55           2
                    2      5439.1           2
                    3      4643.4           2
                    4      4515.3           2
                    5     5110.27           2
                    6     5910.78           2
                    7     4987.22           2
                    8     4846.25           2
                    9     5458.82           2
                   10      4898.7           2

20 rows selected.

SQL>
如果我們將以上兩組,在group by中一起.結果:Click Here
SELECT fact_1_id,
       fact_2_id,
       fact_3_id,
       fact_4_id,
       SUM(sales_value) AS sales_value,
       GROUPING_ID(fact_1_id, fact_2_id, fact_3_id, fact_4_id) AS grouping_id
FROM   dimension_tab
GROUP BY GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id)
ORDER BY fact_1_id, fact_2_id, fact_3_id, fact_4_id;
將會產生一下組集合的分組,
GROUPING SETS(fact_1_id, fact_2_id) 
(fact_1_id)
(fact_2_id)

GROUPING SETS(fact_3_id, fact_4_id) 
(fact_3_id)
(fact_4_id)

GROUPING SETS(fact_1_id, fact_2_id), GROUPING SETS(fact_3_id, fact_4_id) 
(fact_1_id, fact_3_id)
(fact_1_id, fact_4_id)
(fact_2_id, fact_3_id)
(fact_2_id, fact_4_id)
在兩個GROUPING SETS中將產生笛卡爾積.
GROUPING SETS(a, b), GROUPING SETS(c, d) 
(a, c)
(a, d)
(b, c)
(b, d)

[個人理解]    在使用group by的時候,特別是使用統計函式時,cube rollup和grouping sets是很有用的,可以分組進行統計.在分組的同時,我們還有grouping_id(返回分組的級別層次計數)和grouping(判斷行是否由cube或rollup產生)進行一系列的操作.