oracle pivot and unpivot

pivot

We will begin with the new PIVOT operation. Most developers will be familiar with pivoting data: it is where multiple rows are aggregated and transposed into columns, with each column representing a different range of aggregate data. An overview of the new syntax is as follows:

SELECT ...
FROM   ...
PIVOT [XML]
   ( pivot_clause
     pivot_for_clause
     pivot_in_clause )
WHERE  ...

In addition to the new PIVOT keyword, we can see three new pivot clauses, described below.

  • pivot_clause: defines the columns to be aggregated (pivot is an aggregate operation);
  • pivot_for_clause: defines the columns to be grouped and pivoted;
  • pivot_in_clause: defines the filter for the column(s) in the pivot_for_clause (i.e. the range of values to limit the results to). The aggregations for each value in the pivot_in_clause will be transposed into a separate column (where appropriate).

The syntax and mechanics of pivot queries will become clearer with some examples.

a simple example

Our first example will be a simple demonstration of the PIVOT syntax. Using the EMP table, we will sum the salaries by department and job, but transpose the sum for each department onto its own column. Before we pivot the salaries, we will examine the base data, as follows.

SQL> SELECT job
  2  ,      deptno
  3  ,      SUM(sal) AS sum_sal
  4  FROM   emp
  5  GROUP  BY
  6         job
  7  ,      deptno
  8  ORDER  BY
  9         job
 10  ,      deptno;

JOB           DEPTNO    SUM_SAL
--------- ---------- ----------
ANALYST           20       6600
CLERK             10       1430
CLERK             20       2090
CLERK             30       1045
MANAGER           10       2695
MANAGER           20     3272.5
MANAGER           30       3135
PRESIDENT         10       5500
SALESMAN          30       6160

9 rows selected.

We will now pivot this data using the new 11g syntax. For each job, we will display the salary totals in a separate column for each department, as follows.

SQL> WITH pivot_data AS (
  2          SELECT deptno, job, sal
  3          FROM   emp
  4          )
  5  SELECT *
  6  FROM   pivot_data
  7  PIVOT (
  8             SUM(sal)        --<-- pivot_clause
  9         FOR deptno          --<-- pivot_for_clause
 10         IN  (10,20,30,40)   --<-- pivot_in_clause
 11        );

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.

We can see that the department salary totals for each job have been transposed from rows into columns. There are a few points to note about this example, the syntax and the results:

  • Line 8: our pivot_clause sums the SAL column. We can specify multiple columns if required and optionally alias them (we will see examples of aliasing later in this article);
  • Lines 1-4: pivot operations perform an implicit GROUP BY using any columns not in the pivot_clause (in our example, JOB and DEPTNO). For this reason, most pivot queries will be performed on a subset of columns, using stored views, inline views or subqueries, as in our example;
  • Line 9: our pivot_for_clause states that we wish to pivot the DEPTNO aggregations only;
  • Line 10: our pivot_in_clause specifies the range of values for DEPTNO. In this example we have hard-coded a list of four values which is why we generated four pivoted columns (one for each value of DEPTNO). In the absence of aliases, Oracle uses the values in the pivot_in_clause to generate the pivot column names (in our output we can see columns named "10", "20", "30" and "40").
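
The implicit GROUP BY noted in the list above is easiest to see by controlling which columns reach the PIVOT clause. The following is a minimal sketch rather than part of the original examples: EMP_SAMPLE is a hypothetical three-row stand-in for EMP, built from DUAL so that the statement is self-contained.

```sql
-- EMP_SAMPLE is hypothetical inline data, not the article's EMP table.
WITH emp_sample AS (
        SELECT 7369 AS empno, 'CLERK' AS job, 20 AS deptno, 800 AS sal FROM dual UNION ALL
        SELECT 7876,          'CLERK',        20,           1100       FROM dual UNION ALL
        SELECT 7934,          'CLERK',        10,           1300       FROM dual
        )
,    pivot_input AS (
        -- Project only the grouping column (JOB), the pivot_for_clause
        -- column (DEPTNO) and the aggregated column (SAL).
        SELECT job, deptno, sal
        FROM   emp_sample
        )
SELECT *
FROM   pivot_input
PIVOT (SUM(sal) FOR deptno IN (10, 20));
```

With this projection the three rows collapse into a single CLERK row (1300 for department 10 and 1900 for department 20). If we pivot EMP_SAMPLE directly instead of PIVOT_INPUT, EMPNO joins the implicit GROUP BY and we get one row per employee, each with only one non-null pivot column.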

An interesting point about the pivot syntax is its placement in the query; namely, between the FROM and WHERE clauses. In the following example, we restrict our original pivot query to a selection of job titles by adding a predicate.

SQL> WITH pivot_data AS (
  2          SELECT deptno, job, sal
  3          FROM   emp
  4          )
  5  SELECT *
  6  FROM   pivot_data
  7  PIVOT (
  8             SUM(sal)        --<-- pivot_clause
  9         FOR deptno          --<-- pivot_for_clause
 10         IN  (10,20,30,40)   --<-- pivot_in_clause
 11        )
 12  WHERE  job IN ('ANALYST','CLERK','SALESMAN');

JOB                10         20         30         40
---------- ---------- ---------- ---------- ----------
CLERK            1430       2090       1045
SALESMAN                               6160
ANALYST                     6600
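
Because the WHERE clause follows the PIVOT clause, it filters the pivoted rows. A predicate that should apply to the detail rows before they are aggregated belongs in the subquery instead. The following is a minimal sketch under stated assumptions: EMP_SAMPLE is hypothetical inline data and the salary threshold of 1000 is arbitrary.

```sql
-- EMP_SAMPLE is hypothetical inline data, not the article's EMP table.
WITH emp_sample AS (
        SELECT 'CLERK' AS job, 10 AS deptno, 1300 AS sal FROM dual UNION ALL
        SELECT 'CLERK',        20,           800         FROM dual UNION ALL
        SELECT 'CLERK',        20,           1100        FROM dual UNION ALL
        SELECT 'ANALYST',      20,           3000        FROM dual
        )
,    pivot_input AS (
        SELECT job, deptno, sal
        FROM   emp_sample
        WHERE  sal > 1000          -- applied to detail rows, before aggregation
        )
SELECT *
FROM   pivot_input
PIVOT (SUM(sal) FOR deptno IN (10, 20))
WHERE  job = 'CLERK';              -- applied to pivoted rows, after aggregation
```

Here the 800 salary is excluded before SUM is applied, so the CLERK total for department 20 is 1100 rather than 1900. The ANALYST row is aggregated and then discarded by the outer predicate.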

aliasing pivot columns

In our preceding examples, Oracle used the values of DEPTNO to generate pivot column names. Alternatively, we can alias one or more of the columns in the pivot_clause and one or more of the values in the pivot_in_clause. In general, Oracle will name the pivot columns according to the following conventions:

Pivot Column Aliased?  Pivot In-Value Aliased?  Pivot Column Name
---------------------  -----------------------  --------------------------------------------------
N                      N                        pivot_in_clause value
Y                      Y                        pivot_in_clause alias || '_' || pivot_clause alias
N                      Y                        pivot_in_clause alias
Y                      N                        pivot_in_clause value || '_' || pivot_clause alias

We will see examples of each of these aliasing options below (we have already seen examples without any aliases). However, to simplify our examples, we will begin by defining the input dataset as a view, as follows.

SQL> CREATE VIEW pivot_data
  2  AS
  3     SELECT deptno, job, sal
  4     FROM   emp;

View created.

For our first example, we will alias all elements of our pivot query.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal) AS salaries
  4  FOR    deptno IN (10 AS d10_sal,
  5                    20 AS d20_sal,
  6                    30 AS d30_sal,
  7                    40 AS d40_sal));

JOB        D10_SAL_SALARIES D20_SAL_SALARIES D30_SAL_SALARIES D40_SAL_SALARIES
---------- ---------------- ---------------- ---------------- ----------------
CLERK                  1430             2090             1045
SALESMAN                                                 6160
PRESIDENT              5500
MANAGER                2695           3272.5             3135
ANALYST                                 6600

5 rows selected.

Oracle concatenates our aliases together to generate the column names. In the following example, we will alias the pivot_clause (aggregated column) but not the values in the pivot_in_clause.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal) AS salaries
  4  FOR    deptno IN (10, 20, 30, 40));

JOB       10_SALARIES 20_SALARIES 30_SALARIES 40_SALARIES
--------- ----------- ----------- ----------- -----------
CLERK            1430        2090        1045
SALESMAN                                 6160
PRESIDENT        5500
MANAGER          2695      3272.5        3135
ANALYST                      6600

5 rows selected.

Oracle generates the pivot column names by concatenating the pivot_in_clause values and the aggregate column alias. Finally, we will only alias the pivot_in_clause values, as follows.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10 AS d10_sal,
  5                    20 AS d20_sal,
  6                    30 AS d30_sal,
  7                    40 AS d40_sal));

JOB           D10_SAL    D20_SAL    D30_SAL    D40_SAL
---------- ---------- ---------- ---------- ----------
CLERK            1430       2090       1045
SALESMAN                               6160
PRESIDENT        5500
MANAGER          2695     3272.5       3135
ANALYST                     6600

5 rows selected.

This time, Oracle generated column names from the aliases only. In fact, we can see from all of our examples that the pivot_in_clause is used in all pivot-column naming, regardless of whether we supply an alias or value. We can therefore be selective about which values we alias, as the following example demonstrates.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10 AS d10_sal,
  5                    20,
  6                    30 AS d30_sal,
  7                    40));

JOB          D10_SAL         20    D30_SAL         40
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.
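
Pivot column names generated from unaliased numeric values, such as 20 or 10_SALARIES, begin with a digit, so they are not valid unquoted identifiers. To reference them by name we have to enclose them in double quotes and match the generated name exactly. The following is a minimal sketch; EMP_SAMPLE is hypothetical inline data used only to keep the statement self-contained.

```sql
-- EMP_SAMPLE is hypothetical inline data, not the article's EMP table.
WITH emp_sample AS (
        SELECT 'CLERK' AS job, 10 AS deptno, 1300 AS sal FROM dual UNION ALL
        SELECT 'CLERK',        20,           1100        FROM dual UNION ALL
        SELECT 'MANAGER',      20,           2975        FROM dual
        )
SELECT job
,      "10_SALARIES"               -- generated as value || '_' || alias
,      "20_SALARIES"
FROM   emp_sample
PIVOT (SUM(sal) AS salaries
FOR    deptno IN (10, 20))
WHERE  "20_SALARIES" IS NOT NULL   -- quoted names work in predicates too
ORDER  BY
       job;
```

Aliasing the pivot_in_clause values (for example, 10 AS d10) avoids the need for quoting altogether.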

pivoting multiple columns

Our examples so far have contained a single aggregate and a single pivot column, although we can define more if we wish. In the following example we will define two aggregations in our pivot_clause for the same range of DEPTNO values that we have used so far. The new aggregate is a count of the salaries that comprise the sum.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)   AS sum
  4  ,      COUNT(sal) AS cnt
  5  FOR    deptno IN (10 AS d10_sal,
  6                    20 AS d20_sal,
  7                    30 AS d30_sal,
  8                    40 AS d40_sal));

JOB        D10_SAL_SUM D10_SAL_CNT D20_SAL_SUM D20_SAL_CNT D30_SAL_SUM D30_SAL_CNT D40_SAL_SUM D40_SAL_CNT
---------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
CLERK             1430           1        2090           2        1045           1                       0
SALESMAN                         0                       0        6160           4                       0
PRESIDENT         5500           1                       0                       0                       0
MANAGER           2695           1      3272.5           1        3135           1                       0
ANALYST                          0        6600           2                       0                       0

5 rows selected.

We have doubled the number of pivot columns (because we doubled the number of aggregates). The number of pivot columns is a product of the number of aggregates and the distinct number of values in the pivot_in_clause. In the following example, we will extend the pivot_for_clause and pivot_in_clause to include values for JOB in the filter.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)   AS sum
  4  ,      COUNT(sal) AS cnt
  5  FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
  6                         (30, 'MANAGER')  AS d30_mgr,
  7                         (30, 'CLERK')    AS d30_clk));

D30_SLS_SUM D30_SLS_CNT D30_MGR_SUM D30_MGR_CNT D30_CLK_SUM D30_CLK_CNT
----------- ----------- ----------- ----------- ----------- -----------
       6160           4        3135           1        1045           1

1 row selected.

We have limited the query to just three jobs within department 30. Note how the pivot_for_clause columns (DEPTNO and JOB) combine to make a single pivot dimension. The aliases we use apply to the combined value domain (for example, "D30_SLS" to represent the SALESMAN job in department 30).

Finally, because we know the pivot column-naming rules, we can reference them directly, as follows.

SQL> SELECT d30_mgr_sum
  2  ,      d30_clk_cnt
  3  FROM   pivot_data
  4  PIVOT (SUM(sal)   AS sum
  5  ,      COUNT(sal) AS cnt
  6  FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
  7                         (30, 'MANAGER')  AS d30_mgr,
  8                         (30, 'CLERK')    AS d30_clk));

D30_MGR_SUM D30_CLK_CNT
----------- -----------
       3135           1

1 row selected.

general restrictions

There are a few simple "gotchas" to be aware of with pivot queries. For example, we cannot project the column(s) used in the pivot_for_clause (DEPTNO in most of our examples). This is to be expected. The column(s) in the pivot_for_clause are grouped according to the range of values we supply with the pivot_in_clause. In the following example, we will attempt to project the DEPTNO column.

SQL> SELECT deptno
  2  FROM   emp
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10,20,30,40));
SELECT deptno
       *
ERROR at line 1:
ORA-00904: "DEPTNO": invalid identifier

Oracle raises an ORA-00904 exception. In this case the DEPTNO column is completely removed from the projection and Oracle tells us that it doesn't exist in this scope. Similarly, we cannot include any column(s) used in the pivot_clause, as the following example demonstrates.

SQL> SELECT sal
  2  FROM   emp
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10,20,30,40));
SELECT sal
       *
ERROR at line 1:
ORA-00904: "SAL": invalid identifier

We attempted to project the SAL column but Oracle raised the same exception. This is also to be expected: the pivot_clause defines our aggregations. This also means, of course, that we must use aggregate functions in the pivot_clause. In the following example, we will attempt to define a pivot_clause with a bare column instead of an aggregate function.

SQL> SELECT *
  2  FROM   emp
  3  PIVOT (sal
  4  FOR    deptno IN (10,20,30,40));
PIVOT (sal
       *
ERROR at line 3:
ORA-56902: expect aggregate function inside pivot operation

Oracle raises a new ORA-56902 exception: the error message numbers are getting much higher with every release!
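
When each group holds exactly one value and no real aggregation is wanted, a common way to satisfy this restriction is to wrap the column in MAX (or MIN), which returns the single value unchanged. The following is a minimal sketch under stated assumptions: ITEM_ATTRS and its columns are hypothetical inline data, not objects used elsewhere in this article.

```sql
-- ITEM_ATTRS is hypothetical inline data: one value per (ITEM_ID, ATTR) pair.
WITH item_attrs AS (
        SELECT 1 AS item_id, 'COLOUR' AS attr, 'red' AS val FROM dual UNION ALL
        SELECT 1,            'WEIGHT',         '2kg'        FROM dual UNION ALL
        SELECT 2,            'COLOUR',         'blue'       FROM dual
        )
SELECT *
FROM   item_attrs
PIVOT (MAX(val)                    -- MAX returns the single value unchanged
FOR    attr IN ('COLOUR' AS colour,
                'WEIGHT' AS weight))
ORDER  BY
       item_id;
```

This returns one row per ITEM_ID with COLOUR and WEIGHT columns; item 2 has no WEIGHT row, so that column is NULL for it. If a group could contain more than one value, MAX would silently keep only the highest, so this idiom is only safe when uniqueness is guaranteed.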

execution plans for pivot operations

As we have stated, pivot operations imply a GROUP BY, but we don't need to specify it. We can investigate this by explaining one of our pivot query examples, as follows. We will use Autotrace for convenience (Autotrace uses EXPLAIN PLAN and DBMS_XPLAN to display theoretical execution plans).

SQL> set autotrace traceonly explain

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10 AS d10_sal,
  5                    20 AS d20_sal,
  6                    30 AS d30_sal,
  7                    40 AS d40_sal));

Execution Plan
----------------------------------------------------------
Plan hash value: 1475541029

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |    75 |     4  (25)| 00:00:01 |
|   1 |  HASH GROUP BY PIVOT|      |     5 |    75 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   210 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

The plan output tells us that this query uses a HASH GROUP BY PIVOT operation. The HASH GROUP BY is a feature of 10g Release 2, but the PIVOT extension is new to 11g. Pivot queries do not automatically generate a PIVOT plan, however. In the following example, we will limit the domain of values in our pivot_in_clause and use Autotrace to explain the query again.

SQL> SELECT *
  2  FROM   pivot_data
  3  PIVOT (SUM(sal)   AS sum
  4  ,      COUNT(sal) AS cnt
  5  FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
  6                         (30, 'MANAGER')  AS d30_mgr,
  7                         (30, 'CLERK')    AS d30_clk));

Execution Plan
----------------------------------------------------------
Plan hash value: 1190005124

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  VIEW               |      |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   SORT AGGREGATE    |      |     1 |    15 |            |          |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   210 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

This time the CBO has chosen a simple aggregation (SORT AGGREGATE) in preference to a GROUP BY with PIVOT. It has correctly identified that only one record will be returned from this query, so the GROUP BY operation is unnecessary. Finally, we will explain the pivot query that aliases only the pivot_in_clause values, but use the extended formatting options of DBMS_XPLAN to reveal more information about the work that Oracle is doing.

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'PIVOT'
  2  FOR
  3     SELECT *
  4     FROM   pivot_data
  5     PIVOT (SUM(sal)
  6     FOR    deptno IN (10 AS d10_sal,
  7                       20 AS d20_sal,
  8                       30 AS d30_sal,
  9                       40 AS d40_sal));

Explained.

SQL> SELECT *
  2  FROM   TABLE(
  3            DBMS_XPLAN.DISPLAY(
  4               NULL, 'PIVOT', 'TYPICAL +PROJECTION'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1475541029

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |    75 |     4  (25)| 00:00:01 |
|   1 |  HASH GROUP BY PIVOT|      |     5 |    75 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   210 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "JOB"[VARCHAR2,9], SUM(CASE  WHEN ("DEPTNO"=10) THEN
       "SAL" END )[22], SUM(CASE  WHEN ("DEPTNO"=20) THEN "SAL" END )[22],
       SUM(CASE  WHEN ("DEPTNO"=30) THEN "SAL" END )[22], SUM(CASE  WHEN
       ("DEPTNO"=40) THEN "SAL" END )[22]
   2 - "JOB"[VARCHAR2,9], "SAL"[NUMBER,22], "DEPTNO"[NUMBER,22]

18 rows selected.

DBMS_XPLAN optionally exposes the column projection information contained in PLAN_TABLE for each step of a query. The projection for ID=2 shows the base columns that we select in the PIVOT_DATA view over EMP. The interesting information, however, is for ID=1 (this step is our pivot operation). This clearly shows how Oracle is generating the pivot columns. Many developers will be familiar with this form of SQL: it is how we write pivot queries in versions prior to 11g. Oracle has chosen a CASE expression, but we commonly use DECODE for brevity, as follows.

SQL> SELECT job
  2  ,      SUM(DECODE(deptno,10,sal)) AS "D10_SAL"
  3  ,      SUM(DECODE(deptno,20,sal)) AS "D20_SAL"
  4  ,      SUM(DECODE(deptno,30,sal)) AS "D30_SAL"
  5  ,      SUM(DECODE(deptno,40,sal)) AS "D40_SAL"
  6  FROM   emp
  7  GROUP  BY
  8         job;

JOB          D10_SAL    D20_SAL    D30_SAL    D40_SAL
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.
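
The CASE form that appears in the column projection can also be written by hand, and it helps to explain the earlier two-aggregate output in which the CNT columns showed 0 where the SUM columns were empty. The following is a minimal sketch under stated assumptions: EMP_SAMPLE is hypothetical inline data, and we assume that COUNT in a pivot_clause is expanded in the same way as SUM (the projection above only shows SUM).

```sql
-- EMP_SAMPLE is hypothetical inline data, not the article's EMP table.
WITH emp_sample AS (
        SELECT 'CLERK' AS job, 10 AS deptno, 1300 AS sal FROM dual UNION ALL
        SELECT 'CLERK',        20,           800         FROM dual UNION ALL
        SELECT 'CLERK',        20,           1100        FROM dual UNION ALL
        SELECT 'ANALYST',      20,           3000        FROM dual
        )
SELECT job
,      SUM(CASE WHEN deptno = 10 THEN sal END)   AS d10_sal_sum
,      COUNT(CASE WHEN deptno = 10 THEN sal END) AS d10_sal_cnt
,      SUM(CASE WHEN deptno = 20 THEN sal END)   AS d20_sal_sum
,      COUNT(CASE WHEN deptno = 20 THEN sal END) AS d20_sal_cnt
FROM   emp_sample
GROUP  BY
       job
ORDER  BY
       job;
```

A CASE expression without an ELSE branch returns NULL for non-matching rows. SUM over only NULLs is NULL, whereas COUNT over only NULLs is 0, so the ANALYST row has a NULL D10_SAL_SUM but a D10_SAL_CNT of 0.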

pivot performance

From the evidence we have seen, it appears as though Oracle implements the new PIVOT syntax using a recognised SQL format. It follows that we should expect the same performance for our pivot queries regardless of the technique we use (in other words the 11g PIVOT syntax will perform the sam