oracle pivot and unpivot
pivot
We will begin with the new PIVOT operation. Most developers will be familiar with pivoting data: it is where multiple rows are aggregated and transposed into columns, with each column representing a different range of aggregate data. An overview of the new syntax is as follows:
SELECT ...
FROM   ...
PIVOT [XML]
   ( pivot_clause
     pivot_for_clause
     pivot_in_clause )
WHERE  ...
In addition to the new PIVOT keyword, we can see three new pivot clauses, described below.
- pivot_clause: defines the columns to be aggregated (pivot is an aggregate operation);
- pivot_for_clause: defines the column(s) to be grouped and pivoted;
- pivot_in_clause: defines the filter for the column(s) in the pivot_for_clause (i.e. the range of values to limit the results to). The aggregations for each value in the pivot_in_clause will be transposed into a separate column (where appropriate).
The syntax and mechanics of pivot queries will become clearer with some examples.
a simple example
Our first example will be a simple demonstration of the PIVOT syntax. Using the EMP table, we will sum the salaries by department and job, but transpose the sum for each department onto its own column. Before we pivot the salaries, we will examine the base data, as follows.
SQL> SELECT job
     ,      deptno
     ,      SUM(sal) AS sum_sal
     FROM   emp
     GROUP  BY
            job
     ,      deptno
     ORDER  BY
            job
     ,      deptno;

JOB           DEPTNO    SUM_SAL
--------- ---------- ----------
ANALYST           20       6600
CLERK             10       1430
CLERK             20       2090
CLERK             30       1045
MANAGER           10       2695
MANAGER           20     3272.5
MANAGER           30       3135
PRESIDENT         10       5500
SALESMAN          30       6160

9 rows selected.
We will now pivot this data using the new 11g syntax. For each job, we will display the salary totals in a separate column for each department, as follows.
SQL> WITH pivot_data AS (
  2          SELECT deptno, job, sal
  3          FROM   emp
  4          )
  5  SELECT *
  6  FROM   pivot_data
  7  PIVOT (
  8         SUM(sal)              --<-- pivot_clause
  9     FOR deptno                --<-- pivot_for_clause
 10     IN  (10,20,30,40)         --<-- pivot_in_clause
 11        );

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.

We can see that the department salary totals for each job have been transposed into columns. There are a few points to note about this example, the syntax and the results:
- Line 8: our pivot_clause sums the SAL column. We can specify multiple columns if required and optionally alias them (we will see examples of aliasing later in this article);
- Lines 1-4: pivot operations perform an implicit GROUP BY using any columns not in the pivot_clause (in our example, JOB and DEPTNO). For this reason, most pivot queries will be performed on a subset of columns, using stored views, inline views or subqueries, as in our example;
- Line 9: our pivot_for_clause states that we wish to pivot the DEPTNO aggregations only;
- Line 10: our pivot_in_clause specifies the range of values for DEPTNO. In this example we have hard-coded a list of four values which is why we generated four pivoted columns (one for each value of DEPTNO). In the absence of aliases, Oracle uses the values in the pivot_in_clause to generate the pivot column names (in our output we can see columns named "10", "20", "30" and "40").
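The implicit GROUP BY noted above is the reason our examples pivot a narrow subquery rather than EMP itself. The following is a rough sketch of the effect in Python with sqlite3. SQLite has no PIVOT operator, so the pivot is emulated with conditional aggregation and the GROUP BY is written out explicitly; the emulated_pivot helper, the three-row table and its salaries are hypothetical and are not the article's EMP data.

```python
import sqlite3

# Hypothetical miniature EMP table (not the article's data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, deptno INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("ADAMS", "CLERK", 20, 1100),
        ("SMITH", "CLERK", 20, 800),
        ("MILLER", "CLERK", 10, 1300),
    ],
)


def emulated_pivot(group_cols):
    """Emulate PIVOT (SUM(sal) FOR deptno IN (10, 20)) for a source that
    exposes group_cols plus DEPTNO and SAL. The columns left over after the
    pivot clauses become the grouping key, which we spell out here."""
    cols = ", ".join(group_cols)
    sql = f"""
        SELECT {cols},
               SUM(CASE WHEN deptno = 10 THEN sal END) AS d10,
               SUM(CASE WHEN deptno = 20 THEN sal END) AS d20
        FROM   emp
        GROUP  BY {cols}
        ORDER  BY {cols}
    """
    return conn.execute(sql).fetchall()


narrow = emulated_pivot(["job"])          # source exposes job, deptno, sal
wide = emulated_pivot(["job", "ename"])   # source also exposes ename

print(narrow)  # one row per job
print(wide)    # one row per (job, ename): the extra column splits the groups
```

With only JOB left over, the clerks collapse into a single row; once ENAME is also visible to the pivot, each employee becomes a group of its own and the pivoted totals are no longer per-job.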
An interesting point about the pivot syntax is its placement in the query; namely, between the FROM and WHERE clauses. In the following example, we restrict our original pivot query to a selection of job titles by adding a predicate.
SQL> WITH pivot_data AS (
             SELECT deptno, job, sal
             FROM   emp
             )
     SELECT *
     FROM   pivot_data
     PIVOT (
            SUM(sal)              --<-- pivot_clause
        FOR deptno                --<-- pivot_for_clause
        IN  (10,20,30,40)         --<-- pivot_in_clause
           )
     WHERE  job IN ('ANALYST','CLERK','SALESMAN');

JOB               10         20         30         40
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
ANALYST                    6600
aliasing pivot columns
In our preceding examples, Oracle used the values of DEPTNO to generate pivot column names. Alternatively, we can alias one or more of the columns in the pivot_clause and one or more of the values in the pivot_in_clause. In general, Oracle will name the pivot columns according to the following conventions:
Pivot Column Aliased? | Pivot In-Value Aliased? | Pivot Column Name |
N | N | pivot_in_clause value |
Y | Y | pivot_in_clause alias || '_' || pivot_clause alias |
N | Y | pivot_in_clause alias |
Y | N | pivot_in_clause value || '_' || pivot_clause alias |
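The four naming rules in the table can be summarised as a small function. This is only a sketch in Python of the convention as tabulated here, not of anything Oracle exposes: pivot_column_name and its parameters are hypothetical names, and it covers only the simple numeric in-values used in this article. Aliases are upper-cased because unquoted identifiers are folded to upper case by Oracle.

```python
from typing import Optional


def pivot_column_name(
    in_value: str,
    in_alias: Optional[str] = None,
    agg_alias: Optional[str] = None,
) -> str:
    """Return the pivot column name implied by the naming table.

    in_value  -- the literal from the pivot_in_clause, e.g. "10"
    in_alias  -- the alias given to that value, if any
    agg_alias -- the alias given to the aggregate in the pivot_clause, if any
    """
    # The pivot_in_clause always contributes: its alias if present, else the value.
    base = in_alias.upper() if in_alias is not None else in_value
    # An aggregate alias, when present, is appended after an underscore.
    if agg_alias is None:
        return base
    return f"{base}_{agg_alias.upper()}"


print(pivot_column_name("10"))                         # N / N
print(pivot_column_name("10", "d10_sal", "salaries"))  # Y / Y
print(pivot_column_name("10", "d10_sal"))              # N / Y
print(pivot_column_name("10", None, "salaries"))       # Y / N
```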
We will see examples of each of these aliasing options below (we have already seen examples without any aliases). However, to simplify our examples, we will begin by defining the input dataset as a view, as follows.
SQL> CREATE VIEW pivot_data
     AS
        SELECT deptno, job, sal
        FROM   emp;

View created.
For our first example, we will alias all elements of our pivot query.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal) AS salaries
     FOR    deptno IN (10 AS d10_sal,
                       20 AS d20_sal,
                       30 AS d30_sal,
                       40 AS d40_sal));

JOB       D10_SAL_SALARIES D20_SAL_SALARIES D30_SAL_SALARIES D40_SAL_SALARIES
--------- ---------------- ---------------- ---------------- ----------------
CLERK                 1430             2090             1045
SALESMAN                                                6160
PRESIDENT             5500
MANAGER               2695           3272.5             3135
ANALYST                                6600

5 rows selected.
Oracle concatenates our aliases together to generate the column names. In the following example, we will alias the pivot_clause (aggregated column) but not the values in the pivot_in_clause.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal) AS salaries
     FOR    deptno IN (10, 20, 30, 40));

JOB       10_SALARIES 20_SALARIES 30_SALARIES 40_SALARIES
--------- ----------- ----------- ----------- -----------
CLERK            1430        2090        1045
SALESMAN                                 6160
PRESIDENT        5500
MANAGER          2695      3272.5        3135
ANALYST                      6600

5 rows selected.
Oracle generates the pivot column names by concatenating the pivot_in_clause values and the aggregate column alias. Finally, we will only alias the pivot_in_clause values, as follows.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)
     FOR    deptno IN (10 AS d10_sal,
                       20 AS d20_sal,
                       30 AS d30_sal,
                       40 AS d40_sal));

JOB          D10_SAL    D20_SAL    D30_SAL    D40_SAL
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.
This time, Oracle generated column names from the aliases only. In fact, we can see from all of our examples that the pivot_in_clause is used in all pivot-column naming, regardless of whether we supply an alias or value. We can therefore be selective about which values we alias, as the following example demonstrates.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)
     FOR    deptno IN (10 AS d10_sal,
                       20,
                       30 AS d30_sal,
                       40));

JOB          D10_SAL         20    D30_SAL         40
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.
pivoting multiple columns
Our examples so far have contained a single aggregate and a single pivot column, although we can define more if we wish. In the following example we will define two aggregations in our pivot_clause for the same range of DEPTNO values that we have used so far. The new aggregate is a count of the salaries that comprise the sum.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)   AS sum
     ,      COUNT(sal) AS cnt
     FOR    deptno IN (10 AS d10_sal,
                       20 AS d20_sal,
                       30 AS d30_sal,
                       40 AS d40_sal));

JOB       D10_SAL_SUM D10_SAL_CNT D20_SAL_SUM D20_SAL_CNT D30_SAL_SUM D30_SAL_CNT D40_SAL_SUM D40_SAL_CNT
--------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
CLERK            1430           1        2090           2        1045           1                       0
SALESMAN                        0                       0        6160           4                       0
PRESIDENT        5500           1                       0                       0                       0
MANAGER          2695           1      3272.5           1        3135           1                       0
ANALYST                         0        6600           2                       0                       0

5 rows selected.
We have doubled the number of pivot columns (because we doubled the number of aggregates). The number of pivot columns is a product of the number of aggregates and the distinct number of values in the pivot_in_clause. In the following example, we will extend the pivot_for_clause and pivot_in_clause to include values for JOB in the filter.
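The "product" relationship can be sketched by enumerating the column names directly. The Python snippet below is an illustration only: pivot_columns is a hypothetical helper, and it assumes that every in-value and every aggregate carries an alias, as in the preceding query.

```python
from itertools import product


def pivot_columns(in_aliases, agg_aliases):
    """List the pivot column names for aliased in-values and aliased aggregates.

    product() varies the last sequence fastest, so all aggregates for the first
    in-value come first, matching the column order in the output above.
    """
    return [f"{v}_{a}".upper() for v, a in product(in_aliases, agg_aliases)]


cols = pivot_columns(
    ["d10_sal", "d20_sal", "d30_sal", "d40_sal"],  # four pivot_in_clause aliases
    ["sum", "cnt"],                                # two pivot_clause aliases
)
print(len(cols))  # 4 values x 2 aggregates
print(cols)
```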
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)   AS sum
     ,      COUNT(sal) AS cnt
     FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
                            (30, 'MANAGER')  AS d30_mgr,
                            (30, 'CLERK')    AS d30_clk));

D30_SLS_SUM D30_SLS_CNT D30_MGR_SUM D30_MGR_CNT D30_CLK_SUM D30_CLK_CNT
----------- ----------- ----------- ----------- ----------- -----------
       6160           4        3135           1        1045           1

1 row selected.

We have limited the query to just 3 jobs within department 30. Note how the pivot_for_clause columns (DEPTNO and JOB) combine to make a single pivot dimension. The aliases we use apply to the combined value domain (for example, "D30_SLS" to represent the SALESMAN job in department 30).
Finally, because we know the pivot column-naming rules, we can reference them directly, as follows.
SQL> SELECT d30_mgr_sum
     ,      d30_clk_cnt
     FROM   pivot_data
     PIVOT (SUM(sal)   AS sum
     ,      COUNT(sal) AS cnt
     FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
                            (30, 'MANAGER')  AS d30_mgr,
                            (30, 'CLERK')    AS d30_clk));

D30_MGR_SUM D30_CLK_CNT
----------- -----------
       3135           1

1 row selected.
general restrictions
There are a few simple "gotchas" to be aware of with pivot queries. For example, we cannot project the column(s) used in the pivot_for_clause (DEPTNO in most of our examples). This is to be expected. The column(s) in the pivot_for_clause are grouped according to the range of values we supply with the pivot_in_clause. In the following example, we will attempt to project the DEPTNO column.
SQL> SELECT deptno
  2  FROM   emp
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10,20,30,40));
SELECT deptno
       *
ERROR at line 1:
ORA-00904: "DEPTNO": invalid identifier
Oracle raises an ORA-00904 exception. In this case the DEPTNO column is completely removed from the projection and Oracle tells us that it doesn't exist in this scope. Similarly, we cannot include any column(s) used in the pivot_clause, as the following example demonstrates.
SQL> SELECT sal
  2  FROM   emp
  3  PIVOT (SUM(sal)
  4  FOR    deptno IN (10,20,30,40));
SELECT sal
       *
ERROR at line 1:
ORA-00904: "SAL": invalid identifier
We attempted to project the SAL column but Oracle raised the same exception. This is also to be expected: the pivot_clause defines our aggregations. This also means, of course, that we must use aggregate functions in the pivot_clause. In the following example, we will attempt to define a pivot_clause with a plain column rather than an aggregate.
SQL> SELECT *
  2  FROM   emp
  3  PIVOT (sal
  4  FOR    deptno IN (10,20,30,40));
PIVOT (sal
       *
ERROR at line 3:
ORA-56902: expect aggregate function inside pivot operation
Oracle raises a new ORA-56902 exception: the error message numbers are getting much higher with every release!
execution plans for pivot operations
As we have stated, pivot operations imply a GROUP BY, but we don't need to specify it. We can investigate this by explaining one of our pivot query examples, as follows. We will use Autotrace for convenience (Autotrace uses EXPLAIN PLAN and DBMS_XPLAN to display theoretical execution plans).
SQL> set autotrace traceonly explain

SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)
     FOR    deptno IN (10 AS d10_sal,
                       20 AS d20_sal,
                       30 AS d30_sal,
                       40 AS d40_sal));

Execution Plan
----------------------------------------------------------
Plan hash value: 1475541029
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 75 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY PIVOT| | 5 | 75 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 14 | 210 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
The plan output tells us that this query uses a HASH GROUP BY PIVOT operation. The HASH GROUP BY is a feature of 10g Release 2, but the PIVOT extension is new to 11g. Pivot queries do not automatically generate a PIVOT plan, however. In the following example, we will limit the domain of values in our pivot_in_clause and use Autotrace to explain the query again.
SQL> SELECT *
     FROM   pivot_data
     PIVOT (SUM(sal)   AS sum
     ,      COUNT(sal) AS cnt
     FOR   (deptno,job) IN ((30, 'SALESMAN') AS d30_sls,
                            (30, 'MANAGER')  AS d30_mgr,
                            (30, 'CLERK')    AS d30_clk));

Execution Plan
----------------------------------------------------------
Plan hash value: 1190005124

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  VIEW               |      |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   SORT AGGREGATE    |      |     1 |    15 |            |          |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   210 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
This time the CBO has chosen a simple aggregation (SORT AGGREGATE) in preference to a GROUP BY with pivot. It has correctly identified that only one record will be returned from this query, so the GROUP BY operation is unnecessary. Finally, we will explain our first pivot example but use the extended formatting options of DBMS_XPLAN to reveal more information about the work that Oracle is doing.
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'PIVOT'
     FOR
        SELECT *
        FROM   pivot_data
        PIVOT (SUM(sal)
        FOR    deptno IN (10 AS d10_sal,
                          20 AS d20_sal,
                          30 AS d30_sal,
                          40 AS d40_sal));

Explained.

SQL> SELECT *
     FROM   TABLE(
               DBMS_XPLAN.DISPLAY(
                  NULL, 'PIVOT', 'TYPICAL +PROJECTION'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1475541029

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |    75 |     4  (25)| 00:00:01 |
|   1 |  HASH GROUP BY PIVOT|      |     5 |    75 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   210 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=1) "JOB"[VARCHAR2,9], SUM(CASE WHEN ("DEPTNO"=10) THEN
       "SAL" END )[22], SUM(CASE WHEN ("DEPTNO"=20) THEN "SAL" END )[22],
       SUM(CASE WHEN ("DEPTNO"=30) THEN "SAL" END )[22], SUM(CASE WHEN
       ("DEPTNO"=40) THEN "SAL" END )[22]
   2 - "JOB"[VARCHAR2,9], "SAL"[NUMBER,22], "DEPTNO"[NUMBER,22]

18 rows selected.
DBMS_XPLAN optionally exposes the column projection information contained in PLAN_TABLE for each step of a query. The projection for ID=2 shows the base columns that we select in the PIVOT_DATA view over EMP. The interesting information, however, is for ID=1 (this step is our pivot operation). This clearly shows how Oracle is generating the pivot columns. Many developers will be familiar with this form of SQL: it is how we write pivot queries in versions prior to 11g. Oracle has chosen a CASE expression, but we commonly use DECODE for brevity, as follows.
SQL> SELECT job
     ,      SUM(DECODE(deptno,10,sal)) AS "D10_SAL"
     ,      SUM(DECODE(deptno,20,sal)) AS "D20_SAL"
     ,      SUM(DECODE(deptno,30,sal)) AS "D30_SAL"
     ,      SUM(DECODE(deptno,40,sal)) AS "D40_SAL"
     FROM   emp
     GROUP  BY
            job;

JOB          D10_SAL    D20_SAL    D30_SAL    D40_SAL
--------- ---------- ---------- ---------- ----------
CLERK           1430       2090       1045
SALESMAN                              6160
PRESIDENT       5500
MANAGER         2695     3272.5       3135
ANALYST                    6600

5 rows selected.
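The CASE expressions in the projection have no ELSE branch, so rows from other departments contribute NULL to each aggregate. If COUNT is rewritten in the same way as SUM (an assumption: the projection above shows only SUM), this would account for the blank sums and zero counts we saw in the multiple-aggregate example. The sketch below checks the behaviour in Python with sqlite3, using the CASE form because SQLite has neither PIVOT nor DECODE; the four rows are hypothetical and are not the article's EMP data.

```python
import sqlite3

# Hypothetical rows: no ANALYST works in department 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (job TEXT, deptno INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("CLERK", 10, 1300),
        ("CLERK", 20, 800),
        ("CLERK", 20, 1100),
        ("ANALYST", 20, 3000),
    ],
)

# CASE without ELSE yields NULL for non-matching rows.
# SUM over only NULLs is NULL; COUNT over only NULLs is 0.
rows = conn.execute(
    """
    SELECT job,
           SUM(CASE WHEN deptno = 10 THEN sal END)   AS d10_sum,
           COUNT(CASE WHEN deptno = 10 THEN sal END) AS d10_cnt,
           SUM(CASE WHEN deptno = 20 THEN sal END)   AS d20_sum,
           COUNT(CASE WHEN deptno = 20 THEN sal END) AS d20_cnt
    FROM   emp
    GROUP  BY job
    ORDER  BY job
    """
).fetchall()

for row in rows:
    print(row)  # the ANALYST row has None for d10_sum and 0 for d10_cnt
```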
pivot performance
From the evidence we have seen, it appears as though Oracle implements the new PIVOT syntax using a recognised SQL format. It follows that we should expect the same performance for our pivot queries regardless of the technique we use (in other words the 11g PIVOT syntax will perform the sam