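How to Reduce Syntactic Overhead Using the SQL WINDOW Clause
SQL is a verbose language, and one of its most verbose features is window functions.
In a Stack Overflow question I came across recently, someone asked how to calculate the difference between the first and the last value of a time series for each given day: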
Input
volume tstamp
---------------------------
29011 2012-12-28 09:00:00
28701 2012-12-28 10:00:00
28830 2012-12-28 11:00:00
28353 2012-12-28 12:00:00
28642 2012-12-28 13:00:00
28583 2012-12-28 14:00:00
28800 2012-12-29 09:00:00
28751 2012-12-29 10:00:00
28670 2012-12-29 11:00:00
28621 2012-12-29 12:00:00
28599 2012-12-29 13:00:00
28278 2012-12-29 14:00:00
Desired output
first last difference date
------------------------------------
29011 28583 428 2012-12-28
28800 28278 522 2012-12-29
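How to write the query
Note that the values and the timestamp progression need not correlate. There is no rule saying that if Timestamp2 > Timestamp1, then Value2 < Value1. In the sample input, for instance, the lowest volume on 2012-12-28 is 28353 at 12:00, not the day's last value, 28583. If there were such a rule, this simple query would work (using PostgreSQL syntax):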
SELECT
max(volume) AS first,
min(volume) AS last,
max(volume) - min(volume) AS difference,
CAST(tstamp AS DATE) AS date
FROM t
GROUP BY CAST(tstamp AS DATE);
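There are several ways to find the first and last values within a group that don't involve window functions. For example:
In Oracle, you can use the FIRST and LAST functions, which for some mysterious reason are not written FIRST(...) WITHIN GROUP (ORDER BY ...) or LAST(...) WITHIN GROUP (ORDER BY ...), like other sorted set aggregate functions, but some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...). Go figure.
In PostgreSQL, you can use the DISTINCT ON syntax together with ORDER BY and LIMIT.
More details about the various approaches can be found here: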
https://blog.jooq.org/2017/09/22/how-to-write-efficient-top-n-queries-in-sql
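As a rough sketch of the PostgreSQL option mentioned above, DISTINCT ON can pick one row per day, once ordered by ascending and once by descending timestamp. This assumes the table t(volume, tstamp) from the question; the names first_volume, last_volume, ts_date, f and l are made up for illustration, and this variant gets by without LIMIT.

```sql
-- Sketch only, PostgreSQL-specific. Assumes the table t(volume, tstamp) from the question.
-- first_volume, last_volume, ts_date, f and l are illustrative names.
SELECT
  f.first_volume AS first,
  l.last_volume AS last,
  f.first_volume - l.last_volume AS difference,
  f.ts_date AS date
FROM (
  -- One row per day: the row with the earliest timestamp
  SELECT DISTINCT ON (CAST(tstamp AS DATE))
    CAST(tstamp AS DATE) AS ts_date,
    volume AS first_volume
  FROM t
  ORDER BY CAST(tstamp AS DATE), tstamp ASC
) f
JOIN (
  -- One row per day: the row with the latest timestamp
  SELECT DISTINCT ON (CAST(tstamp AS DATE))
    CAST(tstamp AS DATE) AS ts_date,
    volume AS last_volume
  FROM t
  ORDER BY CAST(tstamp AS DATE), tstamp DESC
) l ON l.ts_date = f.ts_date
ORDER BY f.ts_date;
```

The table is read twice here, which is one reason a single-pass solution is attractive.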
The best approach would be to use an aggregate function like Oracle's, but few databases have one. So, we'll resort to the FIRST_VALUE and LAST_VALUE window functions:
SELECT DISTINCT
first_value(volume) OVER (
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first,
last_value(volume) OVER (
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last,
first_value(volume) OVER (
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
- last_value(volume) OVER (
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS diff,
CAST(tstamp AS DATE) AS date
FROM t
ORDER BY CAST(tstamp AS DATE)
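Oops.
That doesn't look too readable, but it will yield the correct result. Of course, we could wrap the definitions of the FIRST and LAST columns in a derived table, but that would still leave us with two repetitions of the window definition: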
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
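For illustration, the derived-table variant mentioned above might look roughly like the following sketch. The names first_volume, last_volume, ts_date and d are made up for this example. The difference is now computed only once, but the window specification still appears twice.

```sql
-- Sketch only: first_volume, last_volume, ts_date and d are illustrative names.
SELECT DISTINCT
  first_volume AS first,
  last_volume AS last,
  first_volume - last_volume AS diff,
  ts_date AS date
FROM (
  SELECT
    first_value(volume) OVER (
      PARTITION BY CAST(tstamp AS DATE)
      ORDER BY tstamp
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_volume,
    last_value(volume) OVER (
      PARTITION BY CAST(tstamp AS DATE)
      ORDER BY tstamp
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_volume,
    CAST(tstamp AS DATE) AS ts_date
  FROM t
) d
ORDER BY ts_date
```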
WINDOW clause to the rescue
Luckily, at least 3 databases have implemented the SQL standard WINDOW clause:
MySQL
PostgreSQL
Sybase SQL Anywhere
The above query can be refactored to this one:
SELECT DISTINCT
first_value(volume) OVER w AS first,
last_value(volume) OVER w AS last,
first_value(volume) OVER w
- last_value(volume) OVER w AS diff,
CAST(tstamp AS DATE) AS date
FROM t
WINDOW w AS (
PARTITION BY CAST(tstamp AS DATE)
ORDER BY tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY CAST(tstamp AS DATE)
Notice how I can give a window specification a name, much like I would define a common table expression (WITH clause):
WINDOW
<window-name> AS (<window-specification>)
{ ,<window-name> AS (<window-specification>)... }
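To illustrate the comma-separated form of this syntax, here is a small sketch that declares two independent named windows in one WINDOW clause. It reuses the table t from the question; the window names by_day and by_time and the column aliases are hypothetical.

```sql
-- Sketch only: by_day, by_time and the column aliases are illustrative names.
SELECT
  tstamp,
  volume,
  -- Running total of volume within each day
  sum(volume) OVER by_day AS running_volume_of_day,
  -- Position of the row in the whole time series
  row_number() OVER by_time AS overall_position
FROM t
WINDOW
  by_day AS (PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp),
  by_time AS (ORDER BY tstamp)
ORDER BY tstamp
```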
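Not only can I reuse entire specifications, I can also build a specification on top of a partial one and reuse only parts of it. My previous query could have been rewritten like this: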
SELECT DISTINCT
first_value(volume) OVER w3 AS first,
last_value(volume) OVER w3 AS last,
first_value(volume) OVER w3
- last_value(volume) OVER w3 AS diff,
CAST(tstamp AS DATE) AS date
FROM t
WINDOW
w1 AS (PARTITION BY CAST(tstamp AS DATE)),
w2 AS (w1 ORDER BY tstamp),
w3 AS (w2 ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
ORDER BY CAST(tstamp AS DATE)
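Each window specification can be created from scratch or be based on a previously defined window specification. Note that this is also true when referencing a window definition. If I wanted to reuse the PARTITION BY clause and the ORDER BY clause, but change the frame clause (ROWS ...), I could have written this: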
SELECT DISTINCT
first_value(volume) OVER (
w2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS first,
last_value(volume) OVER (
w2 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS last,
first_value(volume) OVER (
w2 ROWS UNBOUNDED PRECEDING
) - last_value(volume) OVER (
w2 ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING
) AS diff,
CAST(tstamp AS DATE) AS date
FROM t
WINDOW
w1 AS (PARTITION BY CAST(tstamp AS DATE)),
w2 AS (w1 ORDER BY tstamp)
ORDER BY CAST(tstamp AS DATE)
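What if my database doesn't support the WINDOW clause?
In that case, you either have to write the window specification manually on each window function, or use a SQL builder like jOOQ, which can emulate the WINDOW clause: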