MySQL Basics: Window Functions
阿新 • Published: 2022-04-15
MySQL syntax
Data preparation
create table emp (
  empno numeric(4) not null,
  ename varchar(10),
  job varchar(9),
  mgr numeric(4),
  hiredate datetime,
  sal numeric(7, 2),
  comm numeric(7, 2),
  deptno numeric(2)
);
insert into emp values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, null, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30);
insert into emp values (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30);
insert into emp values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30);
insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, null, 30);
insert into emp values (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, null, 10);
insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, null, 20);
insert into emp values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30);
insert into emp values (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, null, 20);
insert into emp values (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, null, 30);
insert into emp values (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, null, 10);
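1. Aggregate functions (grouping functions)
1. How aggregation works
Aggregation happens in two steps:
group by => groups the rows by key
xianyu,<1,a,xc,asd>
lxy,<as,zxf,zxf,qwr,ags>
aggregate function => computes one metric per group (here, a count)
xianyu,4
lxy,5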
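2. Using the functions
group by => grouping
aggregate functions => metrics: sum, avg, max, min, count
Requirement: count how many people are in each department.
What do we query?
Dimension: department
Metric: headcount
select deptno, count(1) as cnt from emp group by deptno;
Explanation of count(1):
1. The 1 is a constant placeholder evaluated for every row, so count(1) counts all rows in the group, the same as count(*).
2. It does not mean "count by the first column". Only count(column) looks at a column, and it skips rows where that column is null.
Tip: select + function (for example, select now();) is a quick way to check whether a function exists.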
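To make the count(1) explanation concrete, here is a minimal sketch that runs the department query from Python. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it keeps only three columns of emp; both choices are for illustration and are not part of the original post.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only the columns
# needed for this query are kept from the emp table.
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, comm numeric, deptno integer)")
conn.executemany(
    "insert into emp values (?, ?, ?)",
    [
        ("SMITH", None, 20), ("ALLEN", 300, 30), ("WARD", 500, 30),
        ("JONES", None, 20), ("MARTIN", 1400, 30), ("BLAKE", None, 30),
        ("CLARK", None, 10), ("SCOTT", None, 20), ("KING", None, 10),
        ("TURNER", 0, 30), ("ADAMS", None, 20), ("JAMES", None, 30),
        ("FORD", None, 20), ("MILLER", None, 10),
    ],
)

# count(1) and count(*) count every row in the group;
# count(comm) counts only rows where comm is not null.
dept_counts = conn.execute(
    """
    select deptno,
           count(1)    as cnt_const,
           count(*)    as cnt_star,
           count(comm) as cnt_comm
    from emp
    group by deptno
    order by deptno
    """
).fetchall()

for row in dept_counts:
    print(row)
```

Department 30 has six employees, but only four of them have a non-null comm, which is why count(comm) differs from count(1) there.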
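2. Window functions
1. Syntax
Window function = window + function (available in MySQL 8.0 and later)
Window: the range of rows the function computes over when it runs
Function: the function that is run
1. Aggregate functions: sum, avg, max, min, count
2. Built-in window functions
Syntax:
function over([partition by xxx,...] [order by xxx,...])
over(): what the window is opened over [the table or result set]
partition by: what to group by [like a group by column]
order by: what to sort by [column]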
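2. Aggregate functions: collapse multiple rows into one row according to a rule
sum, avg, max, ...
In theory: row count after aggregation <= row count before aggregation [depending on the dimensions chosen, i.e. the columns in group by]
Requirement:
How do we show the pre-aggregation rows and the aggregated value side by side?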
id  name  sal   dt      sal_all
1   zs    1000  2022-4  1000
2   ls    2000  2022-4  2000
3   wu    3000  2022-4  3000
4   zs    1000  2022-5  2000
5   ls    2000  2022-5  4000
6   wu    3000  2022-5  6000
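The sal_all column above can be produced with a windowed sum. The sketch below is one way to do it, assuming a hypothetical table named salary (the post does not name it) and using Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# Assumption: the table name "salary" is made up for this example,
# and in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table salary (id integer, name text, sal integer, dt text)")
conn.executemany(
    "insert into salary values (?, ?, ?, ?)",
    [
        (1, "zs", 1000, "2022-4"), (2, "ls", 2000, "2022-4"),
        (3, "wu", 3000, "2022-4"), (4, "zs", 1000, "2022-5"),
        (5, "ls", 2000, "2022-5"), (6, "wu", 3000, "2022-5"),
    ],
)

# Every input row is kept; sal_all is the running sum of sal per name.
salary_rows = conn.execute(
    """
    select id, name, sal, dt,
           sum(sal) over (partition by name order by dt) as sal_all
    from salary
    order by id
    """
).fetchall()

for row in salary_rows:
    print(row)
```

Unlike group by, the query returns all six rows, each carrying both its own sal and the accumulated total for that person.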
Data:
Number of starts per server per day
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4
Running-total problem:
Create the table
create table window01(
name varchar(50),
dt varchar(20),
cnt int
);
Insert the data
insert into window01 values('linux01','2022-04-15',1);
insert into window01 values('linux01','2022-04-16',5);
insert into window01 values('linux01','2022-04-17',7);
insert into window01 values('linux01','2022-04-18',2);
insert into window01 values('linux01','2022-04-19',3);
insert into window01 values('linux01','2022-04-20',10);
insert into window01 values('linux01','2022-04-21',4);
insert into window01 values('linux02','2022-04-18',20);
insert into window01 values('linux02','2022-04-19',30);
insert into window01 values('linux02','2022-04-20',10);
insert into window01 values('linux02','2022-04-21',40);
Requirement:
Cumulative number of starts per server, per day
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all
from window01;
+---------+------------+------+---------+
| name    | dt         | cnt  | cut_all |
+---------+------------+------+---------+
| linux01 | 2022-04-15 |    1 |       1 |
| linux01 | 2022-04-16 |    5 |       6 |
| linux01 | 2022-04-17 |    7 |      13 |
| linux01 | 2022-04-18 |    2 |      15 |
| linux01 | 2022-04-19 |    3 |      18 |
| linux01 | 2022-04-20 |   10 |      28 |
| linux01 | 2022-04-21 |    4 |      32 |
| linux02 | 2022-04-18 |   20 |      20 |
| linux02 | 2022-04-19 |   30 |      50 |
| linux02 | 2022-04-20 |   10 |      60 |
| linux02 | 2022-04-21 |   40 |     100 |
+---------+------------+------+---------+
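Note: dt is a varchar, so order by dt compares strings in dictionary (lexicographic) order. Zero-padded dates such as 2022-04-15 still sort chronologically, but numbers stored as strings do not sort numerically:
1 9 10 11 as strings [dictionary order]
sort as 1 10 11 9
* Up to 9: 1,2,3,4,5,6,7,8,9
* Adding 10 to 19: 1,10…19,2,3,4,5,6,7,8,9
* Adding 20 to 29: 1,10…19,2,20…29,3,4,5,6,7,8,9
* And so on: every two-digit number sorts right after the single digit equal to its tens digit.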
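The dictionary-order behaviour is easy to check. This is a small sketch, assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical one-column table t; it sorts the same four strings once as text and once as numbers.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table t is made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (v text)")
conn.executemany("insert into t values (?)", [("1",), ("9",), ("10",), ("11",)])

# Text comparison: character by character, so '10' sorts before '9'.
as_text = [r[0] for r in conn.execute("select v from t order by v")]

# Numeric comparison after a cast. SQLite spelling shown here;
# MySQL spells the same cast as cast(v as signed).
as_number = [
    r[0] for r in conn.execute("select v from t order by cast(v as integer)")
]

print(as_text)
print(as_number)
```

If a window is ordered by a string column that holds numbers, casting inside the order by is one way to get numeric order.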
3. Window size (frame clause)
Window size: xxx between xxx and xxx
Parameters
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- from the start of the partition to the current row (running total)
sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row) as cut_all2,
-- previous three rows + current row
sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as cut_all3,
-- previous three rows + current row + next row
sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as cut_all4,
-- unbounded above + unbounded below (the whole partition)
sum(cnt) over(partition by name order by dt rows between unbounded preceding and unbounded following) as cut_all5
from window01;
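To see what each frame actually adds up, the sketch below runs the same four frames over the linux01 rows only. It assumes Python's sqlite3 module as a stand-in for MySQL; the column aliases (s_running and so on) are made up for this example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only linux01 is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux01", "2022-04-15", 1), ("linux01", "2022-04-16", 5),
        ("linux01", "2022-04-17", 7), ("linux01", "2022-04-18", 2),
        ("linux01", "2022-04-19", 3), ("linux01", "2022-04-20", 10),
        ("linux01", "2022-04-21", 4),
    ],
)

frame_rows = conn.execute(
    """
    select dt, cnt,
           sum(cnt) over (partition by name order by dt
                          rows between unbounded preceding and current row) as s_running,
           sum(cnt) over (partition by name order by dt
                          rows between 3 preceding and current row) as s_last4,
           sum(cnt) over (partition by name order by dt
                          rows between 3 preceding and 1 following) as s_last4_next1,
           sum(cnt) over (partition by name order by dt
                          rows between unbounded preceding and unbounded following) as s_total
    from window01
    order by dt
    """
).fetchall()

for row in frame_rows:
    print(row)
```

For 2022-04-19, for example, the "3 preceding and current row" frame covers the 16th through the 19th, so it sums 5 + 7 + 2 + 3 = 17, while the running total is 18.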
select
name,
dt,
cnt,
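-- running sum per name, ordered by date
sum(cnt) over(partition by name order by dt) as cut_all,
-- no partition: sort the whole table by date, then take the running sum; rows that share a dt (e.g. the two rows for the 18th) are peers and get the same total
sum(cnt) over(order by dt) as cut_all2,
-- sum over the whole table
sum(cnt) over() as cut_all3,
-- sum per name, with no ordering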
sum(cnt) over(partition by name) as cut_all4
from window01
order by dt;
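The "two rows on the 18th" effect comes from the default frame, which with order by is range between unbounded preceding and current row and therefore includes all rows tied on dt. The sketch below shows this on a four-row subset of window01, assuming Python's sqlite3 module as a stand-in for MySQL; the aliases are made up, and the rows variant adds name to the order by only to make the tie order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; a 4-row subset is used.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux01", "2022-04-17", 7),
        ("linux01", "2022-04-18", 2),
        ("linux02", "2022-04-18", 20),
        ("linux01", "2022-04-19", 3),
    ],
)

peer_rows = conn.execute(
    """
    select name, dt, cnt,
           -- default frame: tied dt values are summed together
           sum(cnt) over (order by dt) as by_range,
           -- explicit rows frame: one physical row at a time
           sum(cnt) over (order by dt, name
                          rows between unbounded preceding and current row) as by_rows,
           sum(cnt) over () as whole_table,
           sum(cnt) over (partition by name) as per_name
    from window01
    order by dt, name
    """
).fetchall()

for row in peer_rows:
    print(row)
```

Both rows for 2022-04-18 get by_range = 29, while by_rows steps through 9 and then 29.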
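1. Without partition by, the window covers the whole table
Data warehouse processing order
Leave the ODS layer untouched
union all + group by, select, ifnull, case when
join
group by
grouping sets [dimension combinations]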
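4. Built-in window functions
1. Value functions
1. Offsets
LAG [the value N rows above the current row within the window; by default one row up]
LAG(column [, N[, default]])
column => column name
N => how many rows to offset (default 1)
default => value returned when no such row exists
LEAD [the value N rows below the current row within the window; by default one row down]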
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
lead(dt,1,'9999-99-99') over(partition by name order by dt) as lead_alias,
lag(dt,1,'9999-99-99') over(partition by name order by dt) as lag_alias
from window01;
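As a quick check of the two directions, this sketch runs lag and lead over the linux02 rows. It assumes Python's sqlite3 module as a stand-in for MySQL; the aliases prev_dt and next_dt are made up for this example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only linux02 is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux02", "2022-04-18", 20), ("linux02", "2022-04-19", 30),
        ("linux02", "2022-04-20", 10), ("linux02", "2022-04-21", 40),
    ],
)

# lag looks one row up, lead looks one row down; the third argument
# is returned when there is no such row in the partition.
offset_rows = conn.execute(
    """
    select dt,
           lag(dt, 1, '9999-99-99')  over (partition by name order by dt) as prev_dt,
           lead(dt, 1, '9999-99-99') over (partition by name order by dt) as next_dt
    from window01
    order by dt
    """
).fetchall()

for row in offset_rows:
    print(row)
```

The first row has no predecessor and the last row has no successor, so those two cells fall back to the default '9999-99-99'.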
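2. First and last values
FIRST_VALUE(): the first value in the partition after sorting, up to the current row
LAST_VALUE(): the last value in the partition after sorting, up to the current row; with order by and the default frame this is the current row's own value (or that of its last peer)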
select
name,
dt,
cnt,
first_value(cnt) over(partition by name order by dt) as f_value,
last_value(cnt) over(partition by name order by dt) as l_value
from window01;
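Because "up to the current row" makes last_value look like it just echoes cnt, it can help to compare it with an explicit full-partition frame. The following is a sketch under the assumption that Python's sqlite3 module behaves like MySQL here; l_value_full is a made-up alias.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only linux02 is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux02", "2022-04-18", 20), ("linux02", "2022-04-19", 30),
        ("linux02", "2022-04-20", 10), ("linux02", "2022-04-21", 40),
    ],
)

value_rows = conn.execute(
    """
    select dt, cnt,
           first_value(cnt) over (partition by name order by dt) as f_value,
           -- default frame ends at the current row
           last_value(cnt)  over (partition by name order by dt) as l_value,
           -- explicit frame covering the whole partition
           last_value(cnt)  over (partition by name order by dt
                                  rows between unbounded preceding
                                           and unbounded following) as l_value_full
    from window01
    order by dt
    """
).fetchall()

for row in value_rows:
    print(row)
```

With the default frame l_value equals cnt on every row; only the full-partition frame returns the value from the last date, 40.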
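2. Ranking
Bucketing
ntile
Requirement:
Sort the data by some column and split it into n buckets with ntile(n)
If the rows cannot be divided evenly, the extra rows go to the lower-numbered buckets first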
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- split into n buckets; if the split is uneven, the extra rows go to the lowest-numbered buckets first
ntile(2) over(partition by name order by dt) as n2,
ntile(3) over(partition by name order by dt) as n3
from window01
order by dt;
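With seven linux01 rows, neither 2 nor 3 divides evenly, so the "extra rows go to the lower buckets" rule is visible. This is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; only linux01 is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux01", "2022-04-15", 1), ("linux01", "2022-04-16", 5),
        ("linux01", "2022-04-17", 7), ("linux01", "2022-04-18", 2),
        ("linux01", "2022-04-19", 3), ("linux01", "2022-04-20", 10),
        ("linux01", "2022-04-21", 4),
    ],
)

# 7 rows into 2 buckets -> sizes 4 and 3; into 3 buckets -> sizes 3, 2, 2.
ntile_rows = conn.execute(
    """
    select dt,
           ntile(2) over (partition by name order by dt) as n2,
           ntile(3) over (partition by name order by dt) as n3
    from window01
    order by dt
    """
).fetchall()

for row in ntile_rows:
    print(row)
```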
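Ranking
rank: numbers rows within the partition from 1 in sort order; ties share a rank and leave gaps after them
row_number: numbers rows within the partition from 1 in sort order, with no repeats
dense_rank: numbers rows within the partition from 1 in sort order; ties share a rank and leave no gaps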
select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
rank() over(partition by name order by cnt desc) as rk,
row_number() over(partition by name order by cnt desc) as rw,
dense_rank() over(partition by name order by cnt desc) as d_rk
from window01;
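The window01 data has no tied cnt values inside a partition, so the three functions return identical numbers on it. The sketch below adds a hypothetical server, linux03, with a tie to show where they differ; it assumes Python's sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL. The linux03 rows
# are invented for this example so that two rows share cnt = 10.
conn = sqlite3.connect(":memory:")
conn.execute("create table window01 (name text, dt text, cnt integer)")
conn.executemany(
    "insert into window01 values (?, ?, ?)",
    [
        ("linux03", "2022-04-18", 10), ("linux03", "2022-04-19", 10),
        ("linux03", "2022-04-20", 7), ("linux03", "2022-04-21", 5),
    ],
)

rank_rows = conn.execute(
    """
    select cnt,
           rank()       over (partition by name order by cnt desc) as rk,
           row_number() over (partition by name order by cnt desc) as rw,
           dense_rank() over (partition by name order by cnt desc) as d_rk
    from window01
    order by cnt desc, rw
    """
).fetchall()

for row in rank_rows:
    print(row)
```

The two rows with cnt = 10 both get rank 1; rank then jumps to 3, dense_rank continues with 2, and row_number never repeats. Which of the two tied rows receives row_number 1 is not guaranteed unless the order by includes a tie-breaker.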