MySQL Basics: Window Functions

MySQL syntax

Data preparation

create table emp (
    empno numeric(4) not null,
    ename varchar(10),
    job varchar(9),
    mgr numeric(4),
    hiredate datetime,
    sal numeric(7, 2),
    comm numeric(7, 2),
    deptno numeric(2)
);

insert into emp values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, null, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30);
insert into emp values (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30);
insert into emp values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30);
insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, null, 30);
insert into emp values (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, null, 10);
insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, null, 20);
insert into emp values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30);
insert into emp values (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, null, 20);
insert into emp values (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, null, 30);
insert into emp values (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, null, 10);

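1. Aggregate functions (grouping functions)

1. Aggregation logic

    Aggregation:
        group by => grouping
            xianyu,<1,a,xc,asd>
            lxy,<as,zxf,zxf,qwr,ags>
        aggregate function => metric
            xianyu,4
            lxy,5

2. Using the functions

group by => grouping
aggregate functions => metric statistics: sum avg max min count


Requirement:
    How many people are in each department?

    What do we query?
        Dimension: department
        Metric: headcount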

        select
        deptno,
        count(1) as cnt
        from emp
        group by deptno;
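The same grouping works for the other aggregate functions listed above. The following is a minimal sketch, assuming the emp table from the data preparation section has been loaded exactly as shown; the column aliases are illustrative only.

```sql
-- one output row per department, with several metrics side by side
select
    deptno,
    count(1)           as cnt,      -- headcount
    sum(sal)           as sal_sum,  -- total salary
    round(avg(sal), 2) as sal_avg,  -- average salary, rounded to 2 decimals
    max(sal)           as sal_max,
    min(sal)           as sal_min
from emp
group by deptno
order by deptno;

-- expected values with the sample data (display format may vary):
-- deptno  cnt  sal_sum  sal_avg  sal_max  sal_min
-- 10      3    8750     2916.67  5000     1300
-- 20      5    10875    2175.00  3000     800
-- 30      6    9400     1566.67  2850     950
```

Fourteen input rows collapse into three output rows, one per value of the group by column.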

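Explanation:
    count(1) 【1. puts a constant placeholder value (1) on every row, then counts the rows】
        【2. the 1 is a literal, not a column position: count(1) counts every row in the group, like count(*)】
    select
        select + function call => a quick way to check that a function exists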
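To make the difference between counting rows and counting a column concrete, here is a small sketch against the emp table from the data preparation section (aliases are illustrative). count(column) skips NULL values, while count(1) and count(*) do not.

```sql
select
    count(*)    as all_rows,    -- 14: every row
    count(1)    as const_rows,  -- 14: the constant 1 is never NULL
    count(comm) as comm_rows    -- 4: only rows where comm is not NULL
from emp;

-- select + function, with no from clause: checks that the function exists
select now();
```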

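2. Window functions

1. Syntax

Window function:
    window + function
    window: the range of rows the function computes over when it runs
    function: the function that is run
        1. aggregate functions
            sum avg max min count
        2. built-in window functions
        
    Syntax:
        function over([partition by xxx,...] [order by xxx,...])
        over(): what the window is opened on 【table or result set】
        partition by: what to partition (group) by 【like group by column】
        order by: what to sort by 【column】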

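2. Aggregate functions: multiple rows are aggregated into one row according to some rule

    sum avg max...
    In general: row count after aggregation <= row count before aggregation 【depends on the dimensions chosen, i.e. the columns in group by】

    Requirement:
        How do we show the pre-aggregation rows and the aggregated value together?

        id  name  sal   dt      sal_all
        1   zs    1000  2022-4  1000
        2   ls    2000  2022-4  2000
        3   wu    3000  2022-4  3000
        4   zs    1000  2022-5  2000
        5   ls    2000  2022-5  4000
        6   wu    3000  2022-5  6000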
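The table above can be reproduced with a window function. This is a sketch only: the table name sal_demo is hypothetical (the original shows the desired output, not a table definition), and it assumes MySQL 8.0 or later, where window functions are available.

```sql
-- hypothetical table holding the six sample rows
create table sal_demo (
    id   int,
    name varchar(10),
    sal  int,
    dt   varchar(10)
);

insert into sal_demo values
    (1, 'zs', 1000, '2022-4'),
    (2, 'ls', 2000, '2022-4'),
    (3, 'wu', 3000, '2022-4'),
    (4, 'zs', 1000, '2022-5'),
    (5, 'ls', 2000, '2022-5'),
    (6, 'wu', 3000, '2022-5');

-- every detail row is kept; sal_all is the running total per name
select
    id,
    name,
    sal,
    dt,
    sum(sal) over(partition by name order by dt) as sal_all
from sal_demo
order by id;
-- sal_all for id 1..6: 1000, 2000, 3000, 2000, 4000, 6000
```

Unlike group by, the query returns all six rows, which is exactly what the requirement asks for.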

Data:
Number of starts per server per day
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4

Running-total problem:
    Create the table
        create table window01(
            name varchar(50),
            dt varchar(20),
            cnt int
        );
    Insert the data
        insert into window01 values("linux01","2022-04-15",1);
        insert into window01 values("linux01","2022-04-16",5);
        insert into window01 values("linux01","2022-04-17",7);
        insert into window01 values("linux01","2022-04-18",2);
        insert into window01 values("linux01","2022-04-19",3);
        insert into window01 values("linux01","2022-04-20",10);
        insert into window01 values("linux01","2022-04-21",4);


        insert into window01 values("linux02","2022-04-18",20);
        insert into window01 values("linux02","2022-04-19",30);
        insert into window01 values("linux02","2022-04-20",10);
        insert into window01 values("linux02","2022-04-21",40);


    Requirement:
        Cumulative start count per server, per day
        select
        name,
        dt,
        cnt,
        sum(cnt) over(partition by name order by dt) as cut_all
        from window01;

        +---------+------------+------+---------+
        | name    | dt         | cnt  | cut_all |
        +---------+------------+------+---------+
        | linux01 | 2022-04-15 |    1 |       1 |
        | linux01 | 2022-04-16 |    5 |       6 |
        | linux01 | 2022-04-17 |    7 |      13 |
        | linux01 | 2022-04-18 |    2 |      15 |
        | linux01 | 2022-04-19 |    3 |      18 |
        | linux01 | 2022-04-20 |   10 |      28 |
        | linux01 | 2022-04-21 |    4 |      32 |
        | linux02 | 2022-04-18 |   20 |      20 |
        | linux02 | 2022-04-19 |   30 |      50 |
        | linux02 | 2022-04-20 |   10 |      60 |
        | linux02 | 2022-04-21 |   40 |     100 |
        +---------+------------+------+---------+

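        Note: dt is stored as a string, so order by dt compares values in dictionary order 【lexicographic order】
        sorted as numbers: 1 9 10 11
        sorted as strings: 1 10 11 9

         * With only 1-9: 1,2,3,4,5,6,7,8,9
         * Adding 10-19: 1,10…19,2,3,4,5,6,7,8,9
         * Adding 20-29: 1,10…19,2,20…29,3,4,5,6,7,8,9
         * And so on: each two-digit number sorts right after the one-digit number equal to its tens digit.
         * Zero-padded yyyy-MM-dd strings such as the dt values in window01 are safe: their string order matches date order.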
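The dictionary-order behaviour is easy to check directly. The sketch below uses an inline derived table (the alias t and column v are illustrative) so it does not depend on any existing table.

```sql
-- string comparison: '10' and '11' sort before '9'
select v
from (
    select '1' as v union all
    select '9' union all
    select '10' union all
    select '11'
) t
order by v;
-- result: 1, 10, 11, 9

-- casting to a number restores numeric order
select v
from (
    select '1' as v union all
    select '9' union all
    select '10' union all
    select '11'
) t
order by cast(v as unsigned);
-- result: 1, 9, 10, 11
```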

3. Window size (frame clause)

Window size: xxx between xxx and xxx

Parameters
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- no lower bound: from the first row of the partition to the current row
sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row) as cut_all2,
-- previous three rows + current row
sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as cut_all3,
-- previous three rows + current row + next row
sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as cut_all4,
-- unbounded above + unbounded below: the whole partition
sum(cnt) over(partition by name order by dt rows between unbounded preceding and unbounded following) as cut_all5
from window01;
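To see what a frame does row by row, it helps to look at a single small partition. The sketch below restricts window01 to linux02 (four rows: 20, 30, 10, 40) and uses two frames not shown above; the aliases are illustrative.

```sql
select
    dt,
    cnt,
    -- previous row + current row
    sum(cnt) over(order by dt rows between 1 preceding and current row) as prev_and_cur,
    -- current row down to the last row
    sum(cnt) over(order by dt rows between current row and unbounded following) as cur_to_end
from window01
where name = 'linux02'
order by dt;

-- dt          cnt  prev_and_cur  cur_to_end
-- 2022-04-18   20            20         100
-- 2022-04-19   30            50          80
-- 2022-04-20   10            40          50
-- 2022-04-21   40            50          40
```

The first row has no preceding row, so its frame simply contains fewer rows; no error is raised and no NULL is produced.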

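select
name,
dt,
cnt,
-- usual case: partition, sort, then running sum
sum(cnt) over(partition by name order by dt) as cut_all,
-- sort the whole table by date and take a running sum over the whole table;
-- rows with the same dt are peers, e.g. the two rows dated 2022-04-18 get the same total
sum(cnt) over(order by dt) as cut_all2,
-- sum over the whole table
sum(cnt) over() as cut_all3,
-- partition by name only: total per name
sum(cnt) over(partition by name) as cut_all4
from window01
order by dt;

1. Without partition by => the window covers the whole table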
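The "peer rows" effect comes from the default frame, which is range-based when order by is present. A sketch comparing it with an explicit rows frame on window01 follows; the aliases are illustrative, and the extra sort key name is added only as a tie-breaker so the row-by-row total is deterministic.

```sql
select
    name,
    dt,
    cnt,
    -- default frame (range): all rows with the same dt are added at once
    sum(cnt) over(order by dt) as range_total,
    -- rows frame with a tie-breaker: strictly one row at a time
    sum(cnt) over(order by dt, name
                  rows between unbounded preceding and current row) as rows_total
from window01
order by dt, name;

-- for 2022-04-18:
--   linux01  cnt 2   range_total 35  rows_total 15
--   linux02  cnt 20  range_total 35  rows_total 35
```

Without the tie-breaker, a rows frame over order by dt alone would still run, but which of the two same-day rows is counted first is not guaranteed.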


Data warehouse processing order
    leave ods untouched
    union all + group by, select, ifnull, case when
    join
    group by
    grouping sets 【dimension combinations】

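4. Built-in window functions

1. Values and offsets

1. Offsets
            LAG 【value from the n-th row above within the window; with n = 1, one row above the current row】
                LAG(column [, N[, default]])
                column => column name
                n => how many rows to offset
                default => value returned when no such row exists
            LEAD 【value from the n-th row below within the window; with n = 1, one row below the current row】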

            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            lead(dt,1,"9999-99-99") over(partition by name order by dt) as lead_alias,
            lag(dt,1,"9999-99-99") over(partition by name order by dt) as lag_alias
            from window01;
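A common use of lag is comparing each row with the previous one. The following sketch computes the day-over-day change in start count for linux02 from window01; the aliases prev_cnt and diff are illustrative, and no default is passed, so the first row yields NULL.

```sql
select
    dt,
    cnt,
    lag(cnt, 1) over(partition by name order by dt)       as prev_cnt,
    cnt - lag(cnt, 1) over(partition by name order by dt) as diff
from window01
where name = 'linux02'
order by dt;

-- dt          cnt  prev_cnt  diff
-- 2022-04-18   20  NULL      NULL
-- 2022-04-19   30  20        10
-- 2022-04-20   10  30        -20
-- 2022-04-21   40  10        30
```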
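2. Values
            FIRST_VALUE(): first value in the partition after sorting, up to the current row
            LAST_VALUE(): last value in the partition after sorting, up to the current row 【with the default frame and unique order by values, this is the current row's own value】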

            select
            name,
            dt,
            cnt,
            first_value(cnt) over(partition by name order by dt) as f_value,
            last_value(cnt) over(partition by name order by dt) as l_value
            from window01;
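Because the default frame stops at the current row, last_value often surprises people. A sketch of one way to get the true last value of the partition is to widen the frame explicitly; it is shown here for linux02 in window01, with illustrative aliases.

```sql
select
    dt,
    cnt,
    first_value(cnt) over(partition by name order by dt) as f_value,
    -- default frame: ends at the current row
    last_value(cnt) over(partition by name order by dt)  as l_value_default,
    -- explicit frame covering the whole partition
    last_value(cnt) over(partition by name order by dt
        rows between unbounded preceding and unbounded following) as l_value_full
from window01
where name = 'linux02'
order by dt;

-- dt          cnt  f_value  l_value_default  l_value_full
-- 2022-04-18   20       20               20            40
-- 2022-04-19   30       20               30            40
-- 2022-04-20   10       20               10            40
-- 2022-04-21   40       20               40            40
```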

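2. Ranking

Bucketing
            ntile
            Requirement:
                sort the data by some column and split it into n buckets with ntile(n)
                if the rows cannot be split evenly, the extra rows go to the lower-numbered buckets first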
            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            -- split into n buckets; if not evenly divisible, the extra rows go to the lowest-numbered buckets first
            ntile(2) over(partition by name order by dt) as n2,
            ntile(3) over(partition by name order by dt) as n3
            from window01
            order by dt;
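The uneven-split rule is easiest to see on the seven linux01 rows in window01. This sketch restricts the query to that server; the aliases match the ones used above.

```sql
select
    dt,
    cnt,
    ntile(2) over(order by dt) as n2,  -- 7 rows into 2 buckets: sizes 4 and 3
    ntile(3) over(order by dt) as n3   -- 7 rows into 3 buckets: sizes 3, 2 and 2
from window01
where name = 'linux01'
order by dt;

-- dt          cnt  n2  n3
-- 2022-04-15    1   1   1
-- 2022-04-16    5   1   1
-- 2022-04-17    7   1   1
-- 2022-04-18    2   1   2
-- 2022-04-19    3   2   2
-- 2022-04-20   10   2   3
-- 2022-04-21    4   2   3
```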
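Ranking
            rank: starts at 1; ties share the same rank and leave gaps afterwards; numbers the rows within the partition
            row_number: starts at 1; numbers the rows within the partition consecutively in sort order
            dense_rank: starts at 1; ties share the same rank and leave no gaps; numbers the rows within the partition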

            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            rank() over(partition by name order by cnt desc) as rk,
            row_number() over(partition by name order by cnt desc) as rw,
            dense_rank() over(partition by name order by cnt desc) as d_rk
            from window01;
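The cnt values in window01 have no ties within a server, so the three functions return identical numbers there. To show where they differ, here is a sketch on the emp table from the data preparation section, limited to department 30, where WARD and MARTIN both earn 1250.

```sql
select
    ename,
    sal,
    rank()       over(order by sal desc) as rk,
    row_number() over(order by sal desc) as rw,
    dense_rank() over(order by sal desc) as d_rk
from emp
where deptno = 30
order by sal desc, rw;

-- sal      rk  rw  d_rk
-- 2850.00   1   1     1   (BLAKE)
-- 1600.00   2   2     2   (ALLEN)
-- 1500.00   3   3     3   (TURNER)
-- 1250.00   4   4     4   (WARD or MARTIN)
-- 1250.00   4   5     4   (MARTIN or WARD)
--  950.00   6   6     5   (JAMES)
```

Which of the two tied rows receives rw 4 is not guaranteed; adding a second sort key such as empno to the over() clause makes row_number deterministic.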