1. 程式人生 > 實用技巧 >mysql8學習筆記⑨視窗函式

mysql8學習筆記⑨視窗函式

前言

MySQL8.0之前,做資料排名統計等相當痛苦,因為沒有像Oracle、SQL SERVER 、PostgreSQL等其他資料庫那樣的視窗函式。但隨著MySQL8.0中新增了視窗函式之後,針對這類統計就再也不是事了,本文就以常用的排序例項介紹MySQL的視窗函式。

1、準備工作

建立表及測試資料

mysql> create database testdb;
Database changed
/* 建立表 */
 create table tb_score(id int primary key auto_increment,stu_no varchar(10),course varchar(50),score decimal
(4,1),key idx_stuNo_course(stu_no,course)); mysql> show tables; +------------------+ | Tables_in_testdb | +------------------+ | tb_score | +------------------+ /* 新增一批測試資料 */ insert into tb_score(stu_no,course,score)values('2020001','mysql',90),('2020001','C++',85),('2020003','English',100),('2020002','mysql',50),('
2020002','C++',70),('2020002','English',99); insert into tb_score(stu_no,course,score)values('2020003','mysql',78),('2020003','C++',81),('2020003','English',80),('2020004','mysql',80),('2020004','C++',60),('2020004','English',100); insert into tb_score(stu_no,course,score)values('2020005','mysql',98),('2020005','C++',96),('2020005','English',70),('2020006
','mysql',60),('2020006','C++',90),('2020006','English',70); insert into tb_score(stu_no,course,score)values('2020007','mysql',50),('2020007','C++',66),('2020007','English',76),('2020008','mysql',90),('2020008','C++',69),('2020008','English',86); insert into tb_score(stu_no,course,score)values('2020009','mysql',70),('2020009','C++',66),('2020009','English',86),('2020010','mysql',75),('2020010','C++',76),('2020010','English',81); insert into tb_score(stu_no,course,score)values('2020011','mysql',90),('2020012','C++',85),('2020011','English',84),('2020012','English',75),('2020013','C++',96),('2020013','English',88);

2、統計每門課程分數的排名

根據每門課程的分數從高到低進行排名,此時,會出現分數相同時怎麼處理的問題,下面就根據不同的視窗函式來處理不同場景的需求

ROW_NUMBER

由結果可以看出,分數相同時按照學號順序進行排名

mysql> select stu_no,course,score,row_number() over(PARTITION by course order by score desc) as rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     | 96.0  |  1 |
| 2020013 | C++     | 96.0  |  2 |
| 2020006 | C++     | 90.0  |  3 |
| 2020001 | C++     | 85.0  |  4 |
| 2020012 | C++     | 85.0  |  5 |
| 2020003 | C++     | 81.0  |  6 |
| 2020010 | C++     | 76.0  |  7 |
| 2020002 | C++     | 70.0  |  8 |
| 2020008 | C++     | 69.0  |  9 |
| 2020007 | C++     | 66.0  | 10 |
| 2020009 | C++     | 66.0  | 11 |
| 2020004 | C++     | 60.0  | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  2 |
| 2020002 | English | 99.0  |  3 |
| 2020013 | English | 88.0  |  4 |
| 2020008 | English | 86.0  |  5 |
| 2020009 | English | 86.0  |  6 |
| 2020011 | English | 84.0  |  7 |
| 2020010 | English | 81.0  |  8 |
| 2020003 | English | 80.0  |  9 |
| 2020007 | English | 76.0  | 10 |
| 2020012 | English | 75.0  | 11 |
| 2020005 | English | 70.0  | 12 |
| 2020006 | English | 70.0  | 13 |
| 2020005 | mysql   | 98.0  |  1 |
| 2020001 | mysql   | 90.0  |  2 |
| 2020008 | mysql   | 90.0  |  3 |
| 2020011 | mysql   | 90.0  |  4 |
| 2020004 | mysql   | 80.0  |  5 |
| 2020003 | mysql   | 78.0  |  6 |
| 2020010 | mysql   | 75.0  |  7 |
| 2020009 | mysql   | 70.0  |  8 |
| 2020006 | mysql   | 60.0  |  9 |
| 2020002 | mysql   | 50.0  | 10 |
| 2020007 | mysql   | 50.0  | 11 |
+---------+---------+-------+----+

DENSE_RANK

為了讓分數相同時排名也相同,則可以使用DENSE_RANK函式,結果如下:

mysql> select stu_no,course,score,DENSE_RANK() over(partition by course order by score desc) rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     | 96.0  |  1 |
| 2020013 | C++     | 96.0  |  1 |
| 2020006 | C++     | 90.0  |  2 |
| 2020001 | C++     | 85.0  |  3 |
| 2020012 | C++     | 85.0  |  3 |
| 2020003 | C++     | 81.0  |  4 |
| 2020010 | C++     | 76.0  |  5 |
| 2020002 | C++     | 70.0  |  6 |
| 2020008 | C++     | 69.0  |  7 |
| 2020007 | C++     | 66.0  |  8 |
| 2020009 | C++     | 66.0  |  8 |
| 2020004 | C++     | 60.0  |  9 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English | 99.0  |  2 |
| 2020013 | English | 88.0  |  3 |
| 2020008 | English | 86.0  |  4 |
| 2020009 | English | 86.0  |  4 |
| 2020011 | English | 84.0  |  5 |
| 2020010 | English | 81.0  |  6 |
| 2020003 | English | 80.0  |  7 |
| 2020007 | English | 76.0  |  8 |
| 2020012 | English | 75.0  |  9 |
| 2020005 | English | 70.0  | 10 |
| 2020006 | English | 70.0  | 10 |
| 2020005 | mysql   | 98.0  |  1 |
| 2020001 | mysql   | 90.0  |  2 |
| 2020008 | mysql   | 90.0  |  2 |
| 2020011 | mysql   | 90.0  |  2 |
| 2020004 | mysql   | 80.0  |  3 |
| 2020003 | mysql   | 78.0  |  4 |
| 2020010 | mysql   | 75.0  |  5 |
| 2020009 | mysql   | 70.0  |  6 |
| 2020006 | mysql   | 60.0  |  7 |
| 2020002 | mysql   | 50.0  |  8 |
| 2020007 | mysql   | 50.0  |  8 |
+---------+---------+-------+----+

RANK

DENSE_RANK的結果是分數相同時排名相同了,但是下一個名次是緊接著上一個名次的,如果2個並列的第1之後,下一個我想是第3名,則可以使用RANK函式實現

mysql> select stu_no,course,score,rank() over(partition by course order by score desc) rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     | 96.0  |  1 |
| 2020013 | C++     | 96.0  |  1 |
| 2020006 | C++     | 90.0  |  3 |
| 2020001 | C++     | 85.0  |  4 |
| 2020012 | C++     | 85.0  |  4 |
| 2020003 | C++     | 81.0  |  6 |
| 2020010 | C++     | 76.0  |  7 |
| 2020002 | C++     | 70.0  |  8 |
| 2020008 | C++     | 69.0  |  9 |
| 2020007 | C++     | 66.0  | 10 |
| 2020009 | C++     | 66.0  | 10 |
| 2020004 | C++     | 60.0  | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English | 99.0  |  3 |
| 2020013 | English | 88.0  |  4 |
| 2020008 | English | 86.0  |  5 |
| 2020009 | English | 86.0  |  5 |
| 2020011 | English | 84.0  |  7 |
| 2020010 | English | 81.0  |  8 |
| 2020003 | English | 80.0  |  9 |
| 2020007 | English | 76.0  | 10 |
| 2020012 | English | 75.0  | 11 |
| 2020005 | English | 70.0  | 12 |
| 2020006 | English | 70.0  | 12 |
| 2020005 | mysql   | 98.0  |  1 |
| 2020001 | mysql   | 90.0  |  2 |
| 2020008 | mysql   | 90.0  |  2 |
| 2020011 | mysql   | 90.0  |  2 |
| 2020004 | mysql   | 80.0  |  5 |
| 2020003 | mysql   | 78.0  |  6 |
| 2020010 | mysql   | 75.0  |  7 |
| 2020009 | mysql   | 70.0  |  8 |
| 2020006 | mysql   | 60.0  |  9 |
| 2020002 | mysql   | 50.0  | 10 |
| 2020007 | mysql   | 50.0  | 10 |
+---------+---------+-------+----+

這樣就實現了各種排序需求。

NTILE

NTILE函式的作用是對每個分組排名後,再將對應分組分成N個小組,例如

mysql> select stu_no,course,score,rank() over(partition by course order by score desc) rn,NTILE(2) over(partition by course order by score desc) rn_group from tb_score;
+---------+---------+-------+----+----------+
| stu_no  | course  | score | rn | rn_group |
+---------+---------+-------+----+----------+
| 2020005 | C++     | 96.0  |  1 |        1 |
| 2020013 | C++     | 96.0  |  1 |        1 |
| 2020006 | C++     | 90.0  |  3 |        1 |
| 2020001 | C++     | 85.0  |  4 |        1 |
| 2020012 | C++     | 85.0  |  4 |        1 |
| 2020003 | C++     | 81.0  |  6 |        1 |
| 2020010 | C++     | 76.0  |  7 |        2 |
| 2020002 | C++     | 70.0  |  8 |        2 |
| 2020008 | C++     | 69.0  |  9 |        2 |
| 2020007 | C++     | 66.0  | 10 |        2 |
| 2020009 | C++     | 66.0  | 10 |        2 |
| 2020004 | C++     | 60.0  | 12 |        2 |
| 2020003 | English | 100.0 |  1 |        1 |
| 2020004 | English | 100.0 |  1 |        1 |
| 2020002 | English | 99.0  |  3 |        1 |
| 2020013 | English | 88.0  |  4 |        1 |
| 2020008 | English | 86.0  |  5 |        1 |
| 2020009 | English | 86.0  |  5 |        1 |
| 2020011 | English | 84.0  |  7 |        1 |
| 2020010 | English | 81.0  |  8 |        2 |
| 2020003 | English | 80.0  |  9 |        2 |
| 2020007 | English | 76.0  | 10 |        2 |
| 2020012 | English | 75.0  | 11 |        2 |
| 2020005 | English | 70.0  | 12 |        2 |
| 2020006 | English | 70.0  | 12 |        2 |
| 2020005 | mysql   | 98.0  |  1 |        1 |
| 2020001 | mysql   | 90.0  |  2 |        1 |
| 2020008 | mysql   | 90.0  |  2 |        1 |
| 2020011 | mysql   | 90.0  |  2 |        1 |
| 2020004 | mysql   | 80.0  |  5 |        1 |
| 2020003 | mysql   | 78.0  |  6 |        1 |
| 2020010 | mysql   | 75.0  |  7 |        2 |
| 2020009 | mysql   | 70.0  |  8 |        2 |
| 2020006 | mysql   | 60.0  |  9 |        2 |
| 2020002 | mysql   | 50.0  | 10 |        2 |
| 2020007 | mysql   | 50.0  | 10 |        2 |
+---------+---------+-------+----+----------+

-- 視窗函式

-- row_number,rank,dense_rank之間的區別

with test(study_name,class_name,score) as(
select 'sqlercn','mysql',95
union all
select 'tom','mysql',99
union all
select 'jerry','mysql',99
union all
select 'gavin','mysql',98
union all
select 'sqlercn','postgresql',99
union all
select 'tom','postgresql',99
union all
select 'jerry','postgresql',98
)
select study_name,class_name,score
            ,row_number() over(partition by class_name order by score desc) as rw
            ,rank() over(partition by class_name order by score desc) as rk
            ,dense_rank() over(partition by class_name order by score desc) as drk
from test
order by class_name,rw;
排名顯示的方式不同

-- 按學習人數對課程進行排名,並列出每類課程學習人數排名前3的課程名稱,學習人數以及名次

with tmp as(
select class_name,title,score
        ,rank() over(partition by class_name order by score desc) as cnt
from imc_course a
join imc_class b on a.class_id = b.class_id
)
select * from tmp where cnt<=3;


-- 每門課程的學習人數佔奔雷課程總學習人數的百分比

with tmp as(
select class_name,title,study_cnt
        ,sum(study_cnt) over(partition by class_name) as class_total
from imc_course a
join imc_class b on b.class_id = a.class_id
)
select class_name,title,concat(study_cnt/class_total*100,'%')
from tmp
order by class_name;

-- 學習人數等於1000人的課程有哪些,列出他們的課程標題和學習人數

select title,study_cnt
from imc_course
where study_cnt = 1000;

-- 學習人數大於1000人的課程有哪些,列出他們的課程標題和學習人數

select title,study_cnt
from imc_course
where study_cnt > 1000;

開發sql容易發生的問題

-- 查詢出分類ID為5的課程名稱和分類名稱

錯誤一:
在on中使用and進行過濾
select a.title,b.class_name
from imc_course a
join imc_class b on a.class_id = b.class_id and a.class_id=5;

把內連線變更為左外連線,起不到過濾出我們需要的資料的效果
select a.title,b.class_name
from imc_course a 
left join imc_class b on a.class_id = b.class_id and a.class_id=5

使用where條件語句就沒有這樣的問題
select a.title,b.class_name
from imc_course a 
left join imc_class b on a.class_id = b.class_id
where b.class_id=5;

select *
from imc_course
where title in (select title from imc_class);

如何避免

這樣就實現了各種排序需求。

NTILE

NTILE函式的作用是對每個分組排名後,再將對應分組分成N個小組,例如