MySQL 8 Study Notes ⑨: Window Functions
阿新 • Published: 2020-08-18
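Preface
Before MySQL 8.0, ranking statistics were painful to compute, because MySQL lacked the window functions that other databases such as Oracle, SQL Server, and PostgreSQL provide. Now that MySQL 8.0 has added window functions, these statistics are no longer a problem. This article introduces MySQL's window functions through common ranking examples.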
1. Preparation
Create the table and the test data
mysql> create database testdb;
mysql> use testdb;
Database changed

/* Create the table */
create table tb_score(
  id int primary key auto_increment,
  stu_no varchar(10),
  course varchar(50),
  score decimal(4,1),
  key idx_stuNo_course(stu_no,course)
);

mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| tb_score         |
+------------------+

/* Insert a batch of test data */
insert into tb_score(stu_no,course,score) values('2020001','mysql',90),('2020001','C++',85),('2020003','English',100),('2020002','mysql',50),('2020002','C++',70),('2020002','English',99);
insert into tb_score(stu_no,course,score) values('2020003','mysql',78),('2020003','C++',81),('2020003','English',80),('2020004','mysql',80),('2020004','C++',60),('2020004','English',100);
insert into tb_score(stu_no,course,score) values('2020005','mysql',98),('2020005','C++',96),('2020005','English',70),('2020006','mysql',60),('2020006','C++',90),('2020006','English',70);
insert into tb_score(stu_no,course,score) values('2020007','mysql',50),('2020007','C++',66),('2020007','English',76),('2020008','mysql',90),('2020008','C++',69),('2020008','English',86);
insert into tb_score(stu_no,course,score) values('2020009','mysql',70),('2020009','C++',66),('2020009','English',86),('2020010','mysql',75),('2020010','C++',76),('2020010','English',81);
insert into tb_score(stu_no,course,score) values('2020011','mysql',90),('2020012','C++',85),('2020011','English',84),('2020012','English',75),('2020013','C++',96),('2020013','English',88);
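2. Rank the scores within each course
Rank each course's scores from highest to lowest. This raises the question of how to handle equal scores; the window functions below cover the different requirements.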
ROW_NUMBER
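In this result, rows with equal scores happen to be numbered in student-number order. MySQL does not guarantee any order among tied rows, however; to make the numbering deterministic, add stu_no as a secondary sort key (order by score desc, stu_no).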
mysql> select stu_no,course,score,row_number() over(partition by course order by score desc) as rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  2 |
| 2020006 | C++     |  90.0 |  3 |
| 2020001 | C++     |  85.0 |  4 |
| 2020012 | C++     |  85.0 |  5 |
| 2020003 | C++     |  81.0 |  6 |
| 2020010 | C++     |  76.0 |  7 |
| 2020002 | C++     |  70.0 |  8 |
| 2020008 | C++     |  69.0 |  9 |
| 2020007 | C++     |  66.0 | 10 |
| 2020009 | C++     |  66.0 | 11 |
| 2020004 | C++     |  60.0 | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  2 |
| 2020002 | English |  99.0 |  3 |
| 2020013 | English |  88.0 |  4 |
| 2020008 | English |  86.0 |  5 |
| 2020009 | English |  86.0 |  6 |
| 2020011 | English |  84.0 |  7 |
| 2020010 | English |  81.0 |  8 |
| 2020003 | English |  80.0 |  9 |
| 2020007 | English |  76.0 | 10 |
| 2020012 | English |  75.0 | 11 |
| 2020005 | English |  70.0 | 12 |
| 2020006 | English |  70.0 | 13 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  3 |
| 2020011 | mysql   |  90.0 |  4 |
| 2020004 | mysql   |  80.0 |  5 |
| 2020003 | mysql   |  78.0 |  6 |
| 2020010 | mysql   |  75.0 |  7 |
| 2020009 | mysql   |  70.0 |  8 |
| 2020006 | mysql   |  60.0 |  9 |
| 2020002 | mysql   |  50.0 | 10 |
| 2020007 | mysql   |  50.0 | 11 |
+---------+---------+-------+----+
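The effect of a tie-breaking sort key can be tried without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, which accepts the same ROW_NUMBER() syntax), loads only the C++ rows of the test data, and the helper name row_numbers is hypothetical. With stu_no added to the ORDER BY, tied scores are always numbered in student-number order.

```python
import sqlite3

# C++ rows copied from the tb_score test data above: (stu_no, course, score).
CPP_ROWS = [
    ("2020001", "C++", 85), ("2020002", "C++", 70), ("2020003", "C++", 81),
    ("2020004", "C++", 60), ("2020005", "C++", 96), ("2020006", "C++", 90),
    ("2020007", "C++", 66), ("2020008", "C++", 69), ("2020009", "C++", 66),
    ("2020010", "C++", 76), ("2020012", "C++", 85), ("2020013", "C++", 96),
]


def row_numbers(rows):
    """Hypothetical helper: return (stu_no, score, rn) using an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tb_score(stu_no text, course text, score real)")
    conn.executemany("insert into tb_score values (?, ?, ?)", rows)
    # stu_no is a secondary sort key, so tied scores get a deterministic order.
    sql = """
        select stu_no, score,
               row_number() over (partition by course
                                  order by score desc, stu_no) as rn
        from tb_score
        order by course, rn
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


for stu_no, score, rn in row_numbers(CPP_ROWS):
    print(stu_no, score, rn)
```

Without the stu_no key the query is still valid, but the relative order of 2020005 and 2020013 (both 96.0) would be up to the engine.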
DENSE_RANK
To give equal scores the same rank, use the DENSE_RANK function. The result is:
mysql> select stu_no,course,score,dense_rank() over(partition by course order by score desc) rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  1 |
| 2020006 | C++     |  90.0 |  2 |
| 2020001 | C++     |  85.0 |  3 |
| 2020012 | C++     |  85.0 |  3 |
| 2020003 | C++     |  81.0 |  4 |
| 2020010 | C++     |  76.0 |  5 |
| 2020002 | C++     |  70.0 |  6 |
| 2020008 | C++     |  69.0 |  7 |
| 2020007 | C++     |  66.0 |  8 |
| 2020009 | C++     |  66.0 |  8 |
| 2020004 | C++     |  60.0 |  9 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English |  99.0 |  2 |
| 2020013 | English |  88.0 |  3 |
| 2020008 | English |  86.0 |  4 |
| 2020009 | English |  86.0 |  4 |
| 2020011 | English |  84.0 |  5 |
| 2020010 | English |  81.0 |  6 |
| 2020003 | English |  80.0 |  7 |
| 2020007 | English |  76.0 |  8 |
| 2020012 | English |  75.0 |  9 |
| 2020005 | English |  70.0 | 10 |
| 2020006 | English |  70.0 | 10 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  2 |
| 2020011 | mysql   |  90.0 |  2 |
| 2020004 | mysql   |  80.0 |  3 |
| 2020003 | mysql   |  78.0 |  4 |
| 2020010 | mysql   |  75.0 |  5 |
| 2020009 | mysql   |  70.0 |  6 |
| 2020006 | mysql   |  60.0 |  7 |
| 2020002 | mysql   |  50.0 |  8 |
| 2020007 | mysql   |  50.0 |  8 |
+---------+---------+-------+----+
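The numbering rule behind DENSE_RANK can be sketched in a few lines of plain Python. This is a hypothetical helper illustrating the rule, not MySQL's implementation: walk the scores from highest to lowest and increase the rank only when the score changes, so ties share a rank and no numbers are skipped.

```python
def dense_rank(scores):
    """Return DENSE_RANK values for scores sorted from highest to lowest."""
    ranks = []
    current = 0
    previous = None
    for score in sorted(scores, reverse=True):
        if score != previous:  # a new distinct score starts the next rank
            current += 1
            previous = score
        ranks.append(current)
    return ranks


# C++ scores from the test data (input order does not matter; the helper sorts).
cpp_scores = [85, 70, 81, 60, 96, 90, 66, 69, 66, 76, 85, 96]
print(dense_rank(cpp_scores))  # [1, 1, 2, 3, 3, 4, 5, 6, 7, 8, 8, 9]
```

Applied to the C++ scores, it reproduces the rn column of the C++ rows above.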
RANK
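With DENSE_RANK, tied scores share a rank, but the next rank follows directly after the previous one. If, after two students tie for 1st place, the next student should be 3rd, use the RANK function instead: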
mysql> select stu_no,course,score,rank() over(partition by course order by score desc) rn from tb_score;
+---------+---------+-------+----+
| stu_no  | course  | score | rn |
+---------+---------+-------+----+
| 2020005 | C++     |  96.0 |  1 |
| 2020013 | C++     |  96.0 |  1 |
| 2020006 | C++     |  90.0 |  3 |
| 2020001 | C++     |  85.0 |  4 |
| 2020012 | C++     |  85.0 |  4 |
| 2020003 | C++     |  81.0 |  6 |
| 2020010 | C++     |  76.0 |  7 |
| 2020002 | C++     |  70.0 |  8 |
| 2020008 | C++     |  69.0 |  9 |
| 2020007 | C++     |  66.0 | 10 |
| 2020009 | C++     |  66.0 | 10 |
| 2020004 | C++     |  60.0 | 12 |
| 2020003 | English | 100.0 |  1 |
| 2020004 | English | 100.0 |  1 |
| 2020002 | English |  99.0 |  3 |
| 2020013 | English |  88.0 |  4 |
| 2020008 | English |  86.0 |  5 |
| 2020009 | English |  86.0 |  5 |
| 2020011 | English |  84.0 |  7 |
| 2020010 | English |  81.0 |  8 |
| 2020003 | English |  80.0 |  9 |
| 2020007 | English |  76.0 | 10 |
| 2020012 | English |  75.0 | 11 |
| 2020005 | English |  70.0 | 12 |
| 2020006 | English |  70.0 | 12 |
| 2020005 | mysql   |  98.0 |  1 |
| 2020001 | mysql   |  90.0 |  2 |
| 2020008 | mysql   |  90.0 |  2 |
| 2020011 | mysql   |  90.0 |  2 |
| 2020004 | mysql   |  80.0 |  5 |
| 2020003 | mysql   |  78.0 |  6 |
| 2020010 | mysql   |  75.0 |  7 |
| 2020009 | mysql   |  70.0 |  8 |
| 2020006 | mysql   |  60.0 |  9 |
| 2020002 | mysql   |  50.0 | 10 |
| 2020007 | mysql   |  50.0 | 10 |
+---------+---------+-------+----+
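RANK differs from DENSE_RANK only in what happens after a tie: the rank jumps to the row's 1-based position in the sorted partition. A minimal plain-Python sketch of that rule (the helper rank is hypothetical, not MySQL's code):

```python
def rank(scores):
    """Return RANK values for scores sorted from highest to lowest."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for position, score in enumerate(ordered, start=1):
        if position > 1 and score == ordered[position - 2]:
            ranks.append(ranks[-1])  # tie: reuse the previous row's rank
        else:
            ranks.append(position)   # otherwise the rank is the row's position
    return ranks


cpp_scores = [85, 70, 81, 60, 96, 90, 66, 69, 66, 76, 85, 96]
print(rank(cpp_scores))  # [1, 1, 3, 4, 4, 6, 7, 8, 9, 10, 10, 12]
```

The gap after each tie (1, 1, 3 and 4, 4, 6) is exactly what distinguishes the rn column above from the DENSE_RANK result.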
Together, ROW_NUMBER, DENSE_RANK, and RANK cover the common ranking requirements.
NTILE
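After ordering the rows within each partition, NTILE(N) divides the partition into N groups of as equal a size as possible; when the rows do not divide evenly, the lower-numbered groups get one extra row (the 13 English rows below split 7 + 6). For example: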
mysql> select stu_no,course,score,rank() over(partition by course order by score desc) rn,ntile(2) over(partition by course order by score desc) rn_group from tb_score;
+---------+---------+-------+----+----------+
| stu_no  | course  | score | rn | rn_group |
+---------+---------+-------+----+----------+
| 2020005 | C++     |  96.0 |  1 |        1 |
| 2020013 | C++     |  96.0 |  1 |        1 |
| 2020006 | C++     |  90.0 |  3 |        1 |
| 2020001 | C++     |  85.0 |  4 |        1 |
| 2020012 | C++     |  85.0 |  4 |        1 |
| 2020003 | C++     |  81.0 |  6 |        1 |
| 2020010 | C++     |  76.0 |  7 |        2 |
| 2020002 | C++     |  70.0 |  8 |        2 |
| 2020008 | C++     |  69.0 |  9 |        2 |
| 2020007 | C++     |  66.0 | 10 |        2 |
| 2020009 | C++     |  66.0 | 10 |        2 |
| 2020004 | C++     |  60.0 | 12 |        2 |
| 2020003 | English | 100.0 |  1 |        1 |
| 2020004 | English | 100.0 |  1 |        1 |
| 2020002 | English |  99.0 |  3 |        1 |
| 2020013 | English |  88.0 |  4 |        1 |
| 2020008 | English |  86.0 |  5 |        1 |
| 2020009 | English |  86.0 |  5 |        1 |
| 2020011 | English |  84.0 |  7 |        1 |
| 2020010 | English |  81.0 |  8 |        2 |
| 2020003 | English |  80.0 |  9 |        2 |
| 2020007 | English |  76.0 | 10 |        2 |
| 2020012 | English |  75.0 | 11 |        2 |
| 2020005 | English |  70.0 | 12 |        2 |
| 2020006 | English |  70.0 | 12 |        2 |
| 2020005 | mysql   |  98.0 |  1 |        1 |
| 2020001 | mysql   |  90.0 |  2 |        1 |
| 2020008 | mysql   |  90.0 |  2 |        1 |
| 2020011 | mysql   |  90.0 |  2 |        1 |
| 2020004 | mysql   |  80.0 |  5 |        1 |
| 2020003 | mysql   |  78.0 |  6 |        1 |
| 2020010 | mysql   |  75.0 |  7 |        2 |
| 2020009 | mysql   |  70.0 |  8 |        2 |
| 2020006 | mysql   |  60.0 |  9 |        2 |
| 2020002 | mysql   |  50.0 | 10 |        2 |
| 2020007 | mysql   |  50.0 | 10 |        2 |
+---------+---------+-------+----+----------+
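NTILE assigns groups by row position only, not by score value, so tied scores could in principle land in different groups. The hypothetical helper below sketches just the size calculation: N buckets whose sizes differ by at most one, with the extra rows going to the lower-numbered buckets. It reproduces the 6/6, 7/6, and 6/5 splits in the rn_group column for C++, English, and mysql.

```python
def ntile(n_buckets, n_rows):
    """Return the bucket number NTILE(n_buckets) gives each of n_rows ordered rows."""
    if n_buckets < 1:
        raise ValueError("NTILE needs a positive bucket count")
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets each take one row more than the rest.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


for course, n in [("C++", 12), ("English", 13), ("mysql", 11)]:
    groups = ntile(2, n)
    print(course, groups.count(1), groups.count(2))
# prints: C++ 6 6, then English 7 6, then mysql 6 5
```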
-- Window functions
-- Differences between row_number, rank, and dense_rank
with test(study_name,class_name,score) as(
  select 'sqlercn','mysql',95 union all
  select 'tom','mysql',99 union all
  select 'jerry','mysql',99 union all
  select 'gavin','mysql',98 union all
  select 'sqlercn','postgresql',99 union all
  select 'tom','postgresql',99 union all
  select 'jerry','postgresql',98
)
select study_name,class_name,score
  ,row_number() over(partition by class_name order by score desc) as rw
  ,rank() over(partition by class_name order by score desc) as rk
  ,dense_rank() over(partition by class_name order by score desc) as drk
from test
order by class_name,rw;
The three columns differ only in how tied scores are numbered: row_number always increments, rank gives ties the same number and then skips ahead, and dense_rank gives ties the same number without skipping.
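The CTE above also runs in SQLite, so Python's sqlite3 module (assuming SQLite 3.25 or later) can be used to compare the three columns side by side. Because tom and jerry tie, which of them gets rw = 1 is not guaranteed; the checks therefore compare only the rw sequence and the rk and drk values, which do not depend on the tie order.

```python
import sqlite3

SQL = """
with test(study_name, class_name, score) as (
    select 'sqlercn', 'mysql', 95 union all
    select 'tom', 'mysql', 99 union all
    select 'jerry', 'mysql', 99 union all
    select 'gavin', 'mysql', 98 union all
    select 'sqlercn', 'postgresql', 99 union all
    select 'tom', 'postgresql', 99 union all
    select 'jerry', 'postgresql', 98
)
select study_name, class_name, score,
       row_number() over (partition by class_name order by score desc) as rw,
       rank()       over (partition by class_name order by score desc) as rk,
       dense_rank() over (partition by class_name order by score desc) as drk
from test
order by class_name, rw
"""

conn = sqlite3.connect(":memory:")
rows = conn.execute(SQL).fetchall()
conn.close()

# Each row is (study_name, class_name, score, rw, rk, drk).
for row in rows:
    print(row)
```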
-- Rank the courses in each category by number of learners, and list the title, learner count, and rank of the top 3 courses in each category
with tmp as(
  select class_name,title,study_cnt
    ,rank() over(partition by class_name order by study_cnt desc) as rk
  from imc_course a join imc_class b on a.class_id = b.class_id
)
select * from tmp where rk<=3;

-- Each course's learners as a percentage of the total learners in its category
with tmp as(
  select class_name,title,study_cnt
    ,sum(study_cnt) over(partition by class_name) as class_total
  from imc_course a join imc_class b on b.class_id = a.class_id
)
select class_name,title,concat(study_cnt/class_total*100,'%') from tmp order by class_name;
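The notes do not show the imc_course and imc_class tables, so the sketch below invents a minimal hypothetical schema and made-up rows to run both queries in SQLite from Python. Two deliberate differences from the MySQL version are assumptions worth flagging: SQLite truncates integer division, so the sketch multiplies by 100.0 first, and it returns the percentage as a number instead of building a string with concat. Because RANK lets tied counts share a rank, a category can return more than three rows when there is a tie at the cutoff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema and made-up rows: the real tables are not shown in the notes.
conn.executescript("""
create table imc_class(class_id integer primary key, class_name text);
create table imc_course(course_id integer primary key, title text,
                        class_id integer, study_cnt integer);
insert into imc_class values (1, 'Database'), (2, 'Frontend');
insert into imc_course(title, class_id, study_cnt) values
    ('MySQL basics', 1, 500), ('SQL tuning', 1, 300), ('PostgreSQL', 1, 300),
    ('Redis', 1, 100), ('HTML', 2, 800), ('CSS', 2, 200);
""")

# Top 3 courses per category by learner count; tied counts share a rank.
top3 = conn.execute("""
    with tmp as (
        select class_name, title, study_cnt,
               rank() over (partition by class_name order by study_cnt desc) as rk
        from imc_course a join imc_class b on a.class_id = b.class_id
    )
    select class_name, title, study_cnt, rk from tmp
    where rk <= 3
    order by class_name, rk, title
""").fetchall()

# Share of the category total; 100.0 forces real division, because
# SQLite (unlike MySQL) truncates integer / integer.
share = conn.execute("""
    with tmp as (
        select class_name, title, study_cnt,
               sum(study_cnt) over (partition by class_name) as class_total
        from imc_course a join imc_class b on b.class_id = a.class_id
    )
    select class_name, title, round(study_cnt * 100.0 / class_total, 2) as pct
    from tmp
    order by class_name, title
""").fetchall()
conn.close()

for row in top3:
    print(row)
for row in share:
    print(row)
```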
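-- Which courses have exactly 1000 learners? List their titles and learner counts
select title,study_cnt from imc_course where study_cnt = 1000;
-- Which courses have more than 1000 learners? List their titles and learner counts
select title,study_cnt from imc_course where study_cnt > 1000;
Common pitfalls when writing SQL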
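-- Query the course titles and category names for category ID 5
Mistake 1: filtering with AND in the ON clause. With an inner join this still returns the right rows, because ON and WHERE conditions are equivalent for inner joins:
select a.title,b.class_name from imc_course a join imc_class b on a.class_id = b.class_id and a.class_id=5;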
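If the inner join is changed to a left outer join, however, the extra condition no longer filters the result: a LEFT JOIN keeps every row of imc_course, and courses outside category 5 simply get a NULL class_name.
select a.title,b.class_name from imc_course a left join imc_class b on a.class_id = b.class_id and a.class_id=5;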
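Putting the condition in the WHERE clause avoids this problem, because WHERE is applied to the joined result and discards the unmatched rows:
select a.title,b.class_name from imc_course a left join imc_class b on a.class_id = b.class_id where b.class_id=5;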
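A small in-memory reproduction of the three queries, using Python's sqlite3 module and hypothetical sample rows (one course outside category 5), shows the difference: only the LEFT JOIN with the filter in ON keeps the extra course, with a NULL (None) class_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the category 5 courses should be returned.
conn.executescript("""
create table imc_class(class_id integer primary key, class_name text);
create table imc_course(title text, class_id integer);
insert into imc_class values (3, 'Frontend'), (5, 'Database');
insert into imc_course values ('HTML', 3), ('MySQL', 5), ('SQL tuning', 5);
""")

# Inner join, filter in ON: correct result.
inner_on = conn.execute("""
    select a.title, b.class_name from imc_course a
    join imc_class b on a.class_id = b.class_id and a.class_id = 5
    order by a.title""").fetchall()

# Left join, filter in ON: every course survives, unmatched ones get NULL.
left_on = conn.execute("""
    select a.title, b.class_name from imc_course a
    left join imc_class b on a.class_id = b.class_id and a.class_id = 5
    order by a.title""").fetchall()

# Left join, filter in WHERE: applied after the join, so it filters.
left_where = conn.execute("""
    select a.title, b.class_name from imc_course a
    left join imc_class b on a.class_id = b.class_id
    where b.class_id = 5
    order by a.title""").fetchall()
conn.close()

print(inner_on)
print(left_on)
print(left_where)
```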
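Mistake 2: unqualified columns in a subquery. If imc_class has no title column, the following query does not raise an error: the title inside the subquery is resolved against the outer imc_course row, so, as long as imc_class is not empty, the IN condition holds for every course with a non-NULL title and nothing is filtered.
select * from imc_course where title in (select title from imc_class);
How to avoid it
Qualify every column in a subquery with its table alias, for example select b.title from imc_class b; a missing column then raises an error instead of silently matching the outer query.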
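The scoping behaviour can be reproduced with Python's sqlite3 module. The schema is hypothetical and deliberately gives imc_class no title column: the unqualified query returns every course, while the alias-qualified version fails when the statement is prepared with an OperationalError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: imc_class deliberately has no title column.
conn.executescript("""
create table imc_class(class_id integer primary key, class_name text);
create table imc_course(title text, class_id integer);
insert into imc_class values (5, 'Database');
insert into imc_course values ('HTML', 3), ('MySQL', 5);
""")

# The unqualified `title` resolves to the outer imc_course row, so the
# subquery yields that same title and the IN test is true for every row.
unqualified = conn.execute(
    "select * from imc_course where title in (select title from imc_class)"
).fetchall()

# Qualifying the column with the subquery's alias turns the mistake into an error.
try:
    conn.execute(
        "select * from imc_course where title in (select b.title from imc_class b)"
    ).fetchall()
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)
conn.close()

print(unqualified)
print(qualified_error)
```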