MySQL: Problems Encountered When Using GROUP BY Instead of DISTINCT, and How I Solved Them
阿新 • Published: 2019-01-08
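Recently, while doing some data analysis, I found DISTINCT rather slow and tried to replace it with GROUP BY. The replacement did not go smoothly. With duplicated data, GROUP BY kept the first record it encountered for each group, so it took some fiddling to get the statistics I actually wanted. Below I walk through the problem and my adjustments with an example.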
Scenario: a table records routine information about mobile-phone users, with one record per user per day.
CREATE TABLE `userinfo_test` (
`day` date NOT NULL DEFAULT '2016-06-01',
`username` varchar(64) NOT NULL,
`phone` varchar(16) NOT NULL DEFAULT '',
`cv` varchar(16) NOT NULL DEFAULT '',
PRIMARY KEY (`day`,`username`),
KEY `ix_day_username_phone_cv` (`day`,`username`,`phone`,`cv`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Goal: for each month, count the active users for every phone model and version. If a user upgraded, count only the upgraded version.
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000001','A', '1001');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000002','A', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000003','A', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000004','B', '1001');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000005','B', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000006','B', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000007','C', '1001');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000008','C', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-01','10000009','C', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000001','A', '1001');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000002','A', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000003','A', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000004','B', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000005','B', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000006','B', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000007','C', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000008','C', '1002');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000009','C', '1003');
INSERT INTO `userinfo_test` VALUES('2016-05-02','10000019','C', '1003');
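The table records each user's routine information daily. If the information changes during a day, only the last state is kept, so each user has exactly one record per day.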
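The post does not say how the "one row per user per day, last state wins" rule is maintained. The following is a minimal sketch of one way to do it. It assumes a hypothetical loader that upserts on the (`day`, `username`) primary key. It runs against an in-memory SQLite database (3.24+ is needed for `ON CONFLICT ... DO UPDATE`) rather than MySQL, and `record_events` is an illustrative name. In MySQL, the corresponding clause would be `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

# Hypothetical loader: upsert on the (day, username) primary key so that
# each user keeps only the last state reported on a given day.
UPSERT = """
INSERT INTO userinfo_test (day, username, phone, cv) VALUES (?, ?, ?, ?)
ON CONFLICT(day, username) DO UPDATE SET phone = excluded.phone, cv = excluded.cv
"""


def record_events(events):
    """Apply (day, username, phone, cv) events in order and return the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL DEFAULT '', cv TEXT NOT NULL DEFAULT '',"
        " PRIMARY KEY (day, username))"
    )
    for event in events:
        conn.execute(UPSERT, event)
    rows = conn.execute(
        "SELECT day, username, phone, cv FROM userinfo_test ORDER BY day, username"
    ).fetchall()
    conn.close()
    return rows


# Illustrative events: user 10000002 upgrades from 1002 to 1003 within one day.
events = [
    ("2016-05-02", "10000002", "A", "1002"),
    ("2016-05-02", "10000002", "A", "1003"),
]
print(record_events(events))  # [('2016-05-02', '10000002', 'A', '1003')]
```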
mysql> SELECT * FROM `userinfo_test`;
+------------+----------+-------+------+
| day | username | phone | cv |
+------------+----------+-------+------+
| 2016-05-01 | 10000001 | A | 1001 |
| 2016-05-01 | 10000002 | A | 1002 |
| 2016-05-01 | 10000003 | A | 1003 |
| 2016-05-01 | 10000004 | B | 1001 |
| 2016-05-01 | 10000005 | B | 1002 |
| 2016-05-01 | 10000006 | B | 1003 |
| 2016-05-01 | 10000007 | C | 1001 |
| 2016-05-01 | 10000008 | C | 1002 |
| 2016-05-01 | 10000009 | C | 1003 |
| 2016-05-02 | 10000001 | A | 1001 |
| 2016-05-02 | 10000002 | A | 1003 |
| 2016-05-02 | 10000003 | A | 1003 |
| 2016-05-02 | 10000004 | B | 1002 |
| 2016-05-02 | 10000005 | B | 1002 |
| 2016-05-02 | 10000006 | B | 1003 |
| 2016-05-02 | 10000007 | C | 1003 |
| 2016-05-02 | 10000008 | C | 1002 |
| 2016-05-02 | 10000009 | C | 1003 |
| 2016-05-02 | 10000019 | C | 1003 |
+------------+----------+-------+------+
19 rows in set (0.00 sec)
Count the users for each phone model and version on a given day:
mysql> SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v, COUNT(cv)daynum FROM `userinfo_test` WHERE DAY = '2016-05-01' GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+--------+
| p | v | daynum |
+-------------------+-------------------+--------+
| A | 1001 | 1 |
| A | 1002 | 1 |
| A | 1003 | 1 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1001 | 1 |
| B | 1002 | 1 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1001 | 1 |
| C | 1002 | 1 |
| C | 1003 | 1 |
| C | WITH_ROLLUP_TOTAL | 3 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 9 |
+-------------------+-------------------+--------+
13 rows in set (0.00 sec)
mysql> SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v, COUNT(cv)daynum FROM `userinfo_test` WHERE DAY = '2016-05-02' GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+--------+
| p | v | daynum |
+-------------------+-------------------+--------+
| A | 1001 | 1 |
| A | 1003 | 2 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1002 | 2 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1002 | 1 |
| C | 1003 | 3 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+--------+
10 rows in set (0.00 sec)
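`WITH ROLLUP` appends a subtotal row for each phone and a final grand-total row. The queries above label these rows through `IFNULL(..., 'WITH_ROLLUP_TOTAL')`. The sketch below shows what those extra rows contain by emulating the rollup with `UNION ALL`, because SQLite has no `WITH ROLLUP`.

Some details are assumptions of the sketch:
- It loads the post's 19 sample rows into an in-memory SQLite table.
- `build_db`, `ROLLUP_SQL` and `daily_rollup` are illustrative names.
- The `ORDER BY` places the total rows last only because `'W'` sorts after digits and after `'A'` to `'C'`.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# Detail rows, then one subtotal row per phone, then the grand total.
ROLLUP_SQL = """
SELECT phone, cv, COUNT(*) FROM userinfo_test WHERE day = ? GROUP BY phone, cv
UNION ALL
SELECT phone, 'WITH_ROLLUP_TOTAL', COUNT(*) FROM userinfo_test WHERE day = ? GROUP BY phone
UNION ALL
SELECT 'WITH_ROLLUP_TOTAL', 'WITH_ROLLUP_TOTAL', COUNT(*) FROM userinfo_test WHERE day = ?
ORDER BY 1, 2
"""


def daily_rollup(conn, day):
    """Return detail, per-phone subtotal, and grand-total rows for one day."""
    return conn.execute(ROLLUP_SQL, (day, day, day)).fetchall()


for row in daily_rollup(build_db(), "2016-05-02"):
    print(row)
```

For 2016-05-02 this prints the same ten rows as the MySQL output above.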
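Count the users for each phone model and version over the month. If a user has several model/version records in the month, only the last recorded one counts: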
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v,COUNT(cv) monthnum FROM (SELECT phone,cv FROM (SELECT * FROM (SELECT cv,phone,username FROM `userinfo_test` GROUP BY cv,phone,username) c ORDER BY cv DESC) d GROUP BY username) e GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+----------+
| p | v | monthnum |
+-------------------+-------------------+----------+
| A | 1001 | 1 |
| A | 1003 | 2 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1002 | 2 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1002 | 1 |
| C | 1003 | 3 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+----------+
10 rows in set (0.00 sec)
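Note that the SQL above sorts cv in descending order and then relies on GROUP BY keeping the first row it encounters for each username. The descending sort reflects the assumption that users normally upgrade, so a larger version stands for the user's latest record.

MySQL does not guarantee this first-row behavior:
- For a column that is neither grouped nor aggregated, the server may return any value from the group. MySQL 5.7.5 and later reject such queries by default because ONLY_FULL_GROUP_BY is enabled.
- The optimizer may ignore an ORDER BY inside a derived table.

Because cv is a varchar, the descending sort is also a string comparison. It matches numeric order only while all versions have the same number of digits.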
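A deterministic way to pick each user's last record is to join each user's latest `day` in the month back to the table, instead of relying on which row GROUP BY happens to keep. Because (`day`, `username`) is the primary key, this yields exactly one row per user. Unlike the original query, it also restricts the rows to May 2016.

This is a sketch under stated assumptions:
- "Last recorded" is taken to mean the latest day rather than the largest cv. On this data the two coincide.
- It runs on SQLite with illustrative names.
- The SQL uses only a plain join and aggregates, so it should also run on MySQL 5.x.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# For each user, find the latest day in the month, then join back to get that day's row.
MONTHLY_LATEST_SQL = """
SELECT t.phone, t.cv, COUNT(*) AS monthnum
FROM userinfo_test AS t
JOIN (SELECT username, MAX(day) AS last_day
      FROM userinfo_test
      WHERE day >= '2016-05-01' AND day < '2016-06-01'
      GROUP BY username) AS m
  ON t.username = m.username AND t.day = m.last_day
GROUP BY t.phone, t.cv
ORDER BY t.phone, t.cv
"""

result = build_db().execute(MONTHLY_LATEST_SQL).fetchall()
print(result)
```

The counts match the non-total rows of the MySQL result above. `WITH ROLLUP` can be added back in MySQL if the subtotals are needed.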
Without the descending sort on cv:
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v,COUNT(cv) monthnum FROM (SELECT phone,cv,username FROM `userinfo_test` GROUP BY username) c GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+----------+
| p | v | monthnum |
+-------------------+-------------------+----------+
| A | 1001 | 1 |
| A | 1002 | 1 |
| A | 1003 | 1 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1001 | 1 |
| B | 1002 | 1 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1001 | 1 |
| C | 1002 | 1 |
| C | 1003 | 2 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+----------+
13 rows in set (0.00 sec)
As you can see, for each user the first record encountered is the one that gets counted.
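The difference between the two monthly results comes down to the users whose first and last records differ. The sketch below lists them explicitly. It runs on SQLite with illustrative names and treats "first" and "last" as the earliest and latest `day` per user.

On the sample data it should report three users: 10000002, 10000004 and 10000007. Their first-day records are the ones counted by the unsorted query.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# Pair each user's earliest row (f) with the latest row (l); keep pairs that differ.
CHANGED_SQL = """
SELECT f.username, f.phone, f.cv, l.phone, l.cv
FROM userinfo_test AS f
JOIN userinfo_test AS l ON l.username = f.username
WHERE f.day = (SELECT MIN(day) FROM userinfo_test WHERE username = f.username)
  AND l.day = (SELECT MAX(day) FROM userinfo_test WHERE username = f.username)
  AND (f.phone <> l.phone OR f.cv <> l.cv)
ORDER BY f.username
"""

changed = build_db().execute(CHANGED_SQL).fetchall()
for username, first_phone, first_cv, last_phone, last_cv in changed:
    print(f"{username}: {first_phone} {first_cv} -> {last_phone} {last_cv}")
```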
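With DISTINCT, the SQL below does not produce the data we want. Users who upgraded are counted under both their old and their new version, so the per-version counts add up to more than the subtotal. For phone A, for example, 1 + 1 + 2 = 4, against 3 actual users.
In my data, where duplicates are common, GROUP BY was also much faster than DISTINCT.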
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v, COUNT(DISTINCT username)daynum FROM `userinfo_test` GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+--------+
| p | v | daynum |
+-------------------+-------------------+--------+
| A | 1001 | 1 |
| A | 1002 | 1 |
| A | 1003 | 2 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1001 | 1 |
| B | 1002 | 2 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1001 | 1 |
| C | 1002 | 1 |
| C | 1003 | 3 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+--------+
13 rows in set (0.01 sec)
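The over-counting can be checked directly. For each phone, the sketch below sums the per-version `COUNT(DISTINCT username)` values and compares the sum with the number of distinct users on that phone. Each upgrader appears under two versions, so the sum is larger. It runs on SQLite with illustrative names.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


conn = build_db()
per_version = conn.execute(
    "SELECT phone, cv, COUNT(DISTINCT username) FROM userinfo_test GROUP BY phone, cv"
).fetchall()
per_phone = dict(conn.execute(
    "SELECT phone, COUNT(DISTINCT username) FROM userinfo_test GROUP BY phone"
).fetchall())

# Add up the per-version distinct counts for each phone.
summed = {}
for phone, _cv, users in per_version:
    summed[phone] = summed.get(phone, 0) + users

print(sorted(summed.items()))     # [('A', 4), ('B', 4), ('C', 5)]
print(sorted(per_phone.items()))  # [('A', 3), ('B', 3), ('C', 4)]
```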
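The following SQL does produce the desired result. It relies on the same first-row behavior of GROUP BY as the earlier query: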
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v, COUNT(cv)daynum FROM(SELECT * FROM (SELECT DISTINCT username,phone,cv FROM `userinfo_test` ORDER BY cv DESC) c GROUP BY username)d GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+--------+
| p | v | daynum |
+-------------------+-------------------+--------+
| A | 1001 | 1 |
| A | 1003 | 2 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1002 | 2 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1002 | 1 |
| C | 1003 | 3 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+--------+
10 rows in set (0.00 sec)
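Both sorted queries pick the "largest" cv by string order, because cv is a varchar. The sketch below makes the author's "largest version wins" rule explicit and deterministic. It compares versions numerically with `CAST(cv AS SIGNED INTEGER)`, a type name accepted by both MySQL and SQLite, and then joins each user's maximum back to the distinct (`username`, `phone`, `cv`) rows.

The sketch rests on a few assumptions:
- cv always holds a plain integer.
- A user who reported the same top version on two different phones would be counted twice. That does not happen in the sample data.
- It runs on SQLite with illustrative names.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# "Largest version wins", compared as numbers so that '1000' beats '999'.
MAX_CV_SQL = """
SELECT d.phone, d.cv, COUNT(*) AS monthnum
FROM (SELECT DISTINCT username, phone, cv FROM userinfo_test) AS d
JOIN (SELECT username, MAX(CAST(cv AS SIGNED INTEGER)) AS max_cv
      FROM userinfo_test
      GROUP BY username) AS m
  ON d.username = m.username AND CAST(d.cv AS SIGNED INTEGER) = m.max_cv
GROUP BY d.phone, d.cv
ORDER BY d.phone, d.cv
"""

result = build_db().execute(MAX_CV_SQL).fetchall()
print(result)
```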
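In MySQL 5.x, GROUP BY also sorts its result, and each GROUP BY column accepts ASC or DESC. The modifier applies only to the column it follows. In the examples below it is attached to the last column, so only that column is sorted in descending order. ASC/DESC on GROUP BY columns was deprecated in MySQL 5.7 and removed in 8.0, which also dropped the implicit sort.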
SELECT phone,cv,COUNT(cv)num FROM `userinfo_test` GROUP BY phone,cv WITH ROLLUP;
+-------+------+-----+
| phone | cv | num |
+-------+------+-----+
| A | 1001 | 2 |
| A | 1002 | 1 |
| A | 1003 | 3 |
| A | NULL | 6 |
| B | 1001 | 1 |
| B | 1002 | 3 |
| B | 1003 | 2 |
| B | NULL | 6 |
| C | 1001 | 1 |
| C | 1002 | 2 |
| C | 1003 | 4 |
| C | NULL | 7 |
| NULL | NULL | 19 |
+-------+------+-----+
13 rows in set (0.00 sec)
SELECT phone,cv,COUNT(cv)num FROM `userinfo_test` GROUP BY phone,cv DESC WITH ROLLUP;
+-------+------+-----+
| phone | cv | num |
+-------+------+-----+
| A | 1003 | 3 |
| A | 1002 | 1 |
| A | 1001 | 2 |
| A | NULL | 6 |
| B | 1003 | 2 |
| B | 1002 | 3 |
| B | 1001 | 1 |
| B | NULL | 6 |
| C | 1003 | 4 |
| C | 1002 | 2 |
| C | 1001 | 1 |
| C | NULL | 7 |
| NULL | NULL | 19 |
+-------+------+-----+
13 rows in set (0.01 sec)
SELECT cv,phone,COUNT(phone)num FROM `userinfo_test` GROUP BY cv,phone DESC WITH ROLLUP;
+------+-------+-----+
| cv | phone | num |
+------+-------+-----+
| 1001 | C | 1 |
| 1001 | B | 1 |
| 1001 | A | 2 |
| 1001 | NULL | 4 |
| 1002 | C | 2 |
| 1002 | B | 3 |
| 1002 | A | 1 |
| 1002 | NULL | 6 |
| 1003 | C | 4 |
| 1003 | B | 2 |
| 1003 | A | 3 |
| 1003 | NULL | 9 |
| NULL | NULL | 19 |
+------+-------+-----+
13 rows in set (0.00 sec)
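MySQL 8.0 rejects ASC/DESC in GROUP BY and no longer sorts grouped output implicitly, so the same ordering has to be requested with an explicit ORDER BY. The sketch below writes the last example that way, without the rollup rows. It runs on SQLite with illustrative names.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# Group first, then sort explicitly: cv ascending, phone descending within each cv.
ORDERED_SQL = """
SELECT cv, phone, COUNT(phone) AS num
FROM userinfo_test
GROUP BY cv, phone
ORDER BY cv, phone DESC
"""

result = build_db().execute(ORDERED_SQL).fetchall()
for row in result:
    print(row)
```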
Making use of GROUP BY's own sorting, the earlier monthly active-user query
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v,COUNT(cv) monthnum FROM (SELECT phone,cv FROM (SELECT * FROM (SELECT cv,phone,username FROM `userinfo_test` GROUP BY cv,phone,username) c ORDER BY cv DESC) d GROUP BY username) e GROUP BY phone,cv WITH ROLLUP;
can be rewritten as:
SELECT IFNULL(phone,'WITH_ROLLUP_TOTAL') p, IFNULL(cv,'WITH_ROLLUP_TOTAL') v,COUNT(cv) monthnum FROM (SELECT phone,cv FROM (SELECT cv,phone,username FROM `userinfo_test` GROUP BY username,phone,cv DESC) c GROUP BY username) d GROUP BY phone,cv WITH ROLLUP;
+-------------------+-------------------+----------+
| p | v | monthnum |
+-------------------+-------------------+----------+
| A | 1001 | 1 |
| A | 1003 | 2 |
| A | WITH_ROLLUP_TOTAL | 3 |
| B | 1002 | 2 |
| B | 1003 | 1 |
| B | WITH_ROLLUP_TOTAL | 3 |
| C | 1002 | 1 |
| C | 1003 | 3 |
| C | WITH_ROLLUP_TOTAL | 4 |
| WITH_ROLLUP_TOTAL | WITH_ROLLUP_TOTAL | 10 |
+-------------------+-------------------+----------+
10 rows in set (0.01 sec)
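Postscript:
1. Because much of the data repeats from one day to the next, GROUP BY had a relative advantage over DISTINCT here.
2. The SQL still feels rather complicated. Suggestions for optimizing it, or better approaches, are welcome.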
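In reply to the second point: on MySQL 8.0 and later, window functions give a shorter, deterministic form of the monthly query.

The sketch below makes these assumptions:
- "Last record" means the row with the latest day in the month. Ordering by a numeric cast of cv instead would follow the "highest version" rule.
- It runs on SQLite 3.25+, which also supports `ROW_NUMBER()`.
- The names are illustrative.

```python
import sqlite3

# The 19 sample rows of userinfo_test: slot i (1-9) is user 1000000i, encoded as phone + cv.
DAY1 = ["A1001", "A1002", "A1003", "B1001", "B1002", "B1003", "C1001", "C1002", "C1003"]
DAY2 = ["A1001", "A1003", "A1003", "B1002", "B1002", "B1003", "C1003", "C1002", "C1003"]


def build_db():
    """Load the sample data into an in-memory SQLite table shaped like userinfo_test."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userinfo_test (day TEXT NOT NULL, username TEXT NOT NULL,"
        " phone TEXT NOT NULL, cv TEXT NOT NULL, PRIMARY KEY (day, username))"
    )
    rows = [("2016-05-01", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY1, 1)]
    rows += [("2016-05-02", f"1000000{i}", s[0], s[1:]) for i, s in enumerate(DAY2, 1)]
    rows.append(("2016-05-02", "10000019", "C", "1003"))  # extra user seen only on day 2
    conn.executemany("INSERT INTO userinfo_test VALUES (?, ?, ?, ?)", rows)
    return conn


# Number each user's rows from newest to oldest and keep only the newest (rn = 1).
LATEST_WINDOW_SQL = """
SELECT phone, cv, COUNT(*) AS monthnum
FROM (SELECT phone, cv,
             ROW_NUMBER() OVER (PARTITION BY username ORDER BY day DESC) AS rn
      FROM userinfo_test
      WHERE day >= '2016-05-01' AND day < '2016-06-01') AS ranked
WHERE rn = 1
GROUP BY phone, cv
ORDER BY phone, cv
"""

result = build_db().execute(LATEST_WINDOW_SQL).fetchall()
print(result)
```

Each user contributes exactly one row, so the result no longer depends on which row GROUP BY happens to keep or on ORDER BY inside a derived table.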