使用 MySQL 管理層次結構的數據
概述
我們知道,關系數據庫的表更適合扁平的列表,而不是像 XML 那樣可以直管的保存具有父子關系的層次結構數據。
首先定義一下我們討論的層次結構,是這樣的一組數據,每個條目只能有一個父條目,可以有零個或多個子條目(唯一的例外是根條目,它沒有父條目)。許多依賴數據庫的應用都會遇到層次結構的數據,例如論壇或郵件列表的線索、企業的組織結構圖、內容管理系統或商城的分類目錄等等。我們如下數據作為示例:
數據來源於維基百科的這個頁面,為什麽挑了這幾個條目,以及是否準確合理在這裏就不深究了。
Mike Hillyer 考慮了兩種不同的模型——鄰接表(Adjacency List)和嵌套集(Nested Set)來實現這個層次結構。
鄰接表(Adjacency List)模型
我們可以很直觀的使用下面的方式來保存如圖所示的結構。
創建名為 distributions
的表:
CREATE TABLE distributions (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(32) NOT NULL,
parent INT NULL DEFAULT NULL,
PRIMARY KEY (id)
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
插入數據:
INSERT INTO distributions VALUES
(1, ‘Linux‘, NULL),
(2, ‘Debian‘, 1),
(3, ‘Knoppix‘, 2),
(4, ‘Ubuntu‘, 2),
(5, ‘Gentoo‘, 1),
(6, ‘Red Hat‘, 1),
(7, ‘Fedora Core‘, 6),
(8, ‘RHEL‘, 6),
(9, ‘CentOS‘, 8),
(10, ‘Oracle Linux‘, 8);
執行:
SELECT * FROM distributions;
可以看到表中的數據形如:
+----+--------------+--------+
| id | name | parent |
+----+--------------+--------+
| 1 | Linux | NULL |
| 2 | Debian | 1 |
| 3 | Knoppix | 2 |
| 4 | Ubuntu | 2 |
| 5 | Gentoo | 1 |
| 6 | Red Hat | 1 |
| 7 | Fedora Core | 6 |
| 8 | RHEL | 6 |
| 9 | CentOS | 8 |
| 10 | Oracle Linux | 8 |
+----+--------------+--------+
使用鏈接表模型,表中的每一條記錄都包含一個指向其上層記錄的指針。頂層記錄(這個例子中是 Linux
)的這個字段的值為 NULL
。鄰接表的優勢是相當直觀和簡單,我們一眼就能看出 CentOS
衍生自 RHEL
,後者又是從 Red Hat
發展而來的。雖然客戶端程序可能處理起來相當簡單,但是使用純 SQL 處理鄰接表則會遇到一些麻煩。
獲取整棵樹以及單個節點的完整路徑
第一個處理層次結構常見的任務是顯示整個層次結構,通常包含一定的縮進。使用純 SQL 處理時通常需要借助所謂的 self-join 技巧:
SELECT
t1.name AS level1,
t2.name as level2,
t3.name as level3,
t4.name as level4
FROM
distributions AS t1
LEFT JOIN distributions AS t2
ON t2.parent = t1.id
LEFT JOIN distributions AS t3
ON t3.parent = t2.id
LEFT JOIN distributions AS t4
ON t4.parent = t3.id
WHERE t1.name = ‘Linux‘;
結果如下:
+--------+---------+-------------+--------------+
| level1 | level2 | level3 | level4 |
+--------+---------+-------------+--------------+
| Linux | Red Hat | RHEL | CentOS |
| Linux | Red Hat | RHEL | Oracle Linux |
| Linux | Debian | Knoppix | NULL |
| Linux | Debian | Ubuntu | NULL |
| Linux | Red Hat | Fedora Core | NULL |
| Linux | Gentoo | NULL | NULL |
+--------+---------+-------------+--------------+
可以看到,實際上客戶端代碼拿到這個結果也不容易處理。對比原文,我們發現返回結果的順序也是不確定的。在實踐中沒有什麽參考意義。不過可以通過增加一個 WHERE
條件,獲取一個節點的完整路徑:
SELECT
t1.name AS level1,
t2.name as level2,
t3.name as level3,
t4.name as level4
FROM
distributions AS t1
LEFT JOIN distributions AS t2
ON t2.parent = t1.id
LEFT JOIN distributions AS t3
ON t3.parent = t2.id
LEFT JOIN distributions AS t4
ON t4.parent = t3.id
WHERE
t1.name = ‘Linux‘
AND t4.name = ‘CentOS‘;
結果如下:
+--------+---------+--------+--------+
| level1 | level2 | level3 | level4 |
+--------+---------+--------+--------+
| Linux | Red Hat | RHEL | CentOS |
+--------+---------+--------+--------+
找出所有的葉節點
使用 LEFT JOIN
我們可以找出所有的葉節點:
SELECT
distributions.id, distributions.name
FROM
distributions
LEFT JOIN distributions as child
ON distributions.id = child.parent
WHERE child.id IS NULL;
結果如下:
+----+--------------+
| id | name |
+----+--------------+
| 3 | Knoppix |
| 4 | Ubuntu |
| 5 | Gentoo |
| 7 | Fedora Core |
| 9 | CentOS |
| 10 | Oracle Linux |
+----+--------------+
鄰接表模型的限制
使用純 SQL 處理鄰接表模型即便在最好的情況下也是困難的。要獲得一個分類的完整路徑之前我們需要知道它的層次有多深。除此之外,當我們刪除一個節點時我們需要格外的謹慎,因為這可能潛在的在處理過程中整個子樹成為孤兒(例如刪除『便攜式小家電』則所有其子分類都成為孤兒了)。其中一些限制可以在客戶端代碼或存儲過程中定位並處理。例如在存儲過程中我們可以自下而上的遍歷這個結構以便返回整棵樹或一個路徑。我們也可以使用存儲過程來刪除節點,通過提升其一個子節點的層次並重新設置所有其它子節點的父節點為這個節點,來避免整棵子樹成為孤兒。
嵌套集(Nested Set)模型
由於使用純 SQL 處理鄰接表模型存在種種不便,因此 Mike Hillyer 鄭重的介紹了嵌套集(Nested Set)模型。當使用這種模型時,我們把層次結構的節點和路徑從腦海中抹去,把它們想象為一個個容器:
可以看到層次關系沒有改變,大的容器包含子容器。我們使用容器的左值和右值來建立數據表:
CREATE TABLE nested (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(32) NOT NULL,
`left` INT NOT NULL,
`right` INT NOT NULL,
PRIMARY KEY (id)
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
需要註意 left
和 right
是 MySQL 的保留字,因此使用標識分隔符來標記它們。
插入數據:
INSERT INTO nested VALUES
(1, ‘Linux‘, 1, 20),
(2, ‘Debian‘, 2, 7),
(3, ‘Knoppix‘, 3, 4),
(4, ‘Ubuntu‘, 5, 6),
(5, ‘Gentoo‘, 8, 9),
(6, ‘Red Hat‘, 10, 19),
(7, ‘Fedora Core‘, 11, 12),
(8, ‘RHEL‘, 13, 18),
(9, ‘CentOS‘, 14, 15),
(10, ‘Oracle Linux‘, 16, 17);
查看內容:
SELECT * FROM nested ORDER BY id;
可以看到:
+----+--------------+------+-------+
| id | name | left | right |
+----+--------------+------+-------+
| 1 | Linux | 1 | 20 |
| 2 | Debian | 2 | 7 |
| 3 | Knoppix | 3 | 4 |
| 4 | Ubuntu | 5 | 6 |
| 5 | Gentoo | 8 | 9 |
| 6 | Red Hat | 10 | 19 |
| 7 | Fedora Core | 11 | 12 |
| 8 | RHEL | 13 | 18 |
| 9 | CentOS | 14 | 15 |
| 10 | Oracle Linux | 16 | 17 |
+----+--------------+------+-------+
我們是如何確定左編號和右編號的呢,通過下圖我們可以直觀的發現只要會數數即可完成:
回到樹形模型該怎麽處理,通過下圖,對數據結構稍有概念的人都會知道,稍加改動的先序遍歷算法即可完成這項編號的工作:
獲取整棵樹
一個節點的左編號總是介於其父節點的左右編號之間,利用這個特性使用 self-join 鏈接到父節點,可以獲取整棵樹:
SELECT node.name
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND parent.name = ‘Linux‘
ORDER BY node.`left`;
結果如下:
+--------------+
| name |
+--------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Red Hat |
| Fedora Core |
| RHEL |
| CentOS |
| Oracle Linux |
+--------------+
但是這樣我們丟失了層次的信息。怎麽辦呢?使用 COUNT()
函數和 GROUP BY
子句,可以實現這個目的:
SELECT
node.name, (COUNT(parent.name) - 1) AS depth
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
需要註意 MySQL 5.7.5 開始默認啟用了 only_full_group_by
模式,讓 GROUP BY
的行為與 SQL92 標準一致,因此直接使用 ORDER BY node.`left`
會產生錯誤:
ERROR 1055 (42000): Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column‘test.node.left‘ which is not functionally dependent on columns in GROUP BY clause; this is incompatible withsql_mode=only_full_group_by
使用 ANY_VALUE()
是避免這個問題的一種的途徑。
結果如下:
+--------------+-------+
| name | depth |
+--------------+-------+
| Linux | 0 |
| Debian | 1 |
| Knoppix | 2 |
| Ubuntu | 2 |
| Gentoo | 1 |
| Red Hat | 1 |
| Fedora Core | 2 |
| RHEL | 2 |
| CentOS | 3 |
| Oracle Linux | 3 |
+--------------+-------+
稍作調整就可以直接顯示層次:
SELECT
CONCAT(REPEAT(‘ ‘, COUNT(parent.name) - 1), node.name) AS name
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
結果相當漂亮:
+-----------------+
| name |
+-----------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Red Hat |
| Fedora Core |
| RHEL |
| CentOS |
| Oracle Linux |
+-----------------+
當然客戶端代碼可能會更傾向於使用 depth
值,對返回的結果集進行循環,Web 開發人員可以根據其增大或減小使用 <li>
/</li>
或 <ul>
/</ul>
等。
獲取節點在子樹中的深度
要獲取節點在子樹中的深度,我們需要第三個 self-join 以及子查詢來將結果限制在特定的子樹中以及進行必要的計算:
SELECT
node.name, (COUNT(parent.name) - ANY_VALUE(sub_tree.depth) - 1) AS depth
FROM
nested AS node,
nested AS parent,
nested AS sub_parent,
(
SELECT
node.name, (COUNT(parent.name) - 1) AS depth
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.name = ‘Red Hat‘
GROUP BY node.name, node.`left`
ORDER BY node.`left`
) AS sub_tree
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.`left` BETWEEN sub_parent.`left` AND sub_parent.`right`
AND sub_parent.name = sub_tree.name
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
結果是:
+--------------+-------+
| name | depth |
+--------------+-------+
| Red Hat | 0 |
| Fedora Core | 1 |
| RHEL | 1 |
| CentOS | 2 |
| Oracle Linux | 2 |
+--------------+-------+
尋找一個節點的直接子節點
使用鄰接表模型時這相當簡單。使用嵌套集時,我們可以在上面獲取子樹各節點深度的基礎上增加一個 HAVING
子句來實現:
SELECT
node.name, (COUNT(parent.name) - ANY_VALUE(sub_tree.depth) - 1) AS depth
FROM
nested AS node,
nested AS parent,
nested AS sub_parent,
(
SELECT
node.name, (COUNT(parent.name) - 1) AS depth
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.name = ‘Red Hat‘
GROUP BY node.name, node.`left`
ORDER BY node.`left`
) AS sub_tree
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.`left` BETWEEN sub_parent.`left` AND sub_parent.`right`
AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth = 1
ORDER BY ANY_VALUE(node.`left`);
結果:
+-------------+-------+
| name | depth |
+-------------+-------+
| Fedora Core | 1 |
| RHEL | 1 |
+-------------+-------+
獲取所有葉節點
觀察帶編號的嵌套模型,葉節點的判斷相當簡單,右編號恰好比左編號多 1 的節點就是葉節點:
SELECT id, name FROM nested WHERE `right` = `left` + 1;
結果如下:
+----+--------------+
| id | name |
+----+--------------+
| 3 | Knoppix |
| 4 | Ubuntu |
| 5 | Gentoo |
| 7 | Fedora Core |
| 9 | CentOS |
| 10 | Oracle Linux |
+----+--------------+
獲取單個節點的完整路徑
仍然是使用 self-join 技巧,不過現在無需顧慮節點的深度了:
SELECT parent.name
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.name = ‘CentOS‘
ORDER BY parent.`left`;
結果如下:
+---------+
| name |
+---------+
| Linux |
| Red Hat |
| RHEL |
| CentOS |
+---------+
聚集操作
我們添加一張 releases
表,來展示一下在嵌套集模型下的聚集(aggregate)操作:
CREATE TABLE releases (
id INT NOT NULL AUTO_INCREMENT,
distribution_id INT NULL,
name VARCHAR(32) NOT NULL,
PRIMARY KEY (id),
INDEX distribution_id_idx (distribution_id ASC),
CONSTRAINT distribution_id
FOREIGN KEY (distribution_id)
REFERENCES nested (id)
ON DELETE CASCADE
ON UPDATE CASCADE
)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;
加入一些數據,假設這些數據是指某個軟件支持的發行版:
INSERT INTO releases (distribution_id, name) VALUES
(2, ‘7‘), (2, ‘8‘),
(4, ‘14.04 LTS‘), (4, ‘15.10‘),
(7, ‘22‘), (7, ‘23‘),
(9, ‘5‘), (9, ‘6‘), (9, ‘7‘);
那麽,下面的查詢可以知道每個節點下涉及的發布版數量,如果這是一個軟件支持的發布版清單,或許測試人員想要知道他們得準備多少種虛擬機吧:
SELECT
parent.name, COUNT(releases.name)
FROM
nested AS node ,
nested AS parent,
releases
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
AND node.id = releases.distribution_id
GROUP BY parent.name
ORDER BY ANY_VALUE(parent.`left`);
結果如下:
+-------------+----------------------+
| name | COUNT(releases.name) |
+-------------+----------------------+
| Linux | 9 |
| Debian | 4 |
| Ubuntu | 2 |
| Red Hat | 5 |
| Fedora Core | 2 |
| CentOS | 3 |
+-------------+----------------------+
如果層次結構是一個分類目錄,這個技巧可以用於查詢各個類別下有多少關聯的商品。
添加節點
再次回顧這張圖:
如果我們要在 Gentoo
之後增加一個 Slackware
,這個新節點的左右編號分別是 10 和 11,而原來從 10 開始的所有編號都需要加 2。我們可以:
LOCK TABLE nested WRITE;
SELECT @baseIndex := `right` FROM nested WHERE name = ‘Gentoo‘;
UPDATE nested SET `right` = `right` + 2 WHERE `right` > @baseIndex;
UPDATE nested SET `left` = `left` + 2 WHERE `left` > @baseIndex;
INSERT INTO nested (name, `left`, `right`) VALUES
(‘Slackware‘, @baseIndex + 1, @baseIndex + 2);
UNLOCK TABLES;
使用之前掌握的技巧看一下現在的情況:
SELECT
CONCAT(REPEAT(‘ ‘, COUNT(parent.name) - 1), node.name) AS name
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
結果為:
+-----------------+
| name |
+-----------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Slackware |
| Red Hat |
| Fedora Core |
| RHEL |
| CentOS |
| Oracle Linux |
+-----------------+
如果新增的節點的父節點原來是葉節點,我們需要稍微調整一下之前的代碼。例如,我們要新增 Slax
作為 Slackware
的子節點:
LOCK TABLE nested WRITE;
SELECT @baseIndex := `left` FROM nested WHERE name = ‘Slackware‘;
UPDATE nested SET `right` = `right` + 2 WHERE `right` > @baseIndex;
UPDATE nested SET `left` = `left` + 2 WHERE `left` > @baseIndex;
INSERT INTO nested(name, `left`, `right`) VALUES (‘Slax‘, @baseIndex + 1, @baseIndex + 2);
UNLOCK TABLES;
現在,數據形如:
+-----------------+
| name |
+-----------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Slackware |
| Slax |
| Red Hat |
| Fedora Core |
| RHEL |
| CentOS |
| Oracle Linux |
+-----------------+
刪除節點
刪除節點的操作與添加操作相對,當要刪除一個葉節點時,移除該節點並將所有比該節點右編碼大的編碼減 2。這個思路可以擴展到刪除一個節點及其所有子節點的情況,刪除左編碼介於節點左右編號之間的所有節點,將右側的節點編號全部左移該節點原編號寬度即可:
LOCK TABLE nested WRITE;
SELECT
@nodeLeft := `left`,
@nodeRight := `right`,
@nodeWidth := `right` - `left` + 1
FROM nested
WHERE name = ‘Slackware‘;
DELETE FROM nested WHERE `left` BETWEEN @nodeLeft AND @nodeRight;
UPDATE nested SET `right` = `right` - @nodeWidth WHERE `right` > @nodeRight;
UPDATE nested SET `left` = `left` - @nodeWidth WHERE `left` > @nodeRight;
UNLOCK TABLES;
可以看到 Slackware
子樹被刪除了:
+-----------------+
| name |
+-----------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Red Hat |
| Fedora Core |
| RHEL |
| CentOS |
| Oracle Linux |
+-----------------+
稍加調整,如果對介於要刪除節點左右編號直接的節點對應編號左移 1,右側節點對應編號左移 2,則可以實現刪除一個節點,其子節點提升一層的效果,例如我們嘗試刪除 RHEL
但保留它的子節點:
LOCK TABLE nested WRITE;
SELECT
@nodeLeft := `left`,
@nodeRight := `right`
FROM nested
WHERE name = ‘RHEL‘;
DELETE FROM nested WHERE `left` = @nodeLeft;
UPDATE nested SET `right` = `right` - 1, `left` = `left` - 1 WHERE `left` BETWEEN @nodeLeft AND @nodeRight;
UPDATE nested SET `right` = `right` - 2 WHERE `right` > @nodeRight;
UPDATE nested SET `left` = `left` - 2 WHERE `left` > @nodeRight;
UNLOCK TABLES;
結果為:
SELECT
CONCAT(REPEAT(‘ ‘, COUNT(parent.name) - 1), node.name) AS name
FROM
nested AS node,
nested AS parent
WHERE
node.`left` BETWEEN parent.`left` AND parent.`right`
GROUP BY node.name
ORDER BY ANY_VALUE(node.`left`);
+----------------+
| name |
+----------------+
| Linux |
| Debian |
| Knoppix |
| Ubuntu |
| Gentoo |
| Red Hat |
| Fedora Core |
| CentOS |
| Oracle Linux |
+----------------+
使用 MySQL 管理層次結構的數據