
MySQL Full-Text Index Features

Official documentation: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html

Overview

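  1. Introduction
    1). A full-text index in MySQL is an index of type FULLTEXT.
    2). Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.
    3). As of MySQL 5.7.6, MySQL provides a built-in full-text ngram parser that supports Chinese, Japanese, and Korean (CJK), and an installable MeCab full-text parser plugin for Japanese.
    4). A FULLTEXT index definition can be given in the CREATE TABLE statement when the table is created, or added later with ALTER TABLE or CREATE INDEX.
    5). For large data sets, it is much faster to load the data into a table that has no FULLTEXT index and then create the index than to load the data into a table that already has one.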
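
The three ways of defining a FULLTEXT index from point 4), the ngram parser from point 3), and the load-then-index pattern from point 5) can be sketched roughly as follows. The table and index names (`notes`, `notes_cjk`, `notes_bulk`, `ft_*`) are made up for illustration, and the bulk load is stood in for by a plain `INSERT ... SELECT`; treat this as a minimal sketch rather than a tuned setup.

```sql
-- 1) Define the index as part of CREATE TABLE.
CREATE TABLE notes (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- 2) Or add an index to an existing table with ALTER TABLE ...
ALTER TABLE notes ADD FULLTEXT INDEX ft_body (body);

-- 3) ... or with CREATE INDEX.
CREATE FULLTEXT INDEX ft_title ON notes (title);

-- CJK text: use the built-in ngram parser (MySQL 5.7.6 and later).
CREATE TABLE notes_cjk (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT KEY ft_cjk (title, body) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4;

-- Large data sets: create the table without a FULLTEXT index,
-- load the rows, and only then build the index.
CREATE TABLE notes_bulk (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body TEXT
) ENGINE=InnoDB;

INSERT INTO notes_bulk (title, body)
  SELECT title, body FROM notes;   -- placeholder for the real bulk load

CREATE FULLTEXT INDEX ft_bulk ON notes_bulk (title, body);
```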

  2. Query syntax

MATCH (col1,col2,...) AGAINST (expr [search_modifier])
search_modifier:
  {
       IN NATURAL LANGUAGE MODE
     | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
     | IN BOOLEAN MODE
     | WITH QUERY EXPANSION
  }

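  3. Three types of full-text search

    1). Natural language search: interprets the search string as a phrase in natural human language.
    2). Boolean full-text search: interprets the search string using the rules of a special query language with operators.
    3). Query expansion search: a modification of natural language search.

Natural language full-text search

Example 1: basic usage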

mysql> CREATE SCHEMA `fulltextsearches` DEFAULT CHARACTER SET utf8 ;
mysql> CREATE TABLE articles (
          id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
          title VARCHAR(200),
          body TEXT,
          FULLTEXT (title,body)
        ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO articles (title,body) VALUES
        ('MySQL Tutorial','DBMS stands for DataBase ...'),
        ('How To Use MySQL Well','After you went through a ...'),
        ('Optimizing MySQL','In this tutorial we will show ...'),
        ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
        ('MySQL vs. YourSQL','In the following database comparison ...'),
        ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
        WHERE MATCH (title,body)
        AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM articles
        WHERE MATCH (title,body)
        AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

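Note:
For natural language full-text searches, the columns named in the MATCH() function must be the same columns included in some FULLTEXT index on the table. In the preceding query, the columns named in MATCH() (title and body) are the same as those named in the definition of the articles table's FULLTEXT index. To search title or body separately, create a separate FULLTEXT index for each column.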
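
Searching a single column, as described above, could look like the following sketch. The index names `ft_title` and `ft_body` are made up for illustration. The two indexes are created in separate statements because InnoDB builds one FULLTEXT index at a time.

```sql
-- One FULLTEXT index per column that should be searchable on its own.
CREATE FULLTEXT INDEX ft_title ON articles (title);
CREATE FULLTEXT INDEX ft_body  ON articles (body);

-- MATCH() now names exactly the column list of one of those indexes.
SELECT id, title
  FROM articles
  WHERE MATCH (title) AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE);

SELECT id, body
  FROM articles
  WHERE MATCH (body) AGAINST ('database' IN NATURAL LANGUAGE MODE);
```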

Example 2: retrieving relevance values explicitly

SELECT id, MATCH (title,body)
    AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score
    FROM articles;
+----+---------------------+
| id | score               |
+----+---------------------+
|  1 | 0.22764469683170319 |
|  2 |                   0 |
|  3 | 0.22764469683170319 |
|  4 |                   0 |
|  5 |                   0 |
|  6 |                   0 |
+----+---------------------+
6 rows in set (0.00 sec)

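Example 3:
The query returns the relevance values and also sorts the rows in order of decreasing relevance. To achieve this, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.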

SELECT id, body, MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE) AS score
    FROM articles WHERE MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE);
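
If the ordering should be spelled out rather than left to the default behaviour of MATCH() in the WHERE clause, the score alias can be used in an explicit ORDER BY. This is a small variation on the query above; the `LIMIT 3` is an arbitrary choice for illustration.

```sql
-- Same search as above, with the sort order stated explicitly
-- and only the three best-scoring rows returned.
SELECT id, body,
       MATCH (title,body) AGAINST
         ('Security implications of running MySQL as root'
          IN NATURAL LANGUAGE MODE) AS score
  FROM articles
  WHERE MATCH (title,body) AGAINST
         ('Security implications of running MySQL as root'
          IN NATURAL LANGUAGE MODE)
  ORDER BY score DESC
  LIMIT 3;
```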


Boolean full-text search

Example 1: basic usage

SELECT * FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+

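Supported operators

  1. +: a leading or trailing plus sign means the word must be present in every row returned. (InnoDB supports only the leading form.)
  2. -: a leading or trailing minus sign means the word must not be present in any row returned. (InnoDB supports only the leading form.)
  3. no operator: the word is optional, but rows that contain it are rated higher.
  4. @distance: InnoDB only. Tests whether two or more words all start within the given distance of each other, measured in words. For example, MATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)
  5. > <: change a word's contribution to the relevance value assigned to a row. The > operator increases the contribution and the < operator decreases it.
  6. ( ): parentheses group words into subexpressions. Parenthesized groups can be nested.
  7. ~: a leading tilde makes the word's contribution to the row's relevance negative. The row is rated lower but, unlike with -, it is not excluded.
  8. *: the truncation (wildcard) operator. It is appended to the word and matches words that begin with that prefix. A word given with * is kept in the query even if it is too short according to innodb_ft_min_token_size (InnoDB) or ft_min_word_len (MyISAM).
  9. ": a phrase enclosed in double quotes matches only rows that contain the phrase literally, as typed.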
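
A few of these operators can be tried against the six-row articles table from the natural language example. The search strings below are illustrative choices, not taken from the original post, and no output is shown because the exact scores depend on the data and server settings.

```sql
-- + and grouping with > and <: rows must contain "MySQL" and either
-- "Security" or "Tricks"; "Security" matches are ranked higher.
SELECT id, title,
       MATCH (title,body) AGAINST ('+MySQL +(>Security <Tricks)' IN BOOLEAN MODE) AS score
  FROM articles
  WHERE MATCH (title,body) AGAINST ('+MySQL +(>Security <Tricks)' IN BOOLEAN MODE);

-- ~ : rows containing "YourSQL" are kept but rated lower.
SELECT id, title
  FROM articles
  WHERE MATCH (title,body) AGAINST ('+MySQL ~YourSQL' IN BOOLEAN MODE);

-- * : prefix match, e.g. "Optimizing".
SELECT id, title
  FROM articles
  WHERE MATCH (title,body) AGAINST ('Optimi*' IN BOOLEAN MODE);

-- "..." : literal phrase match.
SELECT id, title
  FROM articles
  WHERE MATCH (title,body) AGAINST ('"MySQL Security"' IN BOOLEAN MODE);
```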

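Relevance calculation

TF-IDF formula
InnoDB ranks rows using a variation of TF-IDF weighting: how often the term occurs in a row (TF) is combined with its inverse document frequency across the table (IDF). The example below works through the calculation.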

mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
Query OK, 0 rows affected (1.04 sec)

mysql> INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','This database tutorial ...'),
("How To Use MySQL",'After you went through a ...'),
('Optimizing Your Database','In this database tutorial ...'),
('MySQL vs. YourSQL','When comparing databases ...'),
('MySQL Security','When configured properly, MySQL ...'),
('Database, Database, Database','database database database'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL Full-Text Indexes', 'MySQL fulltext indexes use a ..');                  
Query OK, 8 rows affected (0.06 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT id, title, body, MATCH (title,body)  AGAINST ('database' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+---------------------+
| id | title                        | body                                | score               |
+----+------------------------------+-------------------------------------+---------------------+
|  6 | Database, Database, Database | database database database          |  1.0886961221694946 |
|  3 | Optimizing Your Database     | In this database tutorial ...       | 0.36289870738983154 |
|  1 | MySQL Tutorial               | This database tutorial ...          | 0.18144935369491577 |
|  2 | How To Use MySQL             | After you went through a ...        |                   0 |
|  4 | MySQL vs. YourSQL            | When comparing databases ...        |                   0 |
|  5 | MySQL Security               | When configured properly, MySQL ... |                   0 |
|  7 | 1001 MySQL Tricks            | 1. Never run mysqld as root. 2. ... |                   0 |
|  8 | MySQL Full-Text Indexes      | MySQL fulltext indexes use a ..     |                   0 |
+----+------------------------------+-------------------------------------+---------------------+
8 rows in set (0.00 sec)

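Result:
There are 8 records in total, 3 of which match the search term "database". The first record (id 6) contains the search term 6 times and has a relevance ranking of 1.0886961221694946. This ranking value is calculated from a TF value of 6 (the search term "database" appears 6 times in record id 6) and an IDF value of 0.42596873216370745, where 8 is the total number of records and 3 is the number of records that contain the search term: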

${IDF} = log10( 8 / 3 ) = 0.42596873216370745

${rank} = ${TF} * ${IDF} * ${IDF}

mysql> SELECT 6*log10(8/3)*log10(8/3);
+-------------------------+
| 6*log10(8/3)*log10(8/3) |
+-------------------------+
|       1.088696164686938 |
+-------------------------+
1 row in set (0.00 sec)
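
The same formula can be checked against the other two matching rows. Record id 3 contains "database" twice (once in the title, once in the body), so its TF is 2; record id 1 contains it once, so its TF is 1. The column aliases below are made up for illustration. The results should land close to the scores listed for ids 3 and 1 in the table above; as with id 6 (1.0886961221694946 in the table versus 1.088696164686938 computed by hand), the last digits may differ, presumably because of floating-point precision inside the server.

```sql
-- rank = TF * IDF * IDF, with IDF = log10(8 / 3)
SELECT 2 * POW(LOG10(8/3), 2) AS rank_id_3,   -- TF = 2
       1 * POW(LOG10(8/3), 2) AS rank_id_1;   -- TF = 1
```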


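Query expansion

Query expansion is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for "database" may really mean that "MySQL", "Oracle", "DB2", and "RDBMS" are all phrases that should match "database" and should be returned too. Query expansion works by running the search twice: the second pass searches for the original phrase concatenated with the most relevant documents from the first pass.

Example: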

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' WITH QUERY EXPANSION);
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
+----+-----------------------+------------------------------------------+
6 rows in set (0.00 sec)
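
The syntax listing near the top also gives a longer spelling of the modifier, IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION. The sketch below uses that form and adds a score column, so that rows pulled in only by the expansion pass can be compared with the direct matches. The column list and alias are illustrative choices and no output is shown.

```sql
-- Query expansion using the long form of the modifier,
-- with the relevance value exposed as "score".
SELECT id, title,
       MATCH (title,body) AGAINST
         ('database' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS score
  FROM articles
  WHERE MATCH (title,body) AGAINST
         ('database' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);
```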