ES--Secondary Index

阿新 • Published: 2021-01-10

Tags: ELK, elasticsearch, secondary index

Secondary Index

1. Use Cases
2. Requirements Analysis
- Implementation
- Workflow
3. Code Implementation
Maven Dependencies for the Secondary Index

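1. Use Cases

`ES: pros and cons`

Pros: it can build full-text indexes, and any data can be indexed for querying as needed
Cons: with large data volumes its performance cannot meet strict real-time requirements, and the risk to data safety is comparatively high

`Hbase: pros and cons`

Pros: high-performance real-time reads and writes over large datasets, with comparatively safe data
Cons: the rowkey is the only index, while query conditions in complex business scenarios vary widely
- If the query condition is not a prefix of the rowkey
- the query cannot use the index, so a secondary index has to be built
Why not build the secondary index tables in Hbase?
- Every new query requirement needs its own Hbase secondary index table
  - Original table: rowkey: id
  - Index table: rowkey: name
    - id
  - Index table: rowkey: age
    - id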
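
The rowkey limitation can be illustrated with a small in-memory sketch. It does not use the Hbase API: a `TreeMap` stands in for a rowkey-sorted table, and the class name, method names, and the `101`/`wangwu` row are hypothetical, added only for illustration. A rowkey-prefix lookup touches just one key range, while a lookup by any other column has to check every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a rowkey-sorted table (illustration only, not Hbase code)
class RowkeyScanDemo {
    // rowkey (id) -> name, kept sorted by rowkey
    static final TreeMap<String, String> TABLE = new TreeMap<>();
    static {
        TABLE.put("001", "zhangsan");
        TABLE.put("002", "lisi");
        TABLE.put("101", "wangwu"); // hypothetical extra row
    }

    // Prefix lookup: only the key range that starts with the prefix is visited
    static List<String> byRowkeyPrefix(String prefix) {
        return new ArrayList<>(TABLE.subMap(prefix, prefix + Character.MAX_VALUE).keySet());
    }

    // Lookup by a non-rowkey column: every row has to be checked
    static List<String> byName(String name) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : TABLE.entrySet()) {
            if (entry.getValue().equals(name)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(byRowkeyPrefix("00")); // prints [001, 002]
        System.out.println(byName("lisi"));       // prints [002]
    }
}
```

Each additional query column would need its own sorted structure to avoid the full scan, which is the cost the index tables listed above represent.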

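Building the index table in ES

Store all the query conditions in ES and index them, with each document pointing to the rowkey in Hbase
Data in ES

documentId	age	name	sex	rowkey:id
0	18	zhangsan	male	001

Data in Hbase

rowkey:id	age	name	sex
001	18	zhangsan	male
002	19	lisi	female

This avoids having to build multiple secondary index tables in Hbase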

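2. Requirements Analysis

Requirement: search by conditions such as keywords in the title, source, time, and read count, and retrieve the article body
Data

ID
Title
Source
Time
Read count
Body

Can all of it be stored in ES?
- Yes, it can
- But there are problems
  - For performance and safety reasons, there is no need to store all the data in ES
  - Storing and indexing a large volume of data degrades performance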

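Implementation

Store the complete data in Hbase
- id as the rowkey
- title, source, time, read count, body
Store the fields needed as query conditions in ES and index all of them
- ID
- Title
- Source
- Time
- Read count
User query
- step1: query ES with the search conditions to get the IDs of the matching records
- step2: use those IDs to fetch the article bodies from Hbase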
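
The two query steps can be sketched without any cluster. This is a minimal in-memory illustration, not the article's implementation: a list plays the role of the ES index, a map plays the role of the Hbase table, and the class name, method names, and sample records are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of the two-step lookup; it does not call ES or Hbase
class TwoStepLookupDemo {

    // Index-side record: searchable fields plus the id, but no body
    static final class IndexDoc {
        final String id;
        final String title;
        final String source;
        final int readCount;

        IndexDoc(String id, String title, String source, int readCount) {
            this.id = id;
            this.title = title;
            this.source = source;
            this.readCount = readCount;
        }
    }

    private final List<IndexDoc> index = new ArrayList<>();    // plays the role of ES
    private final Map<String, String> store = new HashMap<>(); // plays the role of Hbase: id -> body

    // Write path: searchable fields go to the index, the body goes to the store
    void save(String id, String title, String source, int readCount, String body) {
        index.add(new IndexDoc(id, title, source, readCount));
        store.put(id, body);
    }

    // step1: evaluate the query conditions against the index and return matching ids
    List<String> findIds(String titleKeyword, int minReadCount) {
        List<String> ids = new ArrayList<>();
        for (IndexDoc doc : index) {
            if (doc.title.contains(titleKeyword) && doc.readCount >= minReadCount) {
                ids.add(doc.id);
            }
        }
        return ids;
    }

    // step2: fetch the bodies by id, the equivalent of a Get by rowkey
    List<String> fetchBodies(List<String> ids) {
        List<String> bodies = new ArrayList<>();
        for (String id : ids) {
            bodies.add(store.get(id));
        }
        return bodies;
    }

    public static void main(String[] args) {
        TwoStepLookupDemo demo = new TwoStepLookupDemo();
        demo.save("001", "Travel notes from Italy", "blog", 1200, "Body of article 001");
        demo.save("002", "Cooking at home", "news", 300, "Body of article 002");
        List<String> ids = demo.findIds("Italy", 1000);
        System.out.println(ids);                   // prints [001]
        System.out.println(demo.fetchBodies(ids)); // prints [Body of article 001]
    }
}
```

The index side never holds the body, so it stays small, and the store side is only ever read by id.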

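Workflow

step1: read the Excel file, parse each record, and wrap it in a JavaBean
Each record is one JavaBean object
Put all the JavaBeans into a collection
step2: write the JavaBean data to Hbase and ES
ES: id, title, source, time, read count
Hbase: all fields are stored in Hbase

rowkey: id
step3: look up the body of the relevant articles by title
- Query ES first to get the IDs related to the title
- Query Hbase by ID and return the body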

3. Code Implementation

Create the corresponding index in ES

PUT /articles
{  
    "settings":{  
         "number_of_shards":3,  
         "number_of_replicas":1,
         "analysis" : {
            "analyzer" : {
                "ik" : {
                    "tokenizer" : "ik_max_word"
                }
            }
        }
    }, 
    "mappings":{  
         "article":{  
             "dynamic":"strict",
             "_source": {
               "includes": [
                  "id","title","from","readCount","time"
                ],
               "excludes": [
                  "content"
               ]
             },
             "properties":{  
                 "id":{"type": "keyword", "store": true},  
                 "title":{"type": "text","store": true,"index" : true,"analyzer": "ik_max_word"}, 
                 "from":{"type": "keyword","store": true}, 
                 "readCount":{"type": "integer","store": true},  
                 "content":{"type": "text","store": false,"index": false},
                 "time": {"type": "keyword", "index": false}
             }  
         }  
    }  
}

Create the corresponding table in Hbase
- This must be done as the root user
- Start HDFS
  - start-dfs.sh
- Start Zookeeper
  - /export/servers/zookeeper-3.4.5-cdh5.14.0/bin/start-zk-all.sh
- Start Hbase
  - start-hbase.sh
- Create the table
  - create 'articles','article'
Build the Excel parsing class

package cn.hanjiaxiaozhi.util;

import cn.hanjiaxiaozhi.bean.EsArticle;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * @ClassName ExcelUtil
 * @Description Utility class that parses the data in an Excel file and wraps it in JavaBeans
 * @Date 2020/7/5 15:46
 * @Create By     Frank
 */
public class ExcelUtil {

    /**
     * Parses the data in an Excel file
     * @param path path of the .xlsx file
     * @return one EsArticle per data row
     */
    public static List<EsArticle> parseExcelData(String path) throws IOException {
        // Result list
        List<EsArticle> lists = new ArrayList<>();
        // Open the file as an input stream; try-with-resources closes it once parsing is done
        try (FileInputStream inputStream = new FileInputStream(path)) {
            // Parse the Excel file into a workbook
            XSSFWorkbook sheets = new XSSFWorkbook(inputStream);
            // Get the first sheet
            XSSFSheet sheet = sheets.getSheetAt(0);
            // Get the index of the last row, then iterate over the rows
            int lastRowNum = sheet.getLastRowNum();
            // Row 0 holds the column names, so start from row 1
            for (int i = 1; i <= lastRowNum; i++) {
                // Get the current row; skip rows that are missing from the sheet
                XSSFRow row = sheet.getRow(i);
                if (row == null) {
                    continue;
                }
                // Read each column of the row
                String id = row.getCell(0).toString();        // id
                String title = row.getCell(1).toString();     // title
                String from = row.getCell(2).toString();      // source
                String time = row.getCell(3).toString();      // time
                String readCount = row.getCell(4).toString(); // read count
                String content = row.getCell(5).toString();   // body
                // Wrap the row in a JavaBean
                EsArticle esArticle = new EsArticle(id, title, from, time, readCount, content);
                // Add it to the list
                lists.add(esArticle);
            }
        }

        // Return the result
        return lists;
    }
}
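
One thing worth checking when cells are read through `toString()`: if the id or read-count column is stored as a number in the sheet (an assumption about the data, which the article does not show), the text typically comes back as `1.0` or `3000.0` rather than `1` or `3000`. The helper below is a hypothetical sketch, not part of the original utility class, that strips such a trailing `.0` while leaving text ids such as `001` untouched.

```java
// Hypothetical helper for cleaning up cell text; not part of the original ExcelUtil
class CellTextDemo {

    // Turns "3000.0" into "3000"; any other text is returned trimmed but unchanged
    static String normalizeNumeric(String cellText) {
        if (cellText == null) {
            return null;
        }
        String trimmed = cellText.trim();
        // Only whole numbers written with an all-zero fraction are rewritten.
        // Values printed in scientific notation (for example "1.0E7") are left as they are.
        if (trimmed.matches("-?\\d+\\.0+")) {
            return trimmed.substring(0, trimmed.indexOf('.'));
        }
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(normalizeNumeric("3000.0")); // prints 3000
        System.out.println(normalizeNumeric("001"));    // prints 001
        System.out.println(normalizeNumeric("3.5"));    // prints 3.5
    }
}
```

Applied to the `id` and `readCount` strings before the JavaBean is built, this would keep the ES document id and the Hbase rowkey free of a stray `.0` suffix.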

Build the Hbase read/write utility class

package cn.hanjiaxiaozhi.util;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

/**
 * @ClassName HbaseUtil
 * @Description Utility class for reading from and writing to Hbase
 * @Date 2020/7/5 16:13
 * @Create By     Frank
 */
public class HbaseUtil {


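    // Shared connection: creating an Hbase Connection is expensive, so it is created once and reused
    private static Connection conn;

    private static synchronized Table getHbaseTable(String tableName) throws IOException {
        if (conn == null) {
            // Build the Hbase configuration
            Configuration conf = HBaseConfiguration.create();
            // Set the Zookeeper quorum address
            conf.set("hbase.zookeeper.quorum","node-01:2181,node-02:2181,node-03:2181");
            conn = ConnectionFactory.createConnection(conf);
        }
        // Build the table object
        return conn.getTable(TableName.valueOf(tableName));
    }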

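    /**
     * Writes one cell to Hbase
     * @param tableName table name
     * @param rowkey row key
     * @param family column family
     * @param column column qualifier
     * @param value cell value
     */
    public static void writeToHbase(String tableName,String rowkey,String family,String column,String value) throws IOException {
        // Get the Hbase table object; try-with-resources closes it after the write
        try (Table table = getHbaseTable(tableName)) {
            // Build the Put for this rowkey
            Put put = new Put(Bytes.toBytes(rowkey));
            // Set the column family, column, and value
            put.addColumn(Bytes.toBytes(family),Bytes.toBytes(column),Bytes.toBytes(value));
            // Execute the write
            table.put(put);
        }
    }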

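    /**
     * Returns the value of one column for the given rowkey (used here to fetch the body)
     * @param tableName table name
     * @param rowkey row key
     * @param family column family
     * @param column column qualifier
     * @return the column value as a string, or null if the cell does not exist
     * @throws IOException
     */
    public static String readFromHbase(String tableName,String  rowkey,String family,String column) throws IOException {
        // Get the table object; try-with-resources closes it after the read
        try (Table table = getHbaseTable(tableName)) {
            // Build the Get for this rowkey
            Get get = new Get(Bytes.toBytes(rowkey));
            // Fetch all the data of this rowkey
            Result result = table.get(get);
            // Return the value of the requested column
            byte[] content = result.getValue(Bytes.toBytes(family), Bytes.toBytes(column));
            return Bytes.toString(content);
        }
    }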

}

Build the ES read/write utility class

package cn.hanjiaxiaozhi.util;

import cn.hanjiaxiaozhi.bean.EsArticle;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/**
 * @ClassName EsUtil
 * @Description Utility class for reading from and writing to ES
 * @Date 2020/7/5 16:13
 * @Create By     Frank
 */
public class EsUtil {

    static String indexName = "articles";
    static  String typeName = "article";


    public  static TransportClient getESClient() throws UnknownHostException {
        Settings settings = Settings.builder().put("cluster.name","myes").build();
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-01"),9300))
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-02"),9300))
                .addTransportAddress(new TransportAddress(InetAddress.getByName("node-03"),9300));

        return client;
    }

    /**
     * Writes the data to ES
     * @param esArticles articles to index
     * @throws UnknownHostException
     */
    public static void writeToES(List<EsArticle> esArticles) throws UnknownHostException {
        // Get an ES client
        TransportClient esClient = getESClient();
        try {
            // Collect every record of the list into one bulk request
            BulkRequestBuilder bulk = esClient.prepareBulk();
            // Iterate over the articles
            for (EsArticle esArticle : esArticles) {
                // Convert each JavaBean to a JSON string
                String jsonString = JSON.toJSONString(esArticle);
                // Build the index request, using the article id as the document id
                IndexRequestBuilder requestBuilder = esClient.prepareIndex(indexName, typeName, esArticle.getId()).setSource(jsonString, XContentType.JSON);
                // Add it to the bulk request
                bulk.add(requestBuilder);
            }
            // Execute the bulk request
            bulk.get();
        } finally {
            // Release the client's connections and threads
            esClient.close();
        }
    }

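    /**
     * Matches the search keyword against the title and returns the matching records as JavaBeans
     * @param keyword keyword to look for in the title
     * @return the matching articles (without the body, which is excluded from _source)
     */
    public static List<EsArticle> readFromEs(String keyword) throws UnknownHostException {
        // Result list
        List<EsArticle> lists = new ArrayList<>();
        // Get the client
        TransportClient esClient = getESClient();
        try {
            // Build and run the query. title is an analyzed text field, so a match query is used:
            // it analyzes the keyword the same way, whereas a term query only hits an exact indexed token
            SearchResponse title = esClient.prepareSearch(indexName)
                    .setTypes(typeName)
                    .setQuery(QueryBuilders.matchQuery("title", keyword))
                    .get();
            // Get the matching documents
            SearchHit[] hits = title.getHits().getHits();
            for (SearchHit hit : hits) {
                // Get the JSON string of the document
                String sourceAsString = hit.getSourceAsString();
                // Convert it to a JavaBean
                EsArticle esArticle = JSON.parseObject(sourceAsString, EsArticle.class);
                // Add it to the list
                lists.add(esArticle);
            }
        } finally {
            // Release the client's connections and threads
            esClient.close();
        }
        // Return the result
        return lists;
    }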
}

Build the main program

package cn.hanjiaxiaozhi.app;

import cn.hanjiaxiaozhi.bean.EsArticle;
import cn.hanjiaxiaozhi.util.EsUtil;
import cn.hanjiaxiaozhi.util.ExcelUtil;
import cn.hanjiaxiaozhi.util.HbaseUtil;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.List;

/**
 * @ClassName TestEsAndHbase
 * @Description Writes the data from Excel into Hbase and ES, then queries through the secondary index built in ES
 *      Get the id from ES through the index
 *      Look up the body in Hbase by id
 * @Date 2020/7/5 15:40
 * @Create By     Frank
 */
public class TestEsAndHbase {

    static String tableName = "articles";
    static String family = "article";

    public static void main(String[] args) throws IOException {
        // Step 1: read the Excel data and wrap each record in a JavaBean
        // Path of the file
        String path = "datas/excel/hbaseEs.xlsx";
        // Turn each row of the file into a JavaBean
        List<EsArticle> esArticles = ExcelUtil.parseExcelData(path);
//        System.out.println(esArticles);
        // Step 2: write the data to Hbase and ES (commented out here; it has to run once before searching)
//        writeData(esArticles);
        // Step 3: search by title
        search("義大利");
    }

    /**
     * Prints the body of every article whose title matches the keyword
     * @param keyword keyword to look for in the title
     */
    private static void search(String keyword) throws IOException {
        // Match the title through the ES index and get the matching documents
        List<EsArticle> esArticles = EsUtil.readFromEs(keyword);
        // Use the returned IDs to look up the body in Hbase
        for (EsArticle esArticle : esArticles) {
            // Get the id of the matching record
            String id = esArticle.getId();
            // Query Hbase by id
            String content = HbaseUtil.readFromHbase(tableName, id, family, "content");
            // Print the result
            System.out.println(content);
        }
    }

    private static void writeData(List<EsArticle> esArticles) throws IOException {
        // Write to ES
        EsUtil.writeToES(esArticles);
        // Write to Hbase
        writeDataToHbase(esArticles);
    }

    private static void writeDataToHbase(List<EsArticle> esArticles) throws IOException {
        // Write every field of each record to Hbase, using the id as the rowkey
        for (EsArticle esArticle : esArticles) {
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"title",esArticle.getTitle());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"from",esArticle.getFrom());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"time",esArticle.getTime());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"readCount",esArticle.getReadCount());
            HbaseUtil.writeToHbase(tableName,esArticle.getId(),family,"content",esArticle.getContent());
        }
    }
}

Maven Dependencies for the Secondary Index

 <!-- Repository locations, in order: aliyun, cloudera, and jboss -->
    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
        <repository>
            <id>jboss</id>
            <url>http://repository.jboss.com/nexus/content/groups/public</url>
        </repository>
    </repositories>
    <dependencies>
        <!-- ES client -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.0.0</version>
        </dependency>
        <!-- Logger -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.9.1</version>
        </dependency>
        <!-- JSON parsing utility -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.47</version>
        </dependency>
        <!-- Unit testing -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
        <!-- Excel parsing utilities -->
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml-schemas</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.8</version>
        </dependency>
        <!-- Hbase dependency -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.0-cdh5.14.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
