Elasticsearch (ES) usage notes

阿新 • Published: 2020-09-08

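1. The requirement first

I have a batch of medical data and need to build a search-engine database for it. Following my usual habit, I first chose Python's whoosh; after all, you reach first for the tools you already know.

I also did not know ES very well at the time, so I built the database with whoosh.

Problem:

The data is several GB in size, and with that volume whoosh ran out of memory while in use and raised MemoryError. So I decided to switch to ES.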

2. ES usage notes

References:
ES documentation
Setup
https://blog.csdn.net/zhezhebie/article/details/105482149
https://www.jianshu.com/p/da3c3612686a
Download
https://elasticsearch.cn/download/ 

Usage
https://blog.csdn.net/diyiday/article/details/82153780


Configuration file
https://www.cnblogs.com/hanyouchun/p/5163183.html
File location:
/etc/elasticsearch/elasticsearch.yml


Problem when creating an index:
400, 'mapper_parsing_exception', 'Root mapping definition has unsupported parameters:
Solution:
https://blog.csdn.net/h_sn9999/article/details/102767040

Counting the total number of documents
https://blog.csdn.net/whq12789/article/details/101062968
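For the document count mentioned above, here is a minimal sketch, assuming the 7.x Python client and the `index_test` index created later in this post. The helper name `count_docs` is only illustrative; `es.count` is the client's wrapper around the `_count` API.

```python
def count_docs(es, index):
    """Return the number of documents in `index` via the _count API."""
    # `es` is an elasticsearch.Elasticsearch client; the response looks like
    # {"count": <n>, "_shards": {...}}
    return es.count(index=index)["count"]


# Usage (needs a reachable node; "ip:5000" is the placeholder used in this post):
# from elasticsearch import Elasticsearch
# print(count_docs(Elasticsearch(["ip:5000"]), "index_test"))
```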

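Download ES. I chose the latest version here.

Downloading on my local machine was actually faster than downloading on the server, but it still takes a long time; I waited about 1 hour.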
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.0-x86_64.rpm

Install the rpm. I chose to install it directly on the server.

rpm -Uvh /path/es_64bit.rpm

systemctl enable elasticsearch   # start on boot
systemctl start elasticsearch    # start the service
systemctl status elasticsearch   # check the service status
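Once the service is running, one way to confirm that the node answers over HTTP is to read its root endpoint. The sketch below uses only the standard library and assumes a default install without security enabled, listening on port 9200 (with the `http.port: 5000` setting used later in this post, the URL changes accordingly). The function names are illustrative.

```python
import json
from urllib.request import urlopen


def parse_node_info(body):
    """Pull the cluster name and version number out of the root endpoint's JSON."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def fetch_node_info(base_url, timeout=5):
    """GET the root endpoint, e.g. http://127.0.0.1:9200/ on a default install."""
    with urlopen(base_url, timeout=timeout) as resp:
        return parse_node_info(resp.read().decode("utf-8"))


# Usage (needs a running node):
# print(fetch_node_info("http://127.0.0.1:9200/"))
```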
 

Viewing the logs
/var/log/elas.../elas....log    log file

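Modifying the configuration file

If ES refuses your connection, the fix is to modify the configuration file.

ES default ports

Port 9300: the TCP transport port, used for communication between the nodes of an ES cluster.

Port 9200: the HTTP port that exposes the ES RESTful API, used for communication between ES nodes and external clients.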
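When a connection is refused, it can help to check first whether the port is reachable at all before touching the configuration. A small standard-library sketch (the helper name `is_port_open` is illustrative):

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts
        return False


# Usage: check the HTTP and transport ports of a local node
# print(is_port_open("127.0.0.1", 9200), is_port_open("127.0.0.1", 9300))
```

A `False` result for a remote host while the same check succeeds on the server itself usually points at the bind address (`network.host`) or a firewall rather than at ES being down.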

The modified configuration file:

node.name: node-1

cluster.initial_master_nodes: ["node-1"]

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 5000
discovery.seed_hosts: ["127.0.0.1"]

Mr. Chen's configuration file

Error after changing the ES host

Check the log:

at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

Editing the configuration file as shown above resolves it.
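ES raises this error as one of its bootstrap checks once `network.host` is bound to a non-loopback address. The sketch below checks a config file's text for the three settings named in the message. It is a deliberately naive line-based reader that only understands flat `key: value` lines like the ones in this post, not general YAML, and the function names are illustrative.

```python
DISCOVERY_KEYS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
)


def configured_keys(yml_text):
    """Collect flat `key: value` names, ignoring comments and blank lines."""
    keys = set()
    for line in yml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        keys.add(stripped.split(":", 1)[0].strip())
    return keys


def has_discovery_setting(yml_text):
    """True if at least one of the discovery settings is present."""
    return bool(configured_keys(yml_text) & set(DISCOVERY_KEYS))


# Usage:
# with open("/etc/elasticsearch/elasticsearch.yml", encoding="utf-8") as f:
#     print(has_discovery_setting(f.read()))
```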

Using ES

#!/usr/bin/env python
# -*- coding:utf-8 -*-

from elasticsearch import Elasticsearch

es = Elasticsearch(["ip:5000"])

# journal-ref, report-no
# journal_ref, report_no
# columnName = ['id', 'submitter', 'authors', 'title', 'comments', 'journal_ref', 'doi', 'report_no', 'categories',
#               'license', 'abstract', 'versions', 'update_date', 'authors_parsed']

mappings = {
    'mappings': {
        'type_doc_test': {
            'properties': {
                'id': {
                    'type': 'text',
                },
                'submitter': {
                    'type': 'text',
                },
                'authors': {
                    'type': 'text',
                },
                'title': {
                    'type': 'text',
                },
                'comments': {
                    'type': 'text',
                },
                'journal_ref': {
                    'type': 'text',
                },
                'doi': {
                    'type': 'text',
                },
                'report_no': {
                    'type': 'text',
                },
                'categories': {
                    'type': 'text',
                },
                'license': {
                    'type': 'text',
                },
                'abstract': {
                    'type': 'text',
                },
                'versions': {
                    'type': 'text',
                },
                'update_date': {
                    'type': 'text',
                },
                'authors_parsed': {
                    'type': 'text',
                }
            }
        }
    }
}

mappings_1 = {  # This is the 7.x style; the one above is the 6.x style
    'mappings': {
        'properties': {
            'id': {
                'type': 'text',
            },
            'submitter': {
                'type': 'text',
            },
            'authors': {
                'type': 'text',
            },
            'title': {
                'type': 'text',
            },
            'comments': {
                'type': 'text',
            },
            'journal_ref': {
                'type': 'text',
            },
            'doi': {
                'type': 'text',
            },
            'report_no': {
                'type': 'text',
            },
            'categories': {
                'type': 'text',
            },
            'license': {
                'type': 'text',
            },
            'abstract': {
                'type': 'text',
            },
            'versions': {
                'type': 'text',
            },
            'update_date': {
                'type': 'text',
            },
            'authors_parsed': {
                'type': 'text',
            }
        }
    }
}

res = es.indices.create(index="index_test", body=mappings_1)
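Since every field above uses the same `text` type, the 7.x-style body can also be generated from the column list instead of being written out by hand. A minimal sketch; `build_text_mappings` is an illustrative helper, not part of the client library:

```python
COLUMNS = ['id', 'submitter', 'authors', 'title', 'comments', 'journal_ref',
           'doi', 'report_no', 'categories', 'license', 'abstract',
           'versions', 'update_date', 'authors_parsed']


def build_text_mappings(columns):
    """Build a typeless (7.x-style) mapping in which every field is `text`."""
    return {
        'mappings': {
            'properties': {name: {'type': 'text'} for name in columns}
        }
    }


# Usage, equivalent to passing mappings_1:
# res = es.indices.create(index="index_test", body=build_text_mappings(COLUMNS))
```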

For details, see:

Root mapping definition has unsupported parameters:  [product : {properties={title={type=text}}}

https://blog.csdn.net/h_sn9999/article/details/102767040
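The error comes from the extra type-name level (`type_doc_test` in the 6.x-style body), which index creation in 7.x no longer accepts by default. As a sketch, an existing 6.x-style body could be converted by lifting the contents of its single type up one level. The helper `drop_mapping_type` is hypothetical and assumes the body has exactly one mapping type.

```python
def drop_mapping_type(body):
    """Turn {'mappings': {<type>: {'properties': ...}}} into the typeless form."""
    mappings = body.get('mappings', {})
    if 'properties' in mappings:
        return body  # already typeless, nothing to do
    if len(mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    (inner,) = mappings.values()
    # Keep any other top-level keys (e.g. settings) and replace only 'mappings'
    return {**body, 'mappings': inner}


# Usage: drop_mapping_type(mappings) yields a body with the same shape as mappings_1
```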

Writing data

#!/usr/bin/env python
# -*- coding:utf-8 -*-

# Write documents into the index
from decimal import Decimal
import json

from elasticsearch import Elasticsearch
# escape_string lives in pymysql.converters; newer PyMySQL releases no longer
# export it from the top-level pymysql module
from pymysql.converters import escape_string


def insert_es_data():
    es = Elasticsearch(["ip:5000"])

    file_path = r"D:\files\612177_1419905_compressed_arxiv-metadata-oai-snapshot-2020-08-14/"
    file_name = r"arxiv-metadata-oai-snapshot-2020-08-14.json"
    file_path_name = file_path + file_name

    with open(file_path_name, "r", encoding='UTF-8') as f:
        # Iterate over the file lazily: f.readlines() would load the whole
        # multi-GB file into memory at once
        for line in f:
            if not line.strip():
                continue

            action = json.loads(line)
            # Rename the hyphenated keys to match the field names in the mapping
            action["journal_ref"] = action.pop("journal-ref")
            action["report_no"] = action.pop("report-no")

            for key in action:
                val_ = action[key]
                if not val_:
                    val_ = ""
                elif isinstance(val_, (Decimal,)):
                    val_ = str(val_)
                else:
                    # Store every value as an escaped JSON string, since all fields are mapped as text
                    val_ = escape_string(json.dumps(val_))
                action[key] = val_

            es.index(index="index_test", body=action)
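Indexing one document per request is slow for a file of this size. A possible alternative is the client's bulk helper, sketched below under a few assumptions: the `elasticsearch` package's `helpers.bulk` function is used, the `chunk_size` of 1000 is an arbitrary illustrative value, and the names `to_text`, `iter_actions` and `bulk_load` are made up for this example. Note that this sketch does not reproduce the escaping above: strings are stored unchanged and only non-string values are serialized with `json.dumps`.

```python
import json

RENAMES = {"journal-ref": "journal_ref", "report-no": "report_no"}


def to_text(value):
    """Map a JSON value to a string so it fits the all-text mapping."""
    if not value:
        return ""
    return value if isinstance(value, str) else json.dumps(value)


def iter_actions(lines, index):
    """Yield one bulk action per non-empty JSON line, renaming hyphenated keys."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        source = {RENAMES.get(key, key): to_text(value) for key, value in record.items()}
        yield {"_index": index, "_source": source}


def bulk_load(es, path, index="index_test", chunk_size=1000):
    """Stream the file into ES in batches; returns the number of indexed docs."""
    from elasticsearch import helpers  # imported lazily; needs the client package

    with open(path, "r", encoding="UTF-8") as f:
        indexed, _errors = helpers.bulk(es, iter_actions(f, index), chunk_size=chunk_size)
    return indexed
```

Because `iter_actions` is a generator over the open file, only one batch of documents is held in memory at a time.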

Deleting data

from elasticsearch import Elasticsearch

es = Elasticsearch(["ip:5000"])

res = es.delete(index="index_test", id="oClia3QBQ2tDmCR81pYz")
print(res)

Querying data

import time

from elasticsearch import Elasticsearch

es = Elasticsearch(["ip:5000"])

# Full-text match on the "comments" field
doc = {
    "query": {
        "match": {
            "comments": "published"
        }
    }
}

start = time.time()
res = es.search(index="index_test", body=doc)
print(res)
print(time.time() - start)  # elapsed seconds, including the network round trip
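Printing the raw response is hard to read for more than a few hits. The sketch below pulls out the total and a short row per hit from the standard search response structure; `summarize_hits` and its `field` parameter are illustrative names.

```python
def summarize_hits(res, field="title"):
    """Return (total, [(doc_id, score, field_value), ...]) from a search response."""
    total = res["hits"]["total"]
    # 7.x reports the total as {"value": N, "relation": ...}; older versions use a plain int
    if isinstance(total, dict):
        total = total["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"].get(field))
        for hit in res["hits"]["hits"]
    ]
    return total, rows


# Usage:
# total, rows = summarize_hits(res, field="comments")
# print(total, rows[:3])
```

The response also carries a `took` field with the server-side query time in milliseconds, which can be compared against the wall-clock timing measured above.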
