Installing ES for Second-Level Queries on PB-Scale Data
What is ES? ElasticSearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license, and it has become a popular enterprise search engine. ElasticSearch is widely used in cloud computing: it delivers near real-time search and is stable, reliable, fast, and easy to install and use.
This tutorial uses version 6.4.3.
Download
ElasticSearch:
https://www.elastic.co/cn/downloads/elasticsearch
Kibana:
https://www.elastic.co/cn/downloads/kibana
Upload to the Linux server
(omitted)
Extract and modify the configuration file
# Note: ES must be run as a regular user; do not run it as root.
# As root, create a regular user.
# Command to add the user:
[root@hadoop137 ~]# useradd shaofei
[root@hadoop137 ~]# passwd shaofei
Changing password for user shaofei.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: it is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
# Switch to the shaofei user:
[root@hadoop137 ~]# su shaofei
[shaofei@hadoop137 root]$ cd
[shaofei@hadoop137 ~]$
# Extract elasticsearch
[shaofei@hadoop137 softwear]$ tar -zxvf elasticsearch-6.4.3.tar.gz -C ../module/
[shaofei@hadoop137 elasticsearch-6.4.3]$ pwd
/opt/module/elasticsearch-6.4.3
# Modify the configuration file
[shaofei@hadoop137 elasticsearch-6.4.3]$ vim config/elasticsearch.yml
# Note: in the yml file, each key's colon must be followed by a space before the value
elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Change the cluster name
cluster.name: myes
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Change the current node's name (it must be changed again after distributing to the other hosts in the next step)
node.name: hadoop137
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# IP address of the current node. The hosts file is configured here, so a hostname is used;
# after distribution this must be changed to match the corresponding host.
network.host: hadoop137
#
# Set a custom port for HTTP:
# Port on which ES exposes its service
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
# Add the cluster IP addresses
discovery.zen.ping.unicast.hosts: ["192.168.23.137", "192.168.23.138", "192.168.23.139"]
Distribute to the other hosts in the cluster
[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop138:`pwd`
[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop139:`pwd`
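After distributing, node.name and network.host in each copy must be changed to match the target host, as noted in the config comments above. A minimal sketch for hadoop138 (and analogously hadoop139), assuming those hostnames resolve through /etc/hosts just as on hadoop137; otherwise use the IPs 192.168.23.138 / 192.168.23.139 for network.host:
# Run on hadoop138; repeat with hadoop139 on the third node
[shaofei@hadoop138 elasticsearch-6.4.3]$ sed -i 's/^node.name: .*/node.name: hadoop138/' config/elasticsearch.yml
[shaofei@hadoop138 elasticsearch-6.4.3]$ sed -i 's/^network.host: .*/network.host: hadoop138/' config/elasticsearch.yml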
Start ES on each node in the cluster
[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch
[shaofei@hadoop138 elasticsearch-6.4.3]$ ./bin/elasticsearch
[shaofei@hadoop139 elasticsearch-6.4.3]$ ./bin/elasticsearch
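The commands above run ES in the foreground and tie up the terminal. Once the bootstrap problems below are resolved, it can also be started as a daemon; a sketch using the launcher's standard -d (daemonize) and -p (pid file) options:
# Start in the background and record the pid (run on each node)
[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch -d -p pid
# Stop it later with:
[shaofei@hadoop137 elasticsearch-6.4.3]$ kill `cat pid`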
Problems encountered during startup:
ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [1024] for user [shaofei] is too low, increase to at least [4096]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
For problem [1]:
[root@hadoop139 shaofei]# vi /etc/security/limits.conf
Append at the end of the file:
# End of file
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
For problem [2]:
[root@hadoop139 shaofei]# vim /etc/security/limits.d/90-nproc.conf
Modify it as follows:
[root@hadoop139 shaofei]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 10240
root soft nproc unlimited
# Log out and back in for the change to take effect
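After logging back in, the new limits can be sanity-checked with ulimit; the soft values should match what was configured above:
[shaofei@hadoop139 ~]$ ulimit -n   # max open file descriptors, expected 65536
[shaofei@hadoop139 ~]$ ulimit -u   # max user processes, expected 10240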
For problem [3]:
vi /etc/sysctl.conf
Add the following setting:
vm.max_map_count=655360
Then run:
sysctl -p
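The kernel parameter can be verified immediately, without a reboot:
[root@hadoop139 shaofei]# sysctl vm.max_map_count
vm.max_map_count = 655360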
Another error that may appear in the logs (on kernels without seccomp support):
java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
# Modify elasticsearch.yml and add the following
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
Once the problems are fixed, restart ES.
- Startup succeeded
Visit: http://ip:9200
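The cluster can also be checked from the shell with curl; a sketch assuming the hostnames used above:
# Basic node info
[shaofei@hadoop137 ~]$ curl http://hadoop137:9200
# Cluster health: with all three nodes up, number_of_nodes should be 3 and status should be green
[shaofei@hadoop137 ~]$ curl http://hadoop137:9200/_cluster/health?pretty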
Install Kibana
- Download the package
- Extract the archive
- Modify the configuration file
kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
# Configure the port
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
# Configure the host IP
server.host: "hadoop137"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URL of the Elasticsearch instance to use for all your queries.
# Configure access to elasticsearch
elasticsearch.url: "http://hadoop137:9200"
# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"
# The default application to load.
#kibana.defaultAppId: "home"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500
# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000
# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]
# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}
# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000
# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000
# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false
# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout
# Set the value of this setting to true to suppress all logging output.
#logging.silent: false
# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false
# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false
# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000
# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"
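The steps above do not show a start command for Kibana; a minimal sketch, assuming Kibana 6.4.3 was extracted next to the ES install (the directory name is illustrative):
# Start Kibana in the background; it listens on the server.port (5601) configured above
[shaofei@hadoop137 kibana-6.4.3-linux-x86_64]$ nohup ./bin/kibana > kibana.log 2>&1 &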
Open http://hadoop137:5601/ in a browser to verify that everything works.
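If no browser is available, Kibana's built-in status endpoint gives a quick check from the shell (a sketch using the same hostname):
[shaofei@hadoop137 ~]$ curl http://hadoop137:5601/api/status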