
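Installing Apache Druid 0.15.0

Druid 0.15.0 overview

Druid is an open-source distributed system for real-time querying and analysis of big data, built for high fault tolerance and high performance. It aims to ingest large volumes of data quickly and to answer queries over them with low latency. It is designed to stay available during code deployments, machine failures, and outages in other production systems. Druid was originally created to solve query latency problems: it gives interactive access to data and uses a special storage format that trades some query flexibility for performance. Druid can also be queried with SQL in addition to its native JSON query language. Druid SQL first shipped as an experimental feature in 0.10.0, and this guide enables it with druid.sql.enable=true.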

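Features

  • A column-oriented storage format for partially nested data structures;
  • Indexes for fast filtering;
  • Real-time ingestion and querying;
  • A highly fault-tolerant distributed architecture.

Use cases

  1. When you need interactive aggregation and fast exploration of large amounts of data;
  2. When you need real-time query analysis;
  3. When you need real-time analysis of data, especially big data. In the big data applications at Yimi (溢米), the three points above match the requirements of phase 5 of the Tianyan (天眼) project closely, and Druid can be combined with Wukong (悟空) for real-time ingestion. The current Spark + CarbonData approach has become slow to query as data volume grows, and Druid is a good alternative;
  4. When you need a highly available, fault-tolerant, high-performance database.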

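1 Cluster planning

  • Master nodes run the Coordinator and Overlord: 4 cores, 16 GB RAM, 2 nodes;
  • Data nodes run the Historical and MiddleManager: 16 cores, 64 GB RAM, 3 nodes;
  • Query nodes run the Broker and Router: 4 cores, 16 GB RAM, 1 node.

1.1 Hadoop configuration files

This installation uses HDFS as deep storage. On each of the 3 data nodes, go to the /data1/druid/druid-0.15.0/conf/druid/cluster/_common directory and create a symlink to the corresponding Hadoop configuration directory. This step lets Druid recognize the Hadoop HA setup; without it, Druid cannot resolve the HDFS paths used for deep storage.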

ln -s /usr/hdp/2.6.5.0-292/hadoop/conf hadoop-xml
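
A quick way to confirm the symlink works is to check that the Hadoop XML files are actually reachable through it. The sketch below is only an illustration: the list of required files is my assumption (core-site.xml and hdfs-site.xml are the usual ones for an HA nameservice), and the function name is made up for this post.

```python
import os

# Files the HDFS client usually needs to resolve an HA nameservice.
# This list is an assumption for illustration; adjust it to your cluster.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_hadoop_files(common_dir, link_name="hadoop-xml"):
    """Return the required Hadoop XML files that are not reachable through the symlink."""
    link_path = os.path.join(common_dir, link_name)
    # os.path.isfile follows symlinks, so a dangling link reports every file as missing.
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(link_path, name))]
```

Calling missing_hadoop_files("/data1/druid/druid-0.15.0/conf/druid/cluster/_common") on a data node should return an empty list when the link points at a complete Hadoop configuration directory.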
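1.2 Install JDK 1.8 (omitted here).
1.3 The data nodes also serve as HDFS DataNodes (omitted here).
1.4 Common configuration

This configuration writes Druid's runtime logs to files, which makes later troubleshooting easier. The file path and file names can be changed.

  1. log4j2.xml configuration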
<Configuration status="WARN">
    <Properties>
        <Property name="log.path">/data1/druid/log</Property>
    </Properties>
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
        </Console>
        <File name="log" fileName="${log.path}/one.log" append="false">
            <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
        </File>
        <RollingFile name="RollingFileInfo" fileName="${log.path}/druid-data.log"
                     filePattern="${log.path}/druid-data-%d{yyyy-MM-dd}-%i.out">
            <ThresholdFilter level="info" onMatch="ACCEPT" onMismatch="DENY"/>
            <PatternLayout pattern="[%d{yyyy-MM-dd HH:mm:ss:SSS}] [%p] - %l - %m%n"/>
            <Policies>
                <TimeBasedTriggeringPolicy modulate="true" interval="1"/>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>

        </RollingFile>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console"/>
            <AppenderRef ref="RollingFileInfo"/>
            <AppenderRef ref="log"/>
        </Root>
    </Loggers>
</Configuration>
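
If you rename appenders while adapting this file, it is easy to leave a logger pointing at an appender that no longer exists. The following is a small sketch of a sanity check, not part of Druid or Log4j: the function name is hypothetical, and it only understands the simple layout used above (an Appenders element whose children carry a name attribute).

```python
import xml.etree.ElementTree as ET


def undefined_appender_refs(xml_text):
    """Return appender names that are referenced by a logger but never defined."""
    root = ET.fromstring(xml_text)
    appenders = root.find("Appenders")
    defined = {a.get("name") for a in appenders} if appenders is not None else set()
    # Accept both spellings that appear in configs: AppenderRef and appender-ref.
    refs = {el.get("ref") for el in root.iter()
            if el.tag.lower().replace("-", "") == "appenderref" and el.get("ref")}
    return sorted(refs - defined)
```

Reading the log4j2.xml file into a string and passing it to undefined_appender_refs should give an empty list for the configuration shown here.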

 

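  2. common.runtime.properties configuration. Set druid.host to the hostname of the machine Druid runs on. This is the global configuration file, and the comments next to each parameter explain what it does.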
druid.extensions.loadList=["druid-kafka-eight", "druid-histogram", "druid-datasketches", "mysql-metadata-storage","druid-hdfs-storage","druid-kafka-extraction-namespace","druid-kafka-indexing-service"]
druid.extensions.directory=/data1/druid/druid-0.15.0/extensions
# If you have a different version of Hadoop, place your Hadoop client jar files in your hadoop-dependencies directory
# and uncomment the line below to point to your directory.
druid.extensions.hadoopDependenciesDir=/data1/druid/druid-0.15.0/hadoop-dependencies


#
# Hostname
#
druid.host=bd-prod-slave06
#
# Logging
# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

#
# Zookeeper
#

druid.zk.service.host=bd-prod-master01:2181,bd-prod-master02:2181,bd-prod-slave01:2181
druid.zk.paths.base=/druid

#
# Metadata storage
#

# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
# druid.metadata.storage.type=derby
# druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
# druid.metadata.storage.connector.host=localhost
# druid.metadata.storage.connector.port=1527

# For MySQL (make sure to include the MySQL JDBC driver on the classpath):
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://bd-prod-master01:3306/druid?useSSL=false&useUnicode=true&characterEncoding=UTF-8
druid.metadata.storage.connector.user=user
druid.metadata.storage.connector.password=password

# For PostgreSQL:
#druid.metadata.storage.type=postgresql
#druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
#druid.metadata.storage.connector.user=...
#druid.metadata.storage.connector.password=...

#
# Deep storage
#

# For local disk (only viable in a cluster if this is a network mount):
# druid.storage.type=local
# druid.storage.storageDirectory=var/druid/segments

# For HDFS:
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://bd-prod/druid/segments

# For S3:
#druid.storage.type=s3
#druid.storage.bucket=your-bucket
#druid.storage.baseKey=druid/segments
#druid.s3.accessKey=...
#druid.s3.secretKey=...

#
# Indexing service logs
#

# For local disk (only viable in a cluster if this is a network mount):
# druid.indexer.logs.type=file
# druid.indexer.logs.directory=var/druid/indexing-logs

# For HDFS:
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=hdfs://bd-prod/druid/indexing-logs

# For S3:
#druid.indexer.logs.type=s3
#druid.indexer.logs.s3Bucket=your-bucket
#druid.indexer.logs.s3Prefix=druid/indexing-logs

#
# Service discovery
#

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

#
# Monitoring
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# omitting this will lead to index double as float at the storage layer

druid.indexing.doubleStorage=double

#
# Security
#
druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]


#
# SQL
#
druid.sql.enable=true

#
# Lookups
#
druid.lookup.enableLookupSyncOnStartup=false
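
One mistake that is easy to make in this file is selecting a storage type without loading the extension that provides it. Here is a rough sketch of a consistency check. The parser only handles plain key=value lines, and the mapping covers just the settings used in this guide; both helper names are invented for illustration.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so JDBC URLs with query strings stay intact.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Extension needed by each setting. Only the pairs used in this guide are listed.
REQUIRED_EXTENSIONS = {
    ("druid.metadata.storage.type", "mysql"): "mysql-metadata-storage",
    ("druid.storage.type", "hdfs"): "druid-hdfs-storage",
    ("druid.indexer.logs.type", "hdfs"): "druid-hdfs-storage",
}


def missing_extensions(props):
    """Return extensions that the chosen settings need but loadList does not contain."""
    loaded = set(json.loads(props.get("druid.extensions.loadList", "[]")))
    return sorted({ext for (key, value), ext in REQUIRED_EXTENSIONS.items()
                   if props.get(key) == value and ext not in loaded})
```

Feeding the contents of common.runtime.properties through parse_properties and then missing_extensions should return an empty list for the file above, since both mysql-metadata-storage and druid-hdfs-storage are in the load list.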

 

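2 Data nodes

On each data node, set druid.host to that node's hostname.

2.1 historical

The Historical process loads segment files that have already been built and serves queries over them.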

 

  1. /data1/druid/druid-0.15.0/conf/druid/cluster/data/historical/jvm.config
-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=12g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 

  2. /data1/druid/druid-0.15.0/conf/druid/cluster/data/historical/runtime.properties
druid.service=druid/historical
druid.plaintextPort=9088
druid.segmentCache.numLoadingThreads=16
# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=4
druid.processing.numThreads=16
druid.processing.tmpDir=/data1/druid/processing

# Segment storage
druid.segmentCache.locations=[{"path":"/data1/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000

# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
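
The processing settings and -XX:MaxDirectMemorySize have to agree with each other. The Druid tuning documentation gives the rule of thumb that direct memory must be at least buffer.sizeBytes * (numThreads + numMergeBuffers + 1). A minimal sketch of that arithmetic for the Historical values above follows; the helper names are my own.

```python
def required_direct_memory(buffer_size_bytes, num_threads, num_merge_buffers):
    """Direct memory needed for processing buffers, per the Druid rule of thumb."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


def parse_jvm_size(text):
    """Convert a JVM size such as '12g' or '256m' into bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


# Historical settings from this guide.
needed = required_direct_memory(500_000_000, num_threads=16, num_merge_buffers=4)
limit = parse_jvm_size("12g")
print(f"needed={needed:,} bytes, limit={limit:,} bytes, fits={needed <= limit}")
```

For the Historical this comes to 10,500,000,000 bytes against a limit of 12,884,901,888 bytes, so the configuration fits. The same function can be reused for the Broker and Peon settings elsewhere in this guide.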

 

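2.2 middleManager

The MiddleManager is the worker node of the indexing service. It accepts tasks assigned by the Overlord (which runs inside the Coordinator process in this setup) and launches a separate Peon process, each in its own JVM, to run every task.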

  1. /data1/druid/druid-0.15.0/conf/druid/cluster/data/middleManager/jvm.config
-server
-Xms128m
-Xmx128m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 

  2. /data1/druid/druid-0.15.0/conf/druid/cluster/data/middleManager/runtime.properties
druid.service=druid/middleManager
druid.plaintextPort=8091

# Number of tasks per middleManager
druid.worker.capacity=4

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC+0800 -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=/data1/druid/task

# HTTP server threads
druid.server.http.numThreads=60

# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=4

# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=/data1/druid/hadoop-tmp
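
Because every task runs in its own Peon JVM, druid.worker.capacity multiplies the memory in druid.indexer.runner.javaOpts. The sketch below adds up the configured heap and direct memory on one data node using the values from this guide. Treat the result as a lower bound: real JVMs also use metaspace, thread stacks, and so on, and the Historical additionally relies on free memory for the page cache.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def peon_memory(capacity, heap_bytes, direct_bytes):
    """Heap plus direct memory for `capacity` concurrently running Peons."""
    return capacity * (heap_bytes + direct_bytes)


# Values copied from the configuration in this guide.
peons = peon_memory(capacity=4, heap_bytes=1 * GIB, direct_bytes=1 * GIB)
historical = 8 * GIB + 12 * GIB   # -Xmx8g plus -XX:MaxDirectMemorySize=12g
middle_manager = 128 * MIB        # -Xmx128m

total = peons + historical + middle_manager
print(f"Peons: {peons / GIB:.3f} GiB, configured JVM total: {total / GIB:.3f} GiB")
```

With a capacity of 4, the Peons account for 8 GiB, and the configured total stays well under the 64 GB of a data node, which leaves room for the operating system to cache memory-mapped segments.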

 

2.3 Start command
 nohup ./bin/start-cluster-data-server >/dev/null 2>&1 &
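
Since the start command sends all output to /dev/null, it helps to confirm that the processes actually came up. Druid processes expose a /status/health endpoint that returns true when the process is running. Below is a minimal sketch of a check against it. The function names and the injectable fetch parameter are my own additions so that the logic can be exercised without a cluster.

```python
import json
import urllib.request


def health_url(host, port):
    """Build the health endpoint URL for a Druid process."""
    return f"http://{host}:{port}/status/health"


def is_healthy(host, port, fetch=None, timeout=5):
    """Return True only if the process answers /status/health with JSON true."""
    def http_get(url):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")

    fetch = fetch or http_get
    try:
        return json.loads(fetch(health_url(host, port))) is True
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return False
```

On a data node configured as above, is_healthy("bd-prod-slave06", 9088) would check the Historical and is_healthy("bd-prod-slave06", 8091) the MiddleManager.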

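3 Master nodes

On each master node, set the druid.host option in the common configuration.

3.1 coordinator-overlord

The Coordinator balances segment load across the Historical nodes and manages the data lifecycle through rules. The Overlord is the master of the indexing service: externally it accepts task requests, and internally it splits tasks up and dispatches them to its workers, the MiddleManagers.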

  1. /data1/druid/druid-0.15.0/conf/druid/cluster/master/coordinator-overlord/jvm.config
-server
-Xms12g
-Xmx12g
-XX:+ExitOnOutOfMemoryError
-XX:+UseG1GC
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=/data1/druid/derby.log

 

  2. /data1/druid/druid-0.15.0/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid.service=druid/coordinator
druid.plaintextPort=9181

druid.coordinator.startDelay=PT10S
druid.coordinator.period=PT5S

# Run the overlord service in the coordinator process
druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord

druid.indexer.queue.startDelay=PT5S

druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

 

3.2 Start command
 nohup ./bin/start-cluster-master-no-zk-server >/dev/null 2>&1 &

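4 Query nodes

On the query node, set the druid.host option in the common configuration.

4.1 broker

The Broker is the process that serves queries to clients. For each query it uses the segment metadata published in ZooKeeper to work out which data nodes hold the relevant segments, forwards the query to them, and merges the partial results.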

  1. /data1/druid/druid-0.15.0/conf/druid/cluster/query/broker/jvm.config
-server
-Xms12g
-Xmx12g
-XX:MaxDirectMemorySize=6g
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 

  2. /data1/druid/druid-0.15.0/conf/druid/cluster/query/broker/runtime.properties
druid.service=druid/broker
druid.plaintextPort=8182

# HTTP server settings
druid.server.http.numThreads=60

# HTTP client settings
druid.broker.http.numConnections=50
druid.broker.http.maxQueuedBytes=10000000

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=6
druid.processing.numThreads=1
druid.processing.tmpDir=/data1/druid/processing

# Query cache enabled on the Broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

 

4.2 router

As the name suggests, the Router mainly routes queries to the Brokers according to rules.

  1. /data1/druid/druid-0.15.0/conf/druid/cluster/query/router/jvm.config
-server
-Xms1g
-Xmx1g
-XX:+UseG1GC
-XX:MaxDirectMemorySize=256m
-XX:+ExitOnOutOfMemoryError
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

 

  2. /data1/druid/druid-0.15.0/conf/druid/cluster/query/router/runtime.properties
druid.service=druid/router
druid.plaintextPort=8888

# HTTP proxy
druid.router.http.numConnections=50
druid.router.http.readTimeout=PT5M
druid.router.http.numMaxThreads=100
druid.server.http.numThreads=100

# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
druid.router.coordinatorServiceName=druid/coordinator

# Management proxy to coordinator / overlord: required for unified web console.
druid.router.managementProxy.enabled=true

 

4.3 Start command
nohup ./bin/start-cluster-query-server >/dev/null 2>&1 &
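
Once the query node is up, a simple smoke test is to send a SQL statement to the Broker's /druid/v2/sql endpoint, which accepts a JSON body with a query field. The following is a sketch under a few assumptions: the hostname is a placeholder, port 8182 is the Broker port configured above, and the helper names are invented for this post.

```python
import json
import urllib.request


def build_sql_request(host, port, sql):
    """Build a POST request for Druid's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(host, port, sql, timeout=30):
    """Send the query and return the decoded JSON result rows."""
    request = build_sql_request(host, port, sql)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, run_sql("query-node-hostname", 8182, "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'druid'") should list the datasources the cluster knows about, and returns an empty list on a fresh cluster with no data yet.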

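5 Summary

As a newcomer among OLAP systems, Druid performs very well at real-time ingestion and pre-aggregation. It can also be paired with Flink as the downstream data store, which makes it a very good choice. With SQL support in recent releases, I expect it to be adopted much more widely. The next post will cover real-time ingestion into Druid.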