Storm Installation, Deployment, and Usage (1)
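I have been using the Storm real-time computation framework recently, so here is a summary of what I have learned. Everything below is my own view; if you spot a mistake, please correct me.
What does Storm do? Storm is a real-time stream processing framework. What does "stream" mean? Put simply, it is a pipeline model: records flow from one processing step to the next, one after another. What does "real-time" mean? Storm can process a single record in a very short time, typically on the order of milliseconds (note: the actual processing time depends on the business logic and on server performance).
Now for the practical part.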
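1. Installing and deploying Storm (cluster)
a: The first step is to install ZooKeeper. Download the ZooKeeper release archive from the Apache site, upload it to the server, and unpack it:
# tar -zxvf zookeeper.x.xx.tar.gz
Then go into ZooKeeper's conf directory and copy zoo_sample.cfg to zoo.cfg: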
# cd zookeeper.x.xx/conf/
# cp zoo_sample.cfg zoo.cfg
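Next, edit the settings in zoo.cfg (note: configure one machine first; the configuration file is identical on the other machines, and only the myid file differs):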
# vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#dataDir=/tmp/zookeeper
dataDir=/home/storm/zookeeper/data
# the port at which the clients will connect
clientPort=2181
server.1=xx.xx.xx.01:2888:3888
server.2=xx.xx.xx.02:2888:3888
server.3=xx.xx.xx.03:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
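Note: the entries that differ from zoo_sample.cfg (dataDir, the server.x lines, and the two autopurge settings) need to be changed or added. A directory such as dataDir=/home/storm/zookeeper/data has to be created by hand, with read and write permission for the user that runs ZooKeeper.
The two settings autopurge.snapRetainCount=3 and autopurge.purgeInterval=1 make ZooKeeper clean up old snapshots and transaction logs automatically: the purge task runs every hour and keeps only the three most recent snapshots.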
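Because dataDir has to exist before ZooKeeper starts, the directory can be created from the value already written in zoo.cfg instead of typing the path a second time. The following is only a minimal sketch: the function name create_data_dir is made up for illustration, and the demo runs against a scratch copy of the config rather than the real file.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZooKeeper): read the active dataDir entry
# from a zoo.cfg file, create that directory, and print its path.
create_data_dir() {
    cfg=$1
    # Commented-out entries such as "#dataDir=/tmp/zookeeper" do not match ^dataDir=
    dir=$(sed -n 's/^dataDir=//p' "$cfg" | tail -n 1)
    [ -n "$dir" ] || { echo "no dataDir in $cfg" >&2; return 1; }
    mkdir -p "$dir" && chmod u+rwx "$dir" && printf '%s\n' "$dir"
}

# Demo on a scratch config so nothing on the real system is touched.
scratch=$(mktemp -d)
printf '#dataDir=/tmp/zookeeper\ndataDir=%s/zookeeper/data\n' "$scratch" > "$scratch/zoo.cfg"
created=$(create_data_dir "$scratch/zoo.cfg")
echo "created: $created"
```

On a real node the call would be create_data_dir zoo.cfg, run as the user that will start ZooKeeper so that the permissions end up on the right account.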
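Create a myid file in the dataDir directory. Its content is the x from this machine's server.x line; for example, xx.xx.xx.01 is server.1, so its myid file contains 1. Then start ZooKeeper from the installation directory: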
# nohup ./bin/zkServer.sh start &
# jps
When a QuorumPeerMain process appears in the jps output, ZooKeeper has started successfully.
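The myid step described above has to be done on every node before that node is started, and it is easy to write the wrong number. One way to avoid that is to look the number up from the server.x lines themselves. This is a sketch under assumptions: myid_for_host is a hypothetical helper, and it expects the node's address to be passed exactly as it is written in zoo.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print the N of the server.N line whose host equals $2.
# $1 = path to zoo.cfg, $2 = this node's address exactly as written in zoo.cfg
myid_for_host() {
    awk -v host="$2" '
        /^server\.[0-9]+=/ {
            split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:port:port"
            gsub(/ /, "", kv[2])      # tolerate a space after the equals sign
            split(kv[2], addr, ":")   # addr[1] = host part
            if (addr[1] == host) { sub(/^server\./, "", kv[1]); print kv[1]; exit }
        }' "$1"
}

# Demo on a scratch config that reuses the placeholder addresses from the text.
scratch=$(mktemp -d)
cat > "$scratch/zoo.cfg" <<'EOF'
dataDir=/home/storm/zookeeper/data
server.1=xx.xx.xx.01:2888:3888
server.2=xx.xx.xx.02:2888:3888
server.3=xx.xx.xx.03:2888:3888
EOF
id=$(myid_for_host "$scratch/zoo.cfg" "xx.xx.xx.02")
echo "$id" > "$scratch/myid"    # on a real node this would be <dataDir>/myid
cat "$scratch/myid"
```

If the address is not listed, the function prints nothing, so an empty result is a sign that zoo.cfg and the node's address do not agree.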
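2. Installing and configuring Storm
The configuration is the same on every server, so configure one machine and copy the result to the others. I am using three machines here.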
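Once storm.yaml is finished on the first machine, the copy to the other machines could look roughly like the dry run below. It only prints the scp commands and copies nothing. The assumptions are that Storm is unpacked at the same relative path on every node and that the remaining nodes are the two placeholder addresses used in this post; the version in the path is left as a placeholder too.

```shell
#!/bin/sh
# Hypothetical dry run: print the scp command that would copy storm.yaml to
# each remaining node. Nothing is copied; drop the "echo" only after checking
# the printed commands.
# Assumption: Storm is unpacked at the same path on every node.
storm_conf="apache-storm-x.x.x/conf/storm.yaml"   # placeholder version, as in the text
other_nodes="xx.xx.xx.02 xx.xx.xx.03"             # placeholder addresses, as in the text

plan=$(for node in $other_nodes; do
    echo scp "$storm_conf" "$node:$storm_conf"
done)
printf '%s\n' "$plan"
```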
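a: Download the Storm release archive, upload it to the server, and unpack it.
# tar -zxvf apache-storm-xx.xx.tar.gz
Go into Storm's configuration directory and back up storm.yaml before editing it:
# cd apache-storm-x.x.x/conf/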
# cp storm.yaml storm_bak.yaml
# vim storm.yaml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "xx.xx.xx.01"
    - "xx.xx.xx.02"
    - "xx.xx.xx.03"
nimbus.host: "xx.xx.xx.01"
# nimbus.host: "nimbus"
#
storm.local.dir: "/home/storm_data"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
# supervisor.childopts: "-Xmx1024m"
worker.childopts: "-Xmx2048m"
# topology.state.synchronization.timeout.secs: 60
topology.message.timeout.secs: 150
# topology.enable.message.timeouts: true
topology.max.spout.pending: 8000
# topology.ackers: 0
#
# ##### These may optionally be filled in:
#
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"
## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
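Notes:
storm.zookeeper.servers: lists the addresses of the ZooKeeper servers.
storm.local.dir: "/home/storm_data" is a directory that has to be created by hand.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
is a fixed setting: each port is one worker slot, so every machine runs at most four worker processes, and these are the ports they listen on.
worker.childopts: "-Xmx2048m" sets the maximum JVM heap size of each worker process.
topology.message.timeout.secs: 150 means that a message which has not been acked within 150 seconds is treated as failed; the spout's fail method is called, and a reliable spout then re-sends it.
topology.max.spout.pending: 8000 throttles the spout: it is the maximum number of messages per spout instance that have been emitted but not yet acked or failed.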
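To see what these numbers imply for sizing, the arithmetic can be worked through quickly. The slot count, heap size, and pending limit are taken from the storm.yaml above; the number of spout tasks is not given in this post, so the value 2 below is purely an assumption for illustration.

```shell
#!/bin/sh
# Values copied from the storm.yaml above
slots_per_supervisor=4        # entries under supervisor.slots.ports
worker_heap_mb=2048           # -Xmx in worker.childopts
max_spout_pending=8000        # topology.max.spout.pending
# Assumption for illustration only: the topology runs 2 spout tasks
spout_tasks=2

# Heap needed on one machine if all four worker slots are in use
max_heap_mb=$((slots_per_supervisor * worker_heap_mb))
# The pending limit applies per spout task, so the topology-wide bound scales with it
max_in_flight=$((max_spout_pending * spout_tasks))

echo "worst-case worker heap per machine: ${max_heap_mb} MB"
echo "upper bound on un-acked messages in the topology: ${max_in_flight}"
```

With these settings a machine needs room for 8192 MB of worker heap (plus JVM overhead and the supervisor itself) if every slot is occupied. Note that the pending limit only takes effect when acking is in use, since unanchored messages are never counted as pending.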
Start Storm with the following commands:
# nohup storm nimbus > myout_nimbus.file 2>&1 &
# nohup storm supervisor > myout_sup.file 2>&1 &
# nohup storm ui > myout_ui.file 2>&1 &
Once the nimbus, supervisor, and UI processes are all running, the installation has succeeded.
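Checking the jps output by eye works for one machine but gets tedious on three. A small check along the following lines could be run on each node instead. It is a sketch, not a definitive tool: has_process is a hypothetical helper, the sample listing is made up so the example runs without a cluster, and the names that jps prints for the Storm daemons vary between Storm versions, so the names in the loop are assumptions to adjust.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the jps-style listing in $1 contains a
# process whose name (second column) equals $2.
has_process() {
    printf '%s\n' "$1" | awk -v name="$2" '$2 == name { found = 1 } END { exit !found }'
}

# On a real node: listing=$(jps)
# Made-up sample listing, used here so the sketch runs without a cluster.
listing='4821 QuorumPeerMain
5012 nimbus
5190 Jps'

for p in QuorumPeerMain nimbus supervisor; do
    if has_process "$listing" "$p"; then
        echo "$p: running"
    else
        echo "$p: not found"
    fi
done
```

A "not found" result is not always an error, since not every daemon has to run on every machine; if a daemon that should be there is missing, the myout_*.file log written by the corresponding nohup command is the first place to look.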