Hadoop Pseudo-Distributed Component Installation
I. Recommended Versions
| Component | Version |
| --------- | ------- |
| CentOS    | 7.5     |
| Java      | 1.8     |
| Hadoop    | 2.7.6   |
| Hive      | 2.3.3   |
| MySQL     | 5.7     |
| Spark     | 2.3.1   |
| Scala     | 2.12.6  |
| Flume     | 1.8.0   |
| Sqoop     | 1.4.5   |
II. Download Links
JDK:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Hadoop:
http://hadoop.apache.org/releases.html
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
Hive:
http://www.apache.org/dyn/closer.cgi/hive/
http://ftp.jaist.ac.jp/pub/apache/hive/
Spark:
http://spark.apache.org/downloads.html
https://www.apache.org/dyn/closer.lua/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
Scala:
https://www.scala-lang.org/download/2.12.6.html
https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz
Flume:
http://www.apache.org/dyn/closer.lua/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
http://ftp.meisei-u.ac.jp/mirror/apache/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
HBase:
http://archive.apache.org/dist/hbase/1.2.6/hbase-1.2.6-bin.tar.gz
Sqoop:
http://archive.apache.org/dist/sqoop/1.4.5/sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
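For a scripted download, the Apache archive keeps direct links for all of these releases; a sketch that pulls everything into the installation directory (the exact archive paths are assumptions based on the standard archive.apache.org layout, and the Oracle JDK still has to be downloaded by hand because of its license click-through):
$ mkdir -p /home/hadoop/opt && cd /home/hadoop/opt
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
$ wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
$ wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz
$ wget https://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
$ wget https://archive.apache.org/dist/hbase/1.2.6/hbase-1.2.6-bin.tar.gz
$ wget https://archive.apache.org/dist/sqoop/1.4.5/sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
$ wget https://downloads.lightbend.com/scala/2.12.6/scala-2.12.6.tgz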
III. Configure the IP Address
# Temporary IP assignment (lost on reboot)
$ ifconfig eth0 192.168.116.100 netmask 255.255.255.0
# Persistent configuration; on CentOS 7 the interface may be named ens33 or similar rather than eth0, so adjust the file name and DEVICE accordingly. The lines marked ### below are the ones to set for your own network.
$ vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static ###
##HWADDR=00:0C:29:3C:BF:E7
IPV6INIT=yes
NM_CONTROLLED=yes
ONBOOT=yes ###
TYPE=Ethernet
##UUID=ce22eeca-ecde-4536-8cc2-ef0dc36d4a8c
IPADDR=192.168.116.100 ###
NETMASK=255.255.255.0 ###
GATEWAY=192.168.116.2 ###
DNS1=219.141.136.10 ###
# Restart the network service
$ service network restart
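A quick check that the static address is up (eth0 and the gateway below are the values assumed above; adjust if your interface is named differently):
$ ip addr show eth0
$ ping -c 3 192.168.116.2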
IV. Install Basic CentOS Packages
$ yum install net-tools.x86_64 vim* wget.x86_64 ntp -y
V. Set the Hostname
$ vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
# Edit the hosts file
$ vi /etc/hosts
# Add a new line (use your own machine's IP, e.g. 192.168.116.100)
192.168.116.100 master
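On CentOS 7 the hostname is normally set with hostnamectl (the /etc/sysconfig/network edit above is the older CentOS 6 convention); either way, verify the result:
$ hostnamectl set-hostname master
$ hostname
$ ping -c 1 master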
VI. Disable the Firewall
# CentOS 7 (firewalld):
$ systemctl stop firewalld.service      # stop firewalld
$ systemctl disable firewalld.service   # keep firewalld from starting at boot
# CentOS 6 (iptables) equivalents, for reference only:
$ service iptables status               # check firewall status
$ service iptables stop                 # stop the firewall
$ chkconfig iptables --list             # check whether iptables starts at boot
$ chkconfig iptables off                # disable iptables at boot
VII. Passwordless SSH
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
# Verify
$ ssh master
VIII. Install the JDK
1. Unpack the JDK
# Create the installation directory
$ mkdir -p /home/hadoop/opt
# Unpack
$ tar -zxvf jdk-8u181-linux-x64.tar.gz -C /home/hadoop/opt
2. Add Java to the environment
$ vim /etc/profile
# Append at the end of the file
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
# Reload the configuration
$ source /etc/profile
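A quick check that the JDK is picked up:
$ java -version
$ echo $JAVA_HOME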
IX. Install Hadoop
1. Unpack
# Create the software directory
$ mkdir -p /home/hadoop/opt
$ cd /home/hadoop/opt
# Unpack
$ tar -zxvf hadoop-2.7.6.tar.gz -C /home/hadoop/opt
2. Configure environment variables
$ vi /etc/profile
# Append at the end of the file
export HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
export HADOOP_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload the configuration
$ source /etc/profile
3. Edit /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
$ vi /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/hadoop-env.sh
# Change this line:
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
4. Create the HDFS data directories (under /home/hadoop/opt/hadoop-2.7.6/), then switch to the configuration directory
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs_tmp
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs/name
$ mkdir -p /home/hadoop/opt/hadoop-2.7.6/hdfs/data
$ cd /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/
5. Edit the configuration files
$ vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/opt/hadoop-2.7.6/hdfs_tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/opt/hadoop-2.7.6/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/opt/hadoop-2.7.6/hdfs/data</value>
  </property>
  <property>
    <!-- a single-node pseudo-distributed cluster has only one DataNode, so 1 is enough -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
$ cp mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
$ vi yarn-site.xml    # the YARN settings below are sized for a machine with 4 GB of RAM
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://master:19888/jobhistory/logs</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx819m</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1638m</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>409</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
$ vi slaves
master
6. Format HDFS and start the cluster
# Format the NameNode
$ cd /home/hadoop/opt/hadoop-2.7.6/
$ bin/hdfs namenode -format
# Start the NameNode and DataNode
$ sbin/start-dfs.sh
# Stop the NameNode and DataNode
$ sbin/stop-dfs.sh
# Start the YARN services
$ sbin/start-yarn.sh
# Stop the YARN services
$ sbin/stop-yarn.sh
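After start-dfs.sh and start-yarn.sh, jps should list the five daemons below on this single node; the MapReduce job history server (referenced by mapreduce.jobhistory.address and yarn.log.server.url above) has to be started separately:
$ jps
# Expected: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
$ sbin/mr-jobhistory-daemon.sh start historyserver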
# Test HDFS and MapReduce
$ cd /home/hadoop/opt/hadoop-2.7.6/
$ bin/hdfs dfs -mkdir /input
$ bin/hdfs dfs -mkdir /test
$ bin/hdfs dfs -put etc/hadoop /input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar grep /input/hadoop /output_grep 'dfs[a-z.]+'
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /input/hadoop /output_wc
# View the HDFS output files
Option 1:
$ bin/hdfs dfs -get /output_grep output
$ cat output/*
Option 2:
$ bin/hdfs dfs -cat /output_grep/*
7. Web UI checks
NameNode: http://192.168.116.100:50070/
ResourceManager: http://192.168.116.100:8088/
X. Install Hive
1. Install MySQL
See: https://www.cnblogs.com/wishwzp/p/7113403.html
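The link above covers the details; for reference, a minimal sketch of installing MySQL 5.7 from the MySQL Yum repository on CentOS 7 might look like the following (the release-RPM file name and 'YourRootPass' are placeholders; use the same credentials you later put in hive-site.xml):
# Add the MySQL 5.7 Yum repository (check dev.mysql.com for the current release RPM name)
$ rpm -Uvh https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
$ yum install -y mysql-community-server
$ systemctl start mysqld && systemctl enable mysqld
# MySQL 5.7 generates a temporary root password at first start
$ grep 'temporary password' /var/log/mysqld.log
# Set a permanent root password interactively
$ mysql_secure_installation
# Allow the metastore user configured in hive-site.xml to connect remotely
$ mysql -uroot -p -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'YourRootPass'; FLUSH PRIVILEGES;"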
2. Unpack
$ tar -zxvf apache-hive-2.3.3-bin.tar.gz -C /home/hadoop/opt
3. Configure environment variables
$ vi /etc/profile
# Append at the end of the file
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.3-bin
export PATH=$PATH:$HIVE_HOME/bin
# Reload the configuration
$ source /etc/profile
4. Edit hive-site.xml in $HIVE_HOME/conf
$ cd /home/hadoop/opt/apache-hive-2.3.3-bin/conf
$ cp hive-default.xml.template hive-site.xml
$ vi hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- "&" must be escaped as "&amp;" inside XML -->
    <value>jdbc:mysql://192.168.116.100:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.116.100</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>
$ cp hive-env.sh.template hive-env.sh
$ vi hive-env.sh
# Set HADOOP_HOME
HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
5. Add the MySQL JDBC driver
Download mysql-connector-java-5.1.39-bin.jar and copy it into /home/hadoop/opt/apache-hive-2.3.3-bin/lib.
6. For Hive 2.0 and later, initialize the metastore schema
$ schematool -dbType mysql -initSchema
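If initialization succeeded, the schema version can be checked (this assumes the MySQL connection settings in hive-site.xml are correct):
$ schematool -dbType mysql -info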
7. Test demo
Create a table and load data. Note: because hive.metastore.uris is set above, start the metastore service (step 9) before running the hive CLI.
# hive_data.txt
1,test01,23,address01
2,test02,45,address02
3,test03,8,addresss01
$ hive
hive> create table test(id string, name string, age string, addr string) row format delimited fields terminated by ',';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/opt/hive_data.txt' INTO TABLE test;
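A quick sanity check of the loaded data, run from the shell; the count query also exercises MapReduce on YARN:
$ hive -e "select * from test;"
$ hive -e "select count(*) from test;"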
8. Remote connections to Hive
# Edit Hadoop's core-site.xml
$ vi /home/hadoop/opt/hadoop-2.7.6/etc/hadoop/core-site.xml
# Add the following properties (restart HDFS/YARN afterwards so the proxy-user settings take effect):
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
9. Start the Hive services
$ nohup hive --service metastore > metastore.log 2>&1 &
$ nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
# Test (replace user/pwd with your own credentials)
$ beeline
beeline> !connect jdbc:hive2://localhost:10000 user pwd
0: jdbc:hive2://localhost:10000> show databases;
XI. Install HBase
1. Configure environment variables (unpacking omitted)
$ vi /etc/profile
export HBASE_HOME=/home/hadoop/opt/hbase-1.2.6
export PATH=$HBASE_HOME/bin:$PATH
2. Configure hbase-env.sh
$ cd /home/hadoop/opt/hbase-1.2.6/conf
$ vi hbase-env.sh
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export HBASE_MANAGES_ZK=true
3. Configure hbase-site.xml
$ cd /home/hadoop/opt/hbase-1.2.6/conf
$ vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4. Start HBase
$ cd /home/hadoop/opt/hbase-1.2.6/bin
$ ./start-hbase.sh
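Once HBase is up, jps should additionally list HMaster, HRegionServer, and HQuorumPeer (the ZooKeeper instance managed by HBase because HBASE_MANAGES_ZK=true). A quick smoke test using a throwaway table name of my choosing ('smoke_test'):
$ jps
$ hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF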
XII. Install Flume
1. Unpack
$ tar -zxvf apache-flume-1.8.0-bin.tar.gz
2. Configure the Flume environment variables
$ vi /etc/profile
export FLUME_HOME=/home/hadoop/opt/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin
3. Edit the Flume configuration files
$ cd /home/hadoop/opt/apache-flume-1.8.0-bin/conf
$ cp flume-env.sh.template flume-env.sh
$ cp flume-conf.properties.template flume-conf.properties
$ vi flume-env.sh
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
# Verify the installation
$ flume-ng version
4. Example 1
1) Create the configuration file example.conf
$ yum install telnet-server.x86_64 -y
$ yum -y install xinetd telnet telnet-server
$ mkdir -p /home/hadoop/opt/testdata
$ cd /home/hadoop/opt/testdata
$ vi example.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
# a1.sources.r1.bind = 192.168.116.100
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
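# Bind the source and sink to the channel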
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2) Start the agent
$ flume-ng agent -c /home/hadoop/opt/apache-flume-1.8.0-bin/conf -f /home/hadoop/opt/testdata/example.conf -n a1 -Dflume.root.logger=INFO,console
3) Send messages from the client
$ telnet localhost 44444
5. Example 2
1) Prepare the data file
$ mkdir -p /home/hadoop/opt/testdata/avro
$ cd /home/hadoop/opt/testdata/avro
$ vi avro_data.txt
1,test01,23,address01
2,test02,45,address02
3,test03,8,addresss01
2) Create spool1.conf
$ cd /home/hadoop/opt/testdata
$ vi spool1.conf
# Name the components on this agent (the agent and its source, channel, and sink)
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# Define the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/opt/testdata/avro
# Define the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
# Define the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y%m%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Do not roll files based on event count
a1.sinks.k1.hdfs.rollCount = 0
# Roll a new HDFS file once it reaches 128 MB
a1.sinks.k1.hdfs.rollSize = 134217728
# Roll a new HDFS file every 60 seconds
a1.sinks.k1.hdfs.rollInterval = 60
# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3) Run the demo
# Start the agent
$ flume-ng agent -c /home/hadoop/opt/apache-flume-1.8.0-bin/conf -f /home/hadoop/opt/testdata/spool1.conf -n a1
# In a new terminal, feed new data: copying the already-processed .COMPLETED file back under a new name makes the spooldir source pick it up again
$ cp /home/hadoop/opt/testdata/avro/avro_data.txt.COMPLETED /home/hadoop/opt/testdata/avro/avro_data04.txt
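If the agent picked the file up, the events should land in HDFS under the date-based path configured above (files keep a .tmp suffix until they are rolled, here after 60 seconds):
$ hdfs dfs -ls /flume/$(date +%Y%m%d)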
XIII. Install Sqoop
1. Unpack
$ tar -zxvf sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
2. Configure environment variables
$ mv sqoop-1.4.5.bin__hadoop-2.0.4-alpha sqoop-1.4.5
$ vim /etc/profile
export SQOOP_HOME=/home/hadoop/opt/sqoop-1.4.5
export PATH=$PATH:$SQOOP_HOME/bin
$ cd /home/hadoop/opt/sqoop-1.4.5/conf
$ cp sqoop-env-template.sh sqoop-env.sh
$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/opt/hadoop-2.7.6
export HADOOP_MAPRED_HOME=/home/hadoop/opt/hadoop-2.7.6
export HIVE_HOME=/home/hadoop/opt/apache-hive-2.3.3-bin
export HBASE_HOME=/home/hadoop/opt/hbase-1.2.6
3. Copy the MySQL JDBC driver into /home/hadoop/opt/sqoop-1.4.5/lib
4. Create a test table in MySQL
create database test;
use test;
create table data(id varchar(32),name varchar(32),addr varchar(32));
insert into data(id,name,addr) values('1','test01','address01');
insert into data(id,name,addr) values('2','test02','address02');
insert into data(id,name,addr) values('3','test03','address01');
5. Import and export
1) Copy sqoop-1.4.5.jar into the lib directory
$ cd /home/hadoop/opt/sqoop-1.4.5
$ cp sqoop-1.4.5.jar lib/
2) Run the commands
# Import (MySQL -> Hive)
$ sqoop import --connect jdbc:mysql://192.168.116.100:3306/test?characterEncoding=utf-8 --username root --password '123456' --table data --hive-import --create-hive-table --hive-table hivetest --fields-terminated-by ',' -m 1 --hive-overwrite
# View the data imported into Hive
$ hdfs dfs -cat /user/hive/warehouse/hivetest/part-m-00000
# Export (HDFS -> MySQL); the target table must already exist in MySQL (see the sketch below)
$ sqoop export --connect jdbc:mysql://192.168.116.100:3306/test --username root --password '123456' --table dataFromHDFS --export-dir /user/hive/warehouse/hivetest/part-m-00000 --input-fields-terminated-by ','
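The export assumes the target table dataFromHDFS already exists; a minimal sketch creating it with the same three-column layout (run this before the export):
$ mysql -uroot -p test -e "create table dataFromHDFS(id varchar(32), name varchar(32), addr varchar(32));"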
XIV. Install Scala
(Note: the pre-built Spark 2.3.1 binaries bundle their own Scala 2.11, so this standalone Scala 2.12.6 install is only needed for local Scala development.)
1. Unpack
$ cd /home/hadoop/opt
$ tar -zxvf scala-2.12.6.tgz
2. Configure environment variables
$ vi /etc/profile
export SCALA_HOME=/home/hadoop/opt/scala-2.12.6
# SPARK_HOME is defined here so the PATH entry below resolves; Spark itself is installed in section XV
export SPARK_HOME=/home/hadoop/opt/spark-2.3.1-bin-hadoop2.7
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
$ source /etc/profile
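A quick check that Scala is on the PATH:
$ scala -version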
XV. Install Spark
1. Edit $SPARK_HOME/conf/spark-env.sh (unpacking omitted)
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/conf
$ cp spark-env.sh.template spark-env.sh
# Create the HDFS directory used for Spark event/history logs
$ hdfs dfs -mkdir -p /spark/historyLog
$ vi spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export JAVA_HOME=/home/hadoop/opt/jdk1.8.0_181
export HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.6
export SCALA_HOME=/home/hadoop/opt/scala-2.12.6
export SPARK_HOME=/home/hadoop/opt/spark-2.3.1-bin-hadoop2.7
export HADOOP_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export YARN_CONF_DIR=/home/hadoop/opt/hadoop-2.7.6/etc/hadoop
export SPARK_WORKER_MEMORY=1G
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=100 -Dspark.history.fs.logDirectory=hdfs://master:9000/spark/historyLog"
2. Edit $SPARK_HOME/conf/spark-defaults.conf
$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf
spark.master yarn
spark.submit.deployMode cluster
spark.yarn.historyServer.address master:18080
spark.history.ui.port 18080
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark/historyLog
spark.history.fs.logDirectory hdfs://master:9000/spark/historyLog
spark.eventLog.compress true
spark.executor.instances 1
spark.worker.cores 1
spark.worker.memory 1G
spark.serializer org.apache.spark.serializer.KryoSerializer
3. Start Spark
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/sbin
$ ./start-all.sh
$ cd /home/hadoop/opt/spark-2.3.1-bin-hadoop2.7/
$ bin/spark-submit --master spark://master:7077 --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.3.1.jar 100
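Because spark-defaults.conf sets spark.master to yarn, the same example can also be submitted to YARN instead of the standalone master; client deploy mode prints the result to the console:
$ bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.3.1.jar 100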