Big Data - Hadoop Installation: a Prelude to Spark
Single-Node Installation
Basic software needed for Hadoop development
VMware
Install and configure an Ubuntu 12 virtual machine in VMware:
Enable the root user:
sudo -s
sudo passwd root
For details, see:
http://blog.csdn.net/flash8627/article/details/44729077
Install vsftpd:
root@ubuntu:/usr/lib/java# apt-get install vsftpd
Configure vsftpd.conf so that local system accounts can log in.
root@ubuntu:/usr/lib/java# cp /etc/vsftpd.conf /etc/vsftpd.conf.bak
The details are widely documented online, so I won't repeat them here.
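As a rough sketch (option names can vary slightly between vsftpd versions), the settings in /etc/vsftpd.conf that allow local-account login and uploads are:
# allow login with local system accounts
local_enable=YES
# allow uploads over FTP
write_enable=YES
After editing, restart the service:
service vsftpd restart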
Java 1.7
Upload the JDK to the server, extract it, and set the environment variables. The settings are as follows:
root@ubuntu:/usr/lib/java# tar -zxvf jdk-7u80-linux-x64.tar.gz
root@ubuntu:/usr/lib/java# mv jdk1.7.0_80 /usr/lib/java/jdk1.7
root@ubuntu:/usr/lib/java# vim /root/.bashrc
export JAVA_HOME=/usr/lib/java/jdk1.7
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH
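After saving /root/.bashrc, reload it and make sure the JDK is picked up (a quick sanity check):
source /root/.bashrc
java -version      # should report a 1.7.0_80 build
echo $JAVA_HOME    # should print /usr/lib/java/jdk1.7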
Install ssh
Set up passwordless ssh login:
root@ubuntu:/usr/lib/java# ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
d3:bb:1e:df:10:09:ed:62:78:43:66:9f:8f:6a:b0:.. root@ubuntu
The key's randomart image is:
+--[ RSA 2048]----+
| |
| . |
| = . |
| * + o |
| S * * |
| .+ + + |
| oo o . |
| . o= o |
| =E . . |
+-----------------+
root@ubuntu:/usr/lib/java# ls /root/.ssh/
id_rsa id_rsa.pub
root@ubuntu:/usr/lib/java# cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:/usr/lib/java# ls /root/.ssh/
authorized_keys id_rsa id_rsa.pub
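To confirm the key works, ssh to localhost; it should log in without prompting for a password (the very first connection only asks you to accept the host fingerprint):
ssh localhost
exit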
Install rsync:
root@ubuntu:/usr/lib/java# apt-get install rsync
Hadoop 2.6
Extract Hadoop (into /usr/local/hadoop, so that the paths used below match):
tar -zxvf /home/ftp/hadoop-2.6.0.tar.gz -C /usr/local/hadoop/
Configure hadoop-env.sh
cd /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/
vim hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/java/jdk1.7
Configure the Hadoop environment variables in .bashrc under the user's home directory:
cat ~/.bashrc
export JAVA_HOME=/usr/lib/java/jdk1.7
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH
Verify the environment variables: hadoop version
Run WordCount
mkdir input
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# cp README.txt input
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0# cat output/*
Configuring Hadoop in single-machine (pseudo-distributed) mode and running the WordCount example
This mainly involves the following configuration: edit Hadoop's core configuration file core-site.xml, mainly to set the HDFS address and port; edit the HDFS configuration file hdfs-site.xml, mainly to set the replication factor; and edit the MapReduce configuration file mapred-site.xml, mainly to set the JobTracker address and port. The files live in /usr/local/hadoop/hadoop-2.6.0/etc/hadoop.
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
vim hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
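The dfs.name.dir and dfs.data.dir paths above (and hadoop.tmp.dir from core-site.xml) are normally created by Hadoop itself when the NameNode is formatted and the DataNode starts, but creating them up front avoids permission surprises; a sketch:
mkdir -p /usr/local/hadoop/hdfs/name
mkdir -p /usr/local/hadoop/hdfs/data
mkdir -p /usr/local/hadoop/tmp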
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# cp mapred-site.xml.template mapred-site.xml
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# vim mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
Next, format the NameNode:
hadoop namenode -format
If you format a second time, you will be asked to enter Y to confirm before the format completes.
Start Hadoop: start-all.sh
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# ../../sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 81:a2:0b:4d:95:43:c7:3f:84:f1:a4:d4:24:30:53:bf.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-ubuntu.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-ubuntu.out
Check the running Hadoop processes with jps:
root@ubuntu:/usr/local/hadoop/hadoop-2.6.0/etc/hadoop# jps
4300 NodeManager
4085 ResourceManager
4510 Jps
3951 SecondaryNameNode
3652 DataNode
3443 NameNode
View the cluster monitoring web UI:
http://localhost:50070/dfshealth.jsp
Or use the new UI: http://192.168.222.143:50070/dfshealth.html#tab-overview
Create a directory on HDFS:
hadoop fs -mkdir /input
Upload files:
hadoop fs -copyFromLocal /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/* /input
With that, the pseudo-distributed setup is complete.
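As an optional check (illustrative; run from the Hadoop home directory /usr/local/hadoop/hadoop-2.6.0), the same WordCount example can now be run against HDFS instead of the local filesystem:
hadoop fs -ls /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
hadoop fs -cat /output/*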
If you need help, you can ask in the QQ group [Big Data Exchange 208881891].
Cluster Installation
1. Edit /etc/hostname to change each machine's hostname, and configure the hostname-to-IP mappings in /etc/hosts
Change the hostname: /etc/hostname
Configure the mappings: /etc/hosts
192.168.222.143 Master
192.168.222.144 Slave1
192.168.222.145 Slave2
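On each of the three machines, /etc/hosts might end up looking roughly like this (an assumed typical layout; on Ubuntu it is safer to remove any 127.0.1.1 line mapping the node's own hostname, otherwise the daemons may bind to loopback):
127.0.0.1       localhost
192.168.222.143 Master
192.168.222.144 Slave1
192.168.222.145 Slave2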
Configure passwordless ssh: ssh-keygen -t rsa -P ""
scp id_rsa.pub Slave1:/root/.ssh/Master.pub    # copy the Master's public key to the remote node (repeat for Slave2)
cat /root/.ssh/Master.pub >> /root/.ssh/authorized_keys    # run on each slave to authorize the Master's key
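Once the key has been appended on each slave, verify from the Master that login works without a password:
ssh Slave1 hostname    # should print Slave1 with no password prompt
ssh Slave2 hostname    # should print Slave2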
Modify the Hadoop configuration:
Change the earlier localhost entries to Master.
The specific configuration is as follows:
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
</property>
</configuration>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hdfs/data</value>
</property>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>Master:9001</value>
</property>
slaves
Master
Slave1
Slave2
Copy Java and Hadoop to the remote nodes (each slave needs the same copy; a small loop is sketched below):
root@Master:/usr/lib/java#
scp -r jdk1.7 Slave1:/usr/lib/java/
scp -r /usr/local/hadoop/hadoop-2.6.0 Slave1:/usr/local/hadoop/
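The same copies are needed for Slave2; a loop run from the Master (a sketch, assuming /usr/lib/java and /usr/local/hadoop already exist on both slaves) covers both nodes:
for host in Slave1 Slave2; do
  scp -r /usr/lib/java/jdk1.7 $host:/usr/lib/java/
  scp -r /usr/local/hadoop/hadoop-2.6.0 $host:/usr/local/hadoop/
done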
After the copy completes, update the environment configuration (.bashrc) on the slaves:
export JAVA_HOME=/usr/lib/java/jdk1.7
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.6.0/bin:$PATH
First clean out the old hdfs/name, hdfs/data, and tmp directories on every node (a sketch follows).
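A sketch of that cleanup, using the paths from hdfs-site.xml and core-site.xml above; run it on every node, and note that it erases all existing HDFS data:
rm -rf /usr/local/hadoop/hdfs/name/*
rm -rf /usr/local/hadoop/hdfs/data/*
rm -rf /usr/local/hadoop/hadoop-2.6.0/tmp/*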
Format the cluster: hadoop namenode -format
Start the cluster:
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [Master]
The authenticity of host 'master (192.168.222.143)' can't be established.
ECDSA key fingerprint is 81:a2:0b:4d:95:43:c7:3f:84:f1:a4:d4:24:30:53:bf.
Are you sure you want to continue connecting (yes/no)? yes
Master: Warning: Permanently added 'master,192.168.222.143' (ECDSA) to the list of known hosts.
Master: starting namenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-namenode-Master.out
Master: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Master.out
Slave2: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Slave2.out
Slave1: starting datanode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-datanode-Slave1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-Master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-resourcemanager-Master.out
Slave1: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Slave1.out
Master: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Master.out
Slave2: starting nodemanager, logging to /usr/local/hadoop/hadoop-2.6.0/logs/yarn-root-nodemanager-Slave2.out
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# jps
2912 DataNode
3182 SecondaryNameNode
3557 NodeManager
3855 Jps
3342 ResourceManager
2699 NameNode
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 56254304256 (52.39 GB)
Present Capacity: 48346591232 (45.03 GB)
DFS Remaining: 48346517504 (45.03 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.222.143:50010 (Master)
Hostname: Master
Decommission Status : Normal
Configured Capacity: 18751434752 (17.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 2651889664 (2.47 GB)
DFS Remaining: 16099520512 (14.99 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.86%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 11 10:51:41 CST 2016
Name: 192.168.222.144:50010 (Slave1)
Hostname: Slave1
Decommission Status : Normal
Configured Capacity: 18751434752 (17.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 2653249536 (2.47 GB)
DFS Remaining: 16098160640 (14.99 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 11 10:51:41 CST 2016
Name: 192.168.222.145:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 18751434752 (17.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 2602573824 (2.42 GB)
DFS Remaining: 16148836352 (15.04 GB)
DFS Used%: 0.00%
DFS Remaining%: 86.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sat Jun 11 10:51:42 CST 2016
root@Master:/usr/local/hadoop/hadoop-2.6.0/sbin# ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [Master]
Master: stopping namenode
Master: stopping datanode
Slave1: stopping datanode
Slave2: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
Slave1: stopping nodemanager
Master: stopping nodemanager
Slave2: stopping nodemanager
Slave1: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
Slave2: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
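As the deprecation notices suggest, the per-layer scripts can be used instead of start-all.sh/stop-all.sh (run from the sbin directory):
./start-dfs.sh && ./start-yarn.sh    # start HDFS first, then YARN
./stop-yarn.sh && ./stop-dfs.sh      # stop in the reverse order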
Next post: building a Spark cluster on top of this setup. Any questions are welcome in the group.
QQ group: Big Data Exchange 208881891