Installing Impala with Cloudera Manager end to end, and comparing Impala, Hive and Spark performance (Part 3): Impala installed successfully with Cloudera Manager, plus a simple Impala vs. Hive test
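Installing Impala through Cloudera Manager has these prerequisites, listed in the first article:
1. CentOS 6.2 is installed.
2. CDH is version 4.1.0 or later.
3. Hive is installed on every node of the cluster.
4. The Hive metastore uses MySQL.
5. The hosts file on every machine maps the IP address and hostname of every machine in the cluster.
IPv6 must also be disabled. Otherwise Cloudera Manager fails to recognize the hosts in the end.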
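Before adding the hosts in Cloudera Manager, it can help to verify the last two prerequisites on every node. On CentOS 6, one common way to turn IPv6 off is to add net.ipv6.conf.all.disable_ipv6 = 1 and net.ipv6.conf.default.disable_ipv6 = 1 to /etc/sysctl.conf and run sysctl -p. The check below only reads the resulting kernel flag. This is a minimal sketch that assumes a POSIX shell with local support (bash, dash). The function names missing_hosts and ipv6_disabled, and the hostnames in the usage lines, are hypothetical.

```shell
#!/bin/sh
# Pre-flight checks for the hosts-file and IPv6 prerequisites (sketch only).

# missing_hosts HOSTS_FILE HOSTNAME...
# Prints each HOSTNAME that does not appear as a name field (field 2 onwards)
# on any non-comment line of HOSTS_FILE.
missing_hosts() {
  local hosts_file=$1
  shift
  local h
  for h in "$@"; do
    # Drop comments, then let awk exit 0 only if the name was found.
    if ! sed 's/#.*//' "$hosts_file" |
      awk -v h="$h" '{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                     END { exit !found }'; then
      echo "$h"
    fi
  done
}

# ipv6_disabled [FLAG_FILE]
# Succeeds when the kernel reports IPv6 disabled on all interfaces.
ipv6_disabled() {
  local flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
  # If IPv6 was disabled at module level, the sysctl entry does not exist.
  [ -e "$flag_file" ] || return 0
  [ "$(cat "$flag_file")" = "1" ]
}

# Usage on each node (node1..node3 are placeholder hostnames):
#   missing_hosts /etc/hosts node1 node2 node3   # should print nothing
#   ipv6_disabled && echo "IPv6 off" || echo "IPv6 still on"
```

The hostname match is on whole fields, so a missing node10 is not hidden by an existing node1 entry.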
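After IPv6 was disabled, the Cloudera Manager web UI showed three managed hosts, so Cloudera Manager was working properly. Click the 'Services' tab, choose role assignment, and assign roles to each host. Impala is not among the initial services. Once all the other services have started normally, Impala has to be added as an additional service.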
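Once Impala is running, time the same query in both engines:
$ time impala-shell --impalad=200.200.200.11:21000 -q 'select * from tt'
Here 200.200.200.11:21000 is the impalad host address and tt is the test table.
$ time hive -e 'select * from tt'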
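A single run of each command mixes the query itself with one-off costs such as client startup. Repeating each measurement gives a fairer picture. This is a minimal sketch that assumes a POSIX shell and GNU date (for the %N nanosecond field). The function name bench is hypothetical. The usage lines reuse the impalad address and table tt from the commands above.

```shell
#!/bin/sh
# Run a command several times and print each run's wall-clock time (sketch only).

# bench LABEL RUNS COMMAND [ARGS...]
bench() {
  local label=$1 runs=$2 i=1 start end status
  shift 2
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s%N)                 # GNU date: nanoseconds since the epoch
    if "$@" > /dev/null 2>&1; then      # discard query output and log lines
      status=""
    else
      status=" (failed)"
    fi
    end=$(date +%s%N)
    printf '%s run %d: %d ms%s\n' "$label" "$i" $(( (end - start) / 1000000 )) "$status"
    i=$((i + 1))
  done
}

# Usage (assumes impala-shell and hive are on PATH):
#   bench impala 5 impala-shell --impalad=200.200.200.11:21000 -q 'select id from tt'
#   bench hive   5 hive -e 'select id from tt'
```

The first run of each engine can then be compared against the later ones, which shows how much of a single timing is startup cost.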
Then compare the elapsed times. A simple comparison gave the following results:
Impala
$ time impala-shell --impalad=200.200.200.11:21000 -q 'select id from tt'
real 0m4.921s
user 0m0.072s
sys 0m0.042s
Hive
$ time hive -e 'select id from tt'
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201212111430_946199434.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201212111359_0001, Tracking URL = http://big1-1:50030/jobdetails.jsp?jobid=job_201212111359_0001
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=big1-1:8021 -kill job_201212111359_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-12-11 14:30:44,633 Stage-1 map = 0%, reduce = 0%
2012-12-11 14:30:49,716 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.92 sec
2012-12-11 14:30:50,735 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.92 sec
2012-12-11 14:30:51,746 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.92 sec
2012-12-11 14:30:52,761 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.92 sec
MapReduce Total cumulative CPU time: 920 msec
Ended Job = job_201212111359_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.92 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 920 msec
OK
Time taken: 36.364 seconds
real 0m40.248s
user 0m15.590s
sys 0m2.638s
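Impala is clearly much faster than Hive here. Wall-clock (real) time was about 4.9 s for Impala versus about 40.2 s for Hive, roughly 8 times faster. Hive compiled the query into a MapReduce job and reported 36.4 s in total. Its map task used only 0.92 s of CPU, so most of Hive's time appears to be overhead around launching and running the job.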
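The two real times can be turned into a single speed-up figure with a small script. This is a minimal sketch that assumes bash's default time output format (for example "real 0m40.248s") and also accepts the POSIX time -p format ("real 40.24"). The function names real_seconds and speedup are hypothetical. Capturing the timing as shown in the comments requires bash, because time is a bash keyword there.

```shell
#!/bin/sh
# Convert the "real" line of time output to seconds and compute a speed-up (sketch only).

# real_seconds: read time output on stdin, print the "real" value in seconds.
real_seconds() {
  LC_ALL=C awk '$1 == "real" {
    v = $2
    if (index(v, "m")) {
      # "0m40.248s" -> minutes "0", seconds "40.248" -> 0 * 60 + 40.248
      split(v, t, "m"); sub(/s$/, "", t[2]); print t[1] * 60 + t[2]
    } else {
      print v + 0                       # time -p format is already in seconds
    }
  }'
}

# speedup SLOW FAST: ratio of the two times, rounded to one decimal place.
speedup() {
  LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Usage in bash (other stderr lines, e.g. Hive's logging, are ignored by awk):
#   { time hive -e 'select id from tt' ; } 2> hive.time
#   { time impala-shell --impalad=200.200.200.11:21000 -q 'select id from tt' ; } 2> impala.time
#   speedup "$(real_seconds < hive.time)" "$(real_seconds < impala.time)"
```

With the two real times from this test, 40.248 s and 4.921 s, speedup prints 8.2.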
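This is only a first look. Later we will run Hive, Impala and Spark on data sets of several GB for a more detailed comparison, which I will write up when time allows.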