Syncing cluster data with hadoop distcp
Command 1: hadoop distcp -update -delete -p hdfs://Smaster:9000/newexchange hdfs://Sslave0:9000/newexchange
Command 2: hadoop distcp -update -delete -p webhdfs://192.168.88.22:50070/mysql_datas webhdfs://Sslave0:50070/mysql_datas
hadoop distcp copies the source data to the backup server.
-update copies only files that are missing or have changed on the target, -delete removes files on the target that no longer exist on the source, and -p preserves file attributes (replication, block size, owner, group, permissions, timestamps); -delete can be left off if you do not want anything removed on the target.
Command 1 is usually described as a same-version copy/sync and command 2 (over webhdfs) as a copy/sync between different Hadoop versions, but command 1 also worked for me when pulling data from Hadoop 2.6.0 into Hadoop 3.0.3.
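In general terms (a minimal sketch; the hostnames, ports, and paths below are placeholders rather than the ones used in this example), the native hdfs:// scheme is for clusters whose RPC versions are compatible, while webhdfs:// goes through the NameNode's HTTP port (50070 by default in Hadoop 2.x) and works across versions:
hadoop distcp hdfs://source-nn:9000/data hdfs://target-nn:9000/data             # copy between same-version clusters
hadoop distcp webhdfs://source-nn:50070/data webhdfs://target-nn:50070/data     # copy across Hadoop versions over WebHDFS
hadoop distcp -update -delete -p hdfs://source-nn:9000/data hdfs://target-nn:9000/data   # incremental sync that mirrors deletions and preserves attributes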
[hadoop@Sslave0 mapreduce]$ hdfs dfs -ls /    # hadoop-3.0.3
Found 3 items
drwxr-xr-x - hadoop supergroup 0 2018-08-03 15:42 /hive2.3
drwxrwxrwx - hadoop supergroup 0 2018-08-03 16:09 /tmp
drwxr-xr-x - hadoop supergroup 0 2018-08-03 13:54 /user
[hadoop@Sslave0 mapreduce]$ hdfs dfs -ls hdfs://Smaster:9000/    # hadoop-2.6.0
Found 10 items
drwxr-xr-x - hadoop supergroup 0 2018-08-02 16:22 hdfs://Smaster:9000/check
drwxr-xr-x - hadoop supergroup 0 2018-08-02 17:40 hdfs://Smaster:9000/dsqdata
drwxrwxrwx - hadoop supergroup 0 2018-08-02 16:05 hdfs://Smaster:9000/hive
drwxr-xr-x - hadoop supergroup 0 2018-08-03 15:55 hdfs://Smaster:9000/mysql_datas
drwxr-xr-x - hadoop supergroup 0 2018-08-03 11:08 hdfs://Smaster:9000/newexchange
drwxr-xr-x - hadoop supergroup 0 2018-08-02 16:25 hdfs://Smaster:9000/spark_jars
drwxr-xr-x - hadoop supergroup 0 2018-08-02 16:25 hdfs://Smaster:9000/test
drwxrwxrwx - hadoop supergroup 0 2018-08-03 10:43 hdfs://Smaster:9000/tmp
drwxr-xr-x - hadoop supergroup 0 2018-08-02 16:11 hdfs://Smaster:9000/user
drwxr-xr-x - hadoop supergroup 0 2018-08-02 16:25 hdfs://Smaster:9000/ziang_test
[hadoop@Sslave0 ~]$ hadoop distcp webhdfs://Smaster:50070/mysql_datas webhdfs://Sslave0:50070/mysql_datas
2018-08-03 16:23:36,464 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPath=webhdfs://Sslave0:50070/mysql_datas, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPathExists=false, preserveRawXattrsfalse
2018-08-03 16:23:36,627 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:23:38,310 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 26; dirCnt = 1
2018-08-03 16:23:38,310 INFO tools.SimpleCopyListing: Build file listing completed.
2018-08-03 16:23:38,312 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2018-08-03 16:23:38,312 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2018-08-03 16:23:38,468 INFO tools.DistCp: Number of paths in the copy list: 26
2018-08-03 16:23:38,546 INFO tools.DistCp: Number of paths in the copy list: 26
2018-08-03 16:23:38,567 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:23:38,693 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1533283659094_0006
2018-08-03 16:23:38,999 INFO mapreduce.JobSubmitter: number of splits:5
2018-08-03 16:23:39,064 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-08-03 16:23:39,265 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533283659094_0006
2018-08-03 16:23:39,267 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-08-03 16:23:39,534 INFO conf.Configuration: resource-types.xml not found
2018-08-03 16:23:39,534 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-08-03 16:23:39,642 INFO impl.YarnClientImpl: Submitted application application_1533283659094_0006
2018-08-03 16:23:39,721 INFO mapreduce.Job: The url to track the job: http://Sslave0:8088/proxy/application_1533283659094_0006/
2018-08-03 16:23:39,721 INFO tools.DistCp: DistCp job-id: job_1533283659094_0006
2018-08-03 16:23:39,723 INFO mapreduce.Job: Running job: job_1533283659094_0006
2018-08-03 16:23:48,934 INFO mapreduce.Job: Job job_1533283659094_0006 running in uber mode : false
2018-08-03 16:23:48,936 INFO mapreduce.Job: map 0% reduce 0%
2018-08-03 16:24:07,254 INFO mapreduce.Job: map 20% reduce 0%
2018-08-03 16:24:10,281 INFO mapreduce.Job: map 40% reduce 0%
2018-08-03 16:24:21,376 INFO mapreduce.Job: map 60% reduce 0%
2018-08-03 16:24:22,383 INFO mapreduce.Job: map 80% reduce 0%
2018-08-03 16:24:29,452 INFO mapreduce.Job: map 100% reduce 0%
2018-08-03 16:24:30,476 INFO mapreduce.Job: Job job_1533283659094_0006 completed successfully
2018-08-03 16:24:30,994 INFO mapreduce.Job: Counters: 42
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1036715
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=6315
HDFS: Number of bytes written=0
HDFS: Number of read operations=40
HDFS: Number of large read operations=0
HDFS: Number of write operations=10
WEBHDFS: Number of bytes read=9365494
WEBHDFS: Number of bytes written=9365494
WEBHDFS: Number of read operations=232
WEBHDFS: Number of large read operations=0
WEBHDFS: Number of write operations=76
Job Counters
Launched map tasks=5
Other local map tasks=5
Total time spent by all maps in occupied slots (ms)=127246
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=63623
Total vcore-milliseconds taken by all map tasks=63623
Total megabyte-milliseconds taken by all map tasks=130299904
Map-Reduce Framework
Map input records=26
Map output records=0
Input split bytes=685
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=907
CPU time spent (ms)=12660
Physical memory (bytes) snapshot=1198813184
Virtual memory (bytes) snapshot=16667459584
Total committed heap usage (bytes)=822607872
Peak Map Physical memory (bytes)=252973056
Peak Map Virtual memory (bytes)=3337940992
File Input Format Counters
Bytes Read=5630
File Output Format Counters
Bytes Written=0
DistCp Counters
Bandwidth in Btyes=5871336
Bytes Copied=9365494
Bytes Expected=9365494
Files Copied=25
DIR_COPY=1
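Once the first run finishes, it is worth confirming that the copy landed; as a quick check (just a sketch reusing the paths from above), the listing and aggregate size on both clusters should match the 25 files / 9,365,494 bytes reported by the counters:
hdfs dfs -ls webhdfs://Sslave0:50070/mysql_datas
hdfs dfs -du -s webhdfs://Smaster:50070/mysql_datas webhdfs://Sslave0:50070/mysql_datas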
[hadoop@Sslave0 ~]$ hadoop distcp -update -delete -p webhdfs://Smaster:50070/mysql_datas webhdfs://Sslave0:50070/mysql_datas
2018-08-03 16:27:25,141 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=true, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPath=webhdfs://Sslave0:50070/mysql_datas, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPathExists=true, preserveRawXattrsfalse
2018-08-03 16:27:25,326 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:27:27,555 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 25; dirCnt = 0
2018-08-03 16:27:27,555 INFO tools.SimpleCopyListing: Build file listing completed.
2018-08-03 16:27:27,557 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2018-08-03 16:27:27,558 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2018-08-03 16:27:27,739 INFO tools.DistCp: Number of paths in the copy list: 25
2018-08-03 16:27:27,833 INFO tools.DistCp: Number of paths in the copy list: 25
2018-08-03 16:27:27,858 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:27:27,991 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1533283659094_0008
2018-08-03 16:27:28,238 INFO mapreduce.JobSubmitter: number of splits:3
2018-08-03 16:27:28,298 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-08-03 16:27:28,471 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533283659094_0008
2018-08-03 16:27:28,473 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-08-03 16:27:28,822 INFO conf.Configuration: resource-types.xml not found
2018-08-03 16:27:28,823 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-08-03 16:27:28,964 INFO impl.YarnClientImpl: Submitted application application_1533283659094_0008
2018-08-03 16:27:29,031 INFO mapreduce.Job: The url to track the job: http://Sslave0:8088/proxy/application_1533283659094_0008/
2018-08-03 16:27:29,032 INFO tools.DistCp: DistCp job-id: job_1533283659094_0008
2018-08-03 16:27:29,033 INFO mapreduce.Job: Running job: job_1533283659094_0008
2018-08-03 16:27:44,220 INFO mapreduce.Job: Job job_1533283659094_0008 running in uber mode : false
2018-08-03 16:27:44,221 INFO mapreduce.Job: map 0% reduce 0%
2018-08-03 16:27:59,632 INFO mapreduce.Job: map 33% reduce 0%
2018-08-03 16:28:00,703 INFO mapreduce.Job: map 67% reduce 0%
2018-08-03 16:28:04,737 INFO mapreduce.Job: map 100% reduce 0%
2018-08-03 16:28:06,772 INFO mapreduce.Job: Job job_1533283659094_0008 completed successfully
2018-08-03 16:28:07,072 INFO mapreduce.Job: Counters: 40
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=622029
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5309
HDFS: Number of bytes written=1488
HDFS: Number of read operations=24
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
WEBHDFS: Number of bytes read=0
WEBHDFS: Number of bytes written=0
WEBHDFS: Number of read operations=128
WEBHDFS: Number of large read operations=0
WEBHDFS: Number of write operations=25
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=62674
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=31337
Total vcore-milliseconds taken by all map tasks=31337
Total megabyte-milliseconds taken by all map tasks=64178176
Map-Reduce Framework
Map input records=25
Map output records=25
Input split bytes=408
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=455
CPU time spent (ms)=6740
Physical memory (bytes) snapshot=701685760
Virtual memory (bytes) snapshot=10003836928
Total committed heap usage (bytes)=465567744
Peak Map Physical memory (bytes)=238911488
Peak Map Virtual memory (bytes)=3337875456
File Input Format Counters
Bytes Read=4901
File Output Format Counters
Bytes Written=1488
DistCp Counters
Bandwidth in Btyes=0
Bytes Skipped=9365494
Files Skipped=25
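Comparing the two runs shows what -update buys: the first job copied all 25 files (9,365,494 bytes written over WebHDFS), while the second skipped every one of them (Files Skipped=25, Bytes Skipped=9365494) because the target copies were already identical, so re-running the sync periodically is cheap. A rough sketch of wrapping the same command in a script for scheduled runs (the log path is just an assumption, not part of the original setup):
#!/bin/bash
# Incremental sync of /mysql_datas from the 2.6.0 cluster (Smaster) to the 3.0.3 cluster (Sslave0).
# With -update, files already identical on the target are skipped rather than re-copied.
hadoop distcp -update -delete -p \
  webhdfs://Smaster:50070/mysql_datas \
  webhdfs://Sslave0:50070/mysql_datas \
  >> /tmp/distcp_mysql_datas.log 2>&1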
While the job is running, yarn application -list shows it and its progress.
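For example (a small sketch; the application id is the one from the log above, substitute your own job id):
yarn application -list                                      # running applications and their progress
yarn application -status application_1533283659094_0008     # detailed state of a single job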
Result of the data sync: the only slip-up was that I ran the sync one extra time while development was still running jobs against the data.