
hadoop distcp: syncing data between clusters

Command 1: hadoop distcp -update -delete -p hdfs://Smaster:9000/newexchange  hdfs://Sslave0:9000/newexchange

Command 2: hadoop distcp -update -delete -p  webhdfs://192.168.88.22:50070/mysql_datas      webhdfs://Sslave0:50070/mysql_datas

General form: hadoop distcp <source path> <target path>, which copies data from the source cluster to the backup cluster.

-update -delete -p: -update copies only files that are missing or have changed on the target, -delete removes target files that no longer exist on the source, and -p preserves file attributes. Leave these flags out if you do not want anything deleted on the target.

Command 1 is the form usually described for syncing clusters that run the same Hadoop version, while Command 2 (webhdfs) is meant for syncing across different Hadoop versions. That said, Command 1 also worked for me when pulling data from Hadoop 2.6.0 into Hadoop 3.0.3.
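For a large sync, DistCp's standard -m and -bandwidth options can keep the job from saturating the network. A minimal sketch reusing the paths from Command 1 (the values 10 and 50 are only illustrative):

hadoop distcp -update -delete -p -m 10 -bandwidth 50 hdfs://Smaster:9000/newexchange hdfs://Sslave0:9000/newexchange

Here -m 10 caps the job at 10 map tasks and -bandwidth 50 limits each map to roughly 50 MB/s.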

[hadoop@Sslave0 mapreduce]$ hdfs dfs -ls  /    #hadoop-3.0.3
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2018-08-03 15:42 /hive2.3
drwxrwxrwx   - hadoop supergroup          0 2018-08-03 16:09 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-08-03 13:54 /user

[hadoop@Sslave0 mapreduce]$ hdfs dfs -ls hdfs://Smaster:9000/     #hadoop-2.6.0
Found 10 items
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 16:22 hdfs://Smaster:9000/check
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 17:40 hdfs://Smaster:9000/dsqdata
drwxrwxrwx   - hadoop supergroup          0 2018-08-02 16:05 hdfs://Smaster:9000/hive
drwxr-xr-x   - hadoop supergroup          0 2018-08-03 15:55 hdfs://Smaster:9000/mysql_datas
drwxr-xr-x   - hadoop supergroup          0 2018-08-03 11:08 hdfs://Smaster:9000/newexchange
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 16:25 hdfs://Smaster:9000/spark_jars
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 16:25 hdfs://Smaster:9000/test
drwxrwxrwx   - hadoop supergroup          0 2018-08-03 10:43 hdfs://Smaster:9000/tmp
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 16:11 hdfs://Smaster:9000/user
drwxr-xr-x   - hadoop supergroup          0 2018-08-02 16:25 hdfs://Smaster:9000/ziang_test
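Since the two clusters run different Hadoop versions, the remote 2.6.0 cluster can also be browsed through its webhdfs endpoint (same host and port as in Command 2), for example:

hdfs dfs -ls webhdfs://Smaster:50070/mysql_datas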

[hadoop@Sslave0 ~]$ hadoop distcp webhdfs://Smaster:50070/mysql_datas      webhdfs://Sslave0:50070/mysql_datas
2018-08-03 16:23:36,464 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPath=webhdfs://Sslave0:50070/mysql_datas, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPathExists=false, preserveRawXattrsfalse
2018-08-03 16:23:36,627 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:23:38,310 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 26; dirCnt = 1
2018-08-03 16:23:38,310 INFO tools.SimpleCopyListing: Build file listing completed.
2018-08-03 16:23:38,312 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2018-08-03 16:23:38,312 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2018-08-03 16:23:38,468 INFO tools.DistCp: Number of paths in the copy list: 26
2018-08-03 16:23:38,546 INFO tools.DistCp: Number of paths in the copy list: 26
2018-08-03 16:23:38,567 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:23:38,693 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1533283659094_0006
2018-08-03 16:23:38,999 INFO mapreduce.JobSubmitter: number of splits:5
2018-08-03 16:23:39,064 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-08-03 16:23:39,265 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533283659094_0006
2018-08-03 16:23:39,267 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-08-03 16:23:39,534 INFO conf.Configuration: resource-types.xml not found
2018-08-03 16:23:39,534 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-08-03 16:23:39,642 INFO impl.YarnClientImpl: Submitted application application_1533283659094_0006
2018-08-03 16:23:39,721 INFO mapreduce.Job: The url to track the job: http://Sslave0:8088/proxy/application_1533283659094_0006/
2018-08-03 16:23:39,721 INFO tools.DistCp: DistCp job-id: job_1533283659094_0006
2018-08-03 16:23:39,723 INFO mapreduce.Job: Running job: job_1533283659094_0006
2018-08-03 16:23:48,934 INFO mapreduce.Job: Job job_1533283659094_0006 running in uber mode : false
2018-08-03 16:23:48,936 INFO mapreduce.Job:  map 0% reduce 0%
2018-08-03 16:24:07,254 INFO mapreduce.Job:  map 20% reduce 0%
2018-08-03 16:24:10,281 INFO mapreduce.Job:  map 40% reduce 0%
2018-08-03 16:24:21,376 INFO mapreduce.Job:  map 60% reduce 0%
2018-08-03 16:24:22,383 INFO mapreduce.Job:  map 80% reduce 0%
2018-08-03 16:24:29,452 INFO mapreduce.Job:  map 100% reduce 0%
2018-08-03 16:24:30,476 INFO mapreduce.Job: Job job_1533283659094_0006 completed successfully
2018-08-03 16:24:30,994 INFO mapreduce.Job: Counters: 42
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=1036715
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=6315
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=40
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=10
                WEBHDFS: Number of bytes read=9365494
                WEBHDFS: Number of bytes written=9365494
                WEBHDFS: Number of read operations=232
                WEBHDFS: Number of large read operations=0
                WEBHDFS: Number of write operations=76
        Job Counters
                Launched map tasks=5
                Other local map tasks=5
                Total time spent by all maps in occupied slots (ms)=127246
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=63623
                Total vcore-milliseconds taken by all map tasks=63623
                Total megabyte-milliseconds taken by all map tasks=130299904
        Map-Reduce Framework
                Map input records=26
                Map output records=0
                Input split bytes=685
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=907
                CPU time spent (ms)=12660
                Physical memory (bytes) snapshot=1198813184
                Virtual memory (bytes) snapshot=16667459584
                Total committed heap usage (bytes)=822607872
                Peak Map Physical memory (bytes)=252973056
                Peak Map Virtual memory (bytes)=3337940992
        File Input Format Counters
                Bytes Read=5630
        File Output Format Counters
                Bytes Written=0
        DistCp Counters
                Bandwidth in Btyes=5871336
                Bytes Copied=9365494
                Bytes Expected=9365494
                Files Copied=25
                DIR_COPY=1

[hadoop@Sslave0 ~]$ hadoop distcp -update -delete -p  webhdfs://Smaster:50070/mysql_datas      webhdfs://Sslave0:50070/mysql_datas
2018-08-03 16:27:25,141 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=true, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPath=webhdfs://Sslave0:50070/mysql_datas, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[webhdfs://192.168.88.22:50070/mysql_datas], targetPathExists=true, preserveRawXattrsfalse
2018-08-03 16:27:25,326 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:27:27,555 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 25; dirCnt = 0
2018-08-03 16:27:27,555 INFO tools.SimpleCopyListing: Build file listing completed.
2018-08-03 16:27:27,557 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
2018-08-03 16:27:27,558 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
2018-08-03 16:27:27,739 INFO tools.DistCp: Number of paths in the copy list: 25
2018-08-03 16:27:27,833 INFO tools.DistCp: Number of paths in the copy list: 25
2018-08-03 16:27:27,858 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2018-08-03 16:27:27,991 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1533283659094_0008
2018-08-03 16:27:28,238 INFO mapreduce.JobSubmitter: number of splits:3
2018-08-03 16:27:28,298 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-08-03 16:27:28,471 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533283659094_0008
2018-08-03 16:27:28,473 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-08-03 16:27:28,822 INFO conf.Configuration: resource-types.xml not found
2018-08-03 16:27:28,823 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-08-03 16:27:28,964 INFO impl.YarnClientImpl: Submitted application application_1533283659094_0008
2018-08-03 16:27:29,031 INFO mapreduce.Job: The url to track the job: http://Sslave0:8088/proxy/application_1533283659094_0008/
2018-08-03 16:27:29,032 INFO tools.DistCp: DistCp job-id: job_1533283659094_0008
2018-08-03 16:27:29,033 INFO mapreduce.Job: Running job: job_1533283659094_0008
2018-08-03 16:27:44,220 INFO mapreduce.Job: Job job_1533283659094_0008 running in uber mode : false
2018-08-03 16:27:44,221 INFO mapreduce.Job:  map 0% reduce 0%
2018-08-03 16:27:59,632 INFO mapreduce.Job:  map 33% reduce 0%
2018-08-03 16:28:00,703 INFO mapreduce.Job:  map 67% reduce 0%
2018-08-03 16:28:04,737 INFO mapreduce.Job:  map 100% reduce 0%
2018-08-03 16:28:06,772 INFO mapreduce.Job: Job job_1533283659094_0008 completed successfully
2018-08-03 16:28:07,072 INFO mapreduce.Job: Counters: 40
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=622029
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=5309
                HDFS: Number of bytes written=1488
                HDFS: Number of read operations=24
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=6
                WEBHDFS: Number of bytes read=0
                WEBHDFS: Number of bytes written=0
                WEBHDFS: Number of read operations=128
                WEBHDFS: Number of large read operations=0
                WEBHDFS: Number of write operations=25
        Job Counters
                Launched map tasks=3
                Other local map tasks=3
                Total time spent by all maps in occupied slots (ms)=62674
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=31337
                Total vcore-milliseconds taken by all map tasks=31337
                Total megabyte-milliseconds taken by all map tasks=64178176
        Map-Reduce Framework
                Map input records=25
                Map output records=25
                Input split bytes=408
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=455
                CPU time spent (ms)=6740
                Physical memory (bytes) snapshot=701685760
                Virtual memory (bytes) snapshot=10003836928
                Total committed heap usage (bytes)=465567744
                Peak Map Physical memory (bytes)=238911488
                Peak Map Virtual memory (bytes)=3337875456
        File Input Format Counters
                Bytes Read=4901
        File Output Format Counters
                Bytes Written=1488
        DistCp Counters
                Bandwidth in Btyes=0
                Bytes Skipped=9365494
                Files Skipped=25

yarn application -list    shows the running job and its progress.
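For more detail, the standard YARN client commands can be pointed at the application id that DistCp prints (application_1533283659094_0008 in the run above); a quick sketch:

yarn application -status application_1533283659094_0008
yarn logs -applicationId application_1533283659094_0008

The first shows the application's state and progress; the second dumps its aggregated logs after the job finishes (assuming log aggregation is enabled).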

Result of the data sync:

One slip-up: the sync got run a second time while developers were still running jobs against the data.
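To confirm that the two sides really match, a quick check (a sketch, reusing the paths from above) is to compare directory/file counts and total bytes on source and target:

hdfs dfs -count webhdfs://Smaster:50070/mysql_datas
hdfs dfs -count webhdfs://Sslave0:50070/mysql_datas
hdfs dfs -du -s webhdfs://Smaster:50070/mysql_datas
hdfs dfs -du -s webhdfs://Sslave0:50070/mysql_datas

The counts and byte totals should line up if the sync completed cleanly.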
