
Installing and Using DataX 3.0

Installation:
1. Unpack the DataX archive

[root@slave1 datax]# tar -xvf datax.tar.gz 

2. Grant permissions on the DataX installation directory

[root@slave1 datax]# chmod -R 775 ./datax

3. Run the bundled sample job

[root@slave1 bin]# python datax.py ../job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2016, Alibaba Group. All Rights Reserved.


2016-12-10 02:55:17.640 [main] INFO  VMInfo - VMInfo# operatingSystem class => com.sun.management.UnixOperatingSystem
2016-12-10 02:55:17.654 [main] INFO  Engine - the machine info  =>

	osInfo:	Oracle Corporation 1.7 24.79-b02
	jvmInfo:	Linux amd64 2.6.32-431.el6.x86_64
	cpu num:	1

	totalPhysicalMemory:	1.83G
	freePhysicalMemory:	0.09G
	maxFileDescriptorCount:	4096
	currentOpenFileDescriptorCount:	67

	GC Names	[MarkSweepCompact, Copy]

	MEMORY_NAME      | allocation_size | init_size
	Tenured Gen      | 682.69MB        | 682.69MB
	Eden Space       | 273.06MB        | 273.06MB
	Code Cache       | 48.00MB         | 2.44MB
	Perm Gen         | 82.00MB         | 20.75MB
	Survivor Space   | 34.13MB         | 34.13MB

2016-12-10 02:55:17.693 [main] INFO  Engine -
{
	"content":[
		{
			"reader":{
				"name":"streamreader",
				"parameter":{
					"column":[
						{ "type":"string", "value":"DataX" },
						{ "type":"long", "value":19890604 },
						{ "type":"date", "value":"1989-06-04 00:00:00" },
						{ "type":"bool", "value":true },
						{ "type":"bytes", "value":"test" }
					],
					"sliceRecordCount":100000
				}
			},
			"writer":{
				"name":"streamwriter",
				"parameter":{
					"encoding":"UTF-8",
					"print":false
				}
			}
		}
	],
	"setting":{
		"errorLimit":{ "percentage":0.02, "record":0 },
		"speed":{ "byte":10485760 }
	}
}
2016-12-10 02:55:17.757 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2016-12-10 02:55:17.764 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2016-12-10 02:55:17.765 [main] INFO  JobContainer - DataX jobContainer starts job.
2016-12-10 02:55:17.775 [main] INFO  JobContainer - Set jobId = 0
2016-12-10 02:55:17.854 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2016-12-10 02:55:17.859 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2016-12-10 02:55:17.859 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2016-12-10 02:55:17.863 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2016-12-10 02:55:17.864 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2016-12-10 02:55:17.875 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2016-12-10 02:55:17.875 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2016-12-10 02:55:17.931 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2016-12-10 02:55:17.960 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2016-12-10 02:55:17.979 [job-0] INFO  JobContainer - Running by standalone Mode.
2016-12-10 02:55:18.017 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2016-12-10 02:55:18.040 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2016-12-10 02:55:18.049 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2016-12-10 02:55:18.117 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2016-12-10 02:55:18.628 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[516]ms
2016-12-10 02:55:18.628 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2016-12-10 02:55:28.074 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.055s | All Task WaitReaderTime 0.360s | Percentage 100.00%
2016-12-10 02:55:28.074 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2016-12-10 02:55:28.076 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2016-12-10 02:55:28.077 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2016-12-10 02:55:28.079 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2016-12-10 02:55:28.082 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /root/datax/datax/hook
2016-12-10 02:55:28.087 [job-0] INFO  JobContainer -
	[total cpu info] =>
	averageCpu | maxDeltaCpu | minDeltaCpu
	14.42%     | 14.42%      | 14.42%

	[total gc info] =>
	NAME             | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
	MarkSweepCompact | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s
	Copy             | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s

2016-12-10 02:55:28.089 [job-0] INFO  JobContainer - PerfTrace not enable!
2016-12-10 02:55:28.091 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.055s | All Task WaitReaderTime 0.360s | Percentage 100.00%
2016-12-10 02:55:28.093 [job-0] INFO  JobContainer -
	Job start time            : 2016-12-10 02:55:17
	Job end time              : 2016-12-10 02:55:28
	Total elapsed time        : 10s
	Average throughput        : 253.91KB/s
	Record write speed        : 10000rec/s
	Total records read        : 100000
	Total read/write failures : 0

The output above confirms that DataX is working correctly.

Configuring a test job:
Next, let's configure a job that copies a table from one MySQL database to another.

Step 1: Create the job configuration file (JSON format)
You can print a configuration template with: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

[root@slave1 bin]# python datax.py -r mysqlreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2016, Alibaba Group. All Rights Reserved.


Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the mysqlwriter document:
     https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
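It is convenient to capture this template directly into a file (e.g. `python datax.py -r mysqlreader -w mysqlwriter > raw.txt`), but the output is not pure JSON: DataX prints a banner, documentation links, and a usage hint first, and the usage hint itself contains braces (`{DATAX_HOME}`). A minimal sketch of a filter for this (my own helper, not part of DataX) that relies on the template JSON starting with `{` on its own line:

```python
import json


def extract_json_template(output: str) -> str:
    """Return only the JSON template from `datax.py -r ... -w ...` output.

    The banner and usage hint precede the template; the usage hint contains
    brace placeholders like {DATAX_HOME}, so we look for the first line that
    is exactly "{" rather than the first brace character.
    """
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "{":  # the template JSON starts on its own line
            return "\n".join(lines[i:])
    raise ValueError("no JSON template found in output")


# Shortened stand-in for the real command output:
sample = """DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2016, Alibaba Group. All Rights Reserved.

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": []
    }
}"""

template = json.loads(extract_json_template(sample))
print(template["job"]["content"])  # []
```

After filtering, the resulting JSON can be edited and saved as the job file.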

Step 2: Fill in the options based on the template
The command output includes links to the documentation for the chosen reader and writer, as well as a sample JSON configuration; filling in the blanks completes the setup. A configuration file (mysql2mysql.json) based on the template looks like this:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": ["store_id","store_type","region_id","store_name","store_number","store_street_address","store_city","store_state","store_postal_code","store_country","store_manager","store_phone","store_fax","first_opened_date","last_remodel_date","store_sqft","grocery_sqft","frozen_sqft","meat_sqft","coffee_bar","video_store","salad_bar","prepared_food","florist"], 
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://192.168.18.149:3306/saiku"], 
                                "table": ["store"]
                            }
                        ], 
                        "password": "root", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": ["store_id","store_type","region_id","store_name","store_number","store_street_address","store_city","store_state","store_postal_code","store_country","store_manager","store_phone","store_fax","first_opened_date","last_remodel_date","store_sqft","grocery_sqft","frozen_sqft","meat_sqft","coffee_bar","video_store","salad_bar","prepared_food","florist"], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.18.147:3306/saiku", 
                                "table": ["store"]
                            }
                        ], 
                        "password": "root", 
                        "username": "root"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
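Before submitting a job like this, it helps to sanity-check the file: the reader and writer column lists must have the same length and order, or values will land in the wrong fields. A small validation sketch (my own helper, not part of DataX; the config here is an abbreviated version of the one above):

```python
import json


def check_mysql2mysql(config: dict) -> list:
    """Return a list of problems found in a mysqlreader -> mysqlwriter job."""
    problems = []
    content = config["job"]["content"][0]
    reader = content["reader"]["parameter"]
    writer = content["writer"]["parameter"]
    # Columns are matched positionally, so the counts must agree.
    if len(reader["column"]) != len(writer["column"]):
        problems.append("reader and writer column counts differ")
    # Both sides need credentials and a connection block.
    for side, params in (("reader", reader), ("writer", writer)):
        for key in ("username", "password", "connection"):
            if not params.get(key):
                problems.append(f"{side} is missing '{key}'")
    return problems


config = {
    "job": {
        "content": [{
            "reader": {"name": "mysqlreader", "parameter": {
                "column": ["store_id", "store_name"],
                "connection": [{"jdbcUrl": ["jdbc:mysql://192.168.18.149:3306/saiku"],
                                "table": ["store"]}],
                "username": "root", "password": "root"}},
            "writer": {"name": "mysqlwriter", "parameter": {
                "column": ["store_id", "store_name"],
                "connection": [{"jdbcUrl": "jdbc:mysql://192.168.18.147:3306/saiku",
                                "table": ["store"]}],
                "username": "root", "password": "root"}},
        }],
        "setting": {"speed": {"channel": "1"}},
    }
}

print(check_mysql2mysql(config))  # []
```

An empty list means the structural checks passed; DataX itself still validates connectivity and schema at run time.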

Step 3: Launch DataX

[root@slave1 bin]# python datax.py ./mysql2mysql.json 
When the sync finishes, the log ends with:
Job start time                 : 2016-12-10 07:47:16
Job end time                   : 2016-12-10 07:47:27
Total elapsed time             :                 11s
Average throughput             :              320B/s
Record write speed             :              2rec/s
Total records read             :                  25
Total read/write failures      :                   0
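When DataX is launched from a script, this end-of-job summary is the easiest thing to scrape for monitoring. A parsing sketch (my own code; it splits on the first colon of each line, so it works regardless of the label text, which the DataX binary may print in Chinese rather than the English translation shown here):

```python
def parse_summary(text: str) -> dict:
    """Parse DataX's end-of-job summary ('label : value' lines) into a dict."""
    summary = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # partition splits at the FIRST colon, so timestamps in the value
        # (which contain colons themselves) stay intact.
        label, _, value = line.partition(":")
        summary[label.strip()] = value.strip()
    return summary


sample = """Job start time                 : 2016-12-10 07:47:16
Job end time                   : 2016-12-10 07:47:27
Total elapsed time             : 11s
Average throughput             : 320B/s
Record write speed             : 2rec/s
Total records read             : 25
Total read/write failures      : 0"""

print(parse_summary(sample)["Total records read"])  # 25
```

A monitoring script could then alert when the failure count is nonzero or the record count falls below expectations.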

Job configuration
(1) Basic job configuration
The basic job configuration defines the framework-level structure of a job, including:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "",
          "parameter": {}
        },
        "writer": {
          "name": "",
          "parameter": {}
        }
      }
    ],
    "setting": {
      "speed": {},
      "errorLimit": {}
    }
  }
}
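Because every DataX job follows this same reader/writer skeleton, configurations can also be assembled programmatically rather than edited by hand. A minimal sketch (my own helper, not a DataX API) that builds the skeleton as a Python dict and serializes it to JSON:

```python
import json


def make_job(reader_name, reader_params, writer_name, writer_params,
             speed=None, error_limit=None):
    """Assemble a DataX job dict following the framework-level skeleton."""
    return {
        "job": {
            "content": [{
                "reader": {"name": reader_name, "parameter": reader_params},
                "writer": {"name": writer_name, "parameter": writer_params},
            }],
            "setting": {
                "speed": speed or {},
                "errorLimit": error_limit or {},
            },
        }
    }


# Build the same stream-to-stream sample job used in the self-test:
job = make_job("streamreader", {"sliceRecordCount": 10},
               "streamwriter", {"print": True},
               speed={"channel": 1})
print(json.dumps(job, indent=2))
```

The resulting dict can be dumped to a file and passed to `datax.py` as usual.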

(2) Job setting configuration

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "",
          "parameter": {}
        },
        "writer": {
          "name": "",
          "parameter": {}
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1,
        "byte": 104857600
      },
      "errorLimit": {
        "record": 10,
        "percentage": 0.05
      }
    }
  }
}

● job.setting.speed (throughput control)
A job lets you control transfer speed: the channel value sets the number of concurrent channels used during the sync, and the byte value caps the transfer rate.
● job.setting.errorLimit (dirty-data control)
A job also lets you monitor and alert on dirty data, via a maximum dirty-record count (the record value) and/or a maximum dirty-record ratio (the percentage value). If the dirty data produced during the transfer exceeds the specified count or percentage, the DataX job fails and exits with an error.
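The errorLimit rule described above can be sketched as a simple threshold check (my own illustration of the semantics, not DataX's actual code): the job fails as soon as the dirty-record count exceeds the record limit or the dirty-record ratio exceeds the percentage limit.

```python
def exceeds_error_limit(total, dirty, record=None, percentage=None):
    """Return True if the dirty-record count trips either configured threshold.

    record:     maximum absolute number of dirty records tolerated
    percentage: maximum dirty/total ratio tolerated (e.g. 0.05 for 5%)
    Either threshold may be omitted (None), in which case it is not checked.
    """
    if record is not None and dirty > record:
        return True
    if percentage is not None and total > 0 and dirty / total > percentage:
        return True
    return False


# With the sample setting above (record=10, percentage=0.05):
print(exceeds_error_limit(total=1000, dirty=5, record=10, percentage=0.05))   # False
print(exceeds_error_limit(total=1000, dirty=60, record=10, percentage=0.05))  # True
```

In the second call, 60 dirty records exceed both the absolute limit of 10 and the 5% ratio, so the job would be aborted.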