
Importing Text Data into HBase

When importing a delimited text file into HBase, any trailing delimiter at the end of each line must be removed first; otherwise the import will fail.

As shown below:


[hadoop@hadoop1 bin]$ cat /tmp/emp.txt
1,A,201304,
2,B,201305,
3,C,201306,
4,D,201307,

This file has an extra comma at the end of every line. With a comma as the separator, each line splits into four fields while only three columns are mapped, so ImportTsv treats every line as malformed.

[hadoop@hadoop1 bin]$ hadoop fs -put /tmp/emp.txt /emp.txt



hbase(main):017:0> describe 't'
DESCRIPTION                                                                                      ENABLED                                             
 {NAME => 't', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', true                                                
  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>                                                      
 '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE                                                     
 _ON_DISK => 'true', BLOCKCACHE => 'true'}]}                                                                                                         
1 row(s) in 0.1410 seconds



Table t has only one column family, cf. The importtsv job below maps the first field of each line to the row key (HBASE_ROW_KEY) and the remaining two fields to cf:c1 and cf:c2, using a comma as the field separator.
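For reference, a table with this layout can be created from the HBase shell with something like the following (the VERSIONS setting matches the describe output above):

create 't', {NAME => 'cf', VERSIONS => 3}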



[hadoop@hadoop1 bin]$ hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 -Dimporttsv.separator=, t /emp.txt

............

13/04/10 08:06:24 INFO mapred.JobClient: Running job: job_201304100706_0008
13/04/10 08:06:25 INFO mapred.JobClient:  map 0% reduce 0%
13/04/10 08:07:24 INFO mapred.JobClient:  map 100% reduce 0%
13/04/10 08:07:29 INFO mapred.JobClient: Job complete: job_201304100706_0008
13/04/10 08:07:29 INFO mapred.JobClient: Counters: 19
13/04/10 08:07:29 INFO mapred.JobClient:   Job Counters 
13/04/10 08:07:29 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=37179
13/04/10 08:07:29 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/10 08:07:29 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/10 08:07:29 INFO mapred.JobClient:     Rack-local map tasks=1
13/04/10 08:07:29 INFO mapred.JobClient:     Launched map tasks=1
13/04/10 08:07:29 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/10 08:07:29 INFO mapred.JobClient:   ImportTsv
13/04/10 08:07:29 INFO mapred.JobClient:     Bad Lines=4
13/04/10 08:07:29 INFO mapred.JobClient:   File Output Format Counters 
13/04/10 08:07:29 INFO mapred.JobClient:     Bytes Written=0
13/04/10 08:07:29 INFO mapred.JobClient:   FileSystemCounters
13/04/10 08:07:29 INFO mapred.JobClient:     HDFS_BYTES_READ=145
13/04/10 08:07:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33535
13/04/10 08:07:29 INFO mapred.JobClient:   File Input Format Counters 
13/04/10 08:07:29 INFO mapred.JobClient:     Bytes Read=48
13/04/10 08:07:29 INFO mapred.JobClient:   Map-Reduce Framework
13/04/10 08:07:29 INFO mapred.JobClient:     Map input records=4
13/04/10 08:07:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=37830656
13/04/10 08:07:29 INFO mapred.JobClient:     Spilled Records=0
13/04/10 08:07:29 INFO mapred.JobClient:     CPU time spent (ms)=200
13/04/10 08:07:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=8155136
13/04/10 08:07:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=345518080
13/04/10 08:07:29 INFO mapred.JobClient:     Map output records=0
13/04/10 08:07:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97


You can see that all 4 lines were marked as Bad Lines (Bad Lines=4, Map output records=0) and discarded.
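Rather than editing the file by hand, the trailing delimiter can be stripped with sed before re-uploading; a minimal sketch:

sed -i 's/,$//' /tmp/emp.txt

Alternatively, importtsv accepts -Dimporttsv.skip.bad.lines=false, which makes the job fail on the first malformed line instead of silently dropping records.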


[hadoop@hadoop1 bin]$ cat /tmp/emp.txt
1,A,201304
2,B,201305
3,C,201306
4,D,201307

[hadoop@hadoop1 bin]$ hadoop fs -rmr /emp.txt

Deleted hdfs://192.168.0.88:9000/emp.txt



[hadoop@hadoop1 bin]$ hadoop fs -put /tmp/emp.txt /emp.txt





[hadoop@hadoop1 bin]$ hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 -Dimporttsv.separator=, t /emp.txt

13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop1
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.home=/java/jdk1.7.0/jre
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-1.0.4/conf:/java/jdk1.7.0/lib/tools.jar:/home/hadoop/hadoop-1.0.4/libexec/..:/home/hadoop/hadoop-1.0.4/libexec/../hadoop-core-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/asm-3.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/aspectjrt-1.6.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/aspectjtools-1.6.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-io-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/core-3.1.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/guava-11.0.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-capacity-scheduler-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-fairscheduler-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-thriftfs-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hbase-0.94.6-tests.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hbase-0.94.6.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jdeb-0.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-core-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-json-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-server-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/junit-4.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/zookeeper-3.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-1.0.4/libexec/../lib/native/Linux-i386-32
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-92.el5xen
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/sqoop-1.4.3/bin
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.0.90:2181 sessionTimeout=180000 watcher=hconnection
13/04/10 07:54:40 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.0.90:2181
13/04/10 07:54:40 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/04/10 07:54:40 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is [email protected]
13/04/10 07:54:40 INFO zookeeper.ClientCnxn: Socket connection established to hadoop3/192.168.0.90:2181, initiating session
13/04/10 07:54:46 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop3/192.168.0.90:2181, sessionid = 0x13df12619940011, negotiated timeout = 180000
13/04/10 07:54:56 INFO mapreduce.TableOutputFormat: Created table instance for t
13/04/10 07:54:56 INFO input.FileInputFormat: Total input paths to process : 1
13/04/10 07:54:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/10 07:54:56 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/10 07:54:57 INFO mapred.JobClient: Running job: job_201304100706_0007
13/04/10 07:54:59 INFO mapred.JobClient:  map 0% reduce 0%
13/04/10 07:57:29 INFO mapred.JobClient:  map 100% reduce 0%
13/04/10 07:57:37 INFO mapred.JobClient: Job complete: job_201304100706_0007
13/04/10 07:57:37 INFO mapred.JobClient: Counters: 19
13/04/10 07:57:37 INFO mapred.JobClient:   Job Counters 
13/04/10 07:57:37 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=125785
13/04/10 07:57:37 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/10 07:57:37 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/10 07:57:37 INFO mapred.JobClient:     Rack-local map tasks=1
13/04/10 07:57:37 INFO mapred.JobClient:     Launched map tasks=1
13/04/10 07:57:37 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/10 07:57:37 INFO mapred.JobClient:   ImportTsv
13/04/10 07:57:37 INFO mapred.JobClient:     Bad Lines=0
13/04/10 07:57:37 INFO mapred.JobClient:   File Output Format Counters 
13/04/10 07:57:37 INFO mapred.JobClient:     Bytes Written=0
13/04/10 07:57:37 INFO mapred.JobClient:   FileSystemCounters
13/04/10 07:57:37 INFO mapred.JobClient:     HDFS_BYTES_READ=141
13/04/10 07:57:37 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33537
13/04/10 07:57:37 INFO mapred.JobClient:   File Input Format Counters 
13/04/10 07:57:37 INFO mapred.JobClient:     Bytes Read=44
13/04/10 07:57:37 INFO mapred.JobClient:   Map-Reduce Framework
13/04/10 07:57:37 INFO mapred.JobClient:     Map input records=4
13/04/10 07:57:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=37867520
13/04/10 07:57:37 INFO mapred.JobClient:     Spilled Records=0
13/04/10 07:57:37 INFO mapred.JobClient:     CPU time spent (ms)=170
13/04/10 07:57:37 INFO mapred.JobClient:     Total committed heap usage (bytes)=7950336
13/04/10 07:57:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=345387008
13/04/10 07:57:37 INFO mapred.JobClient:     Map output records=4
13/04/10 07:57:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97



This time Bad Lines=0 and Map output records=4: all four rows were written. Verify with a scan:

hbase(main):016:0> scan 't'
ROW                                    COLUMN+CELL                                                                                                   
 1                                     column=cf:c1, timestamp=1365551680259, value=A                                                                
 1                                     column=cf:c2, timestamp=1365551680259, value=201304                                                           
 2                                     column=cf:c1, timestamp=1365551680259, value=B                                                                
 2                                     column=cf:c2, timestamp=1365551680259, value=201305                                                           
 3                                     column=cf:c1, timestamp=1365551680259, value=C                                                                
 3                                     column=cf:c2, timestamp=1365551680259, value=201306                                                           
 4                                     column=cf:c1, timestamp=1365551680259, value=D                                                                
 4                                     column=cf:c2, timestamp=1365551680259, value=201307                                                           

4 row(s) in 0.5480 seconds
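A single row can also be spot-checked from the HBase shell; for example:

get 't', '1'

This should return the two cells for row 1 (cf:c1=A and cf:c2=201304).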

Finally, a more realistic invocation of the same tool against a wider table, run directly through the hbase launcher (note that without -Dimporttsv.separator the default tab separator is assumed):

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,info:ticket_office_no,info:ticket_window_no,info:ticket_sale_date,info:inner_code,info:office_no,info:window_no,info:operater_no,info:train_no,info:train_date,info:coach_no,info:seat_no,info:ticket_no return_record /user/hadoop/InputData/return_record0725.bcp
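When the input is large, importtsv can also skip online Puts entirely and generate HFiles for bulk loading; a sketch, assuming /tmp/hfiles as a hypothetical scratch output directory:

hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 -Dimporttsv.separator=, -Dimporttsv.bulk.output=/tmp/hfiles t /emp.txt
hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar completebulkload /tmp/hfiles t

The first job writes HFiles under /tmp/hfiles instead of issuing Puts; the second moves them into the table's regions.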
