Customizing Hive File and Record Formats (Part 10)

In a CREATE TABLE statement, the default storage clause is STORED AS TEXTFILE.

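I also practiced STORED AS SEQUENCEFILE. A SequenceFile is a binary key-value format that supports compression, which can save space and improve I/O performance.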
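As a rough sketch of how the space saving is usually obtained, the session settings below turn on compressed output before a SequenceFile table is written. The two SET properties are the ones described in Hive's compressed-storage documentation. The table names seq_demo and src_demo are hypothetical and do not appear elsewhere in this post.

```sql
-- Compress the output of the queries that follow in this session.
SET hive.exec.compress.output=true;
-- Compression granularity for SequenceFiles: NONE, RECORD, or BLOCK.
SET io.seqfile.compression.type=BLOCK;

-- seq_demo and src_demo are hypothetical names used only for illustration.
CREATE TABLE seq_demo (name STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE seq_demo SELECT name FROM src_demo;
```

Whether this actually helps depends on the data and on the codec configured for the cluster, so it is worth comparing file sizes before and after.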

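In Pig, the default input/output field delimiter is the tab character \t. In Hive, the default field delimiter is instead the octal \001, that is, Ctrl-A in ASCII (octal 001, decimal 1, hex 01, character SOH, start of heading). The official explanation is that Ctrl-A was chosen because it is unlikely to clash with characters that appear in the text.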
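To make the delimiter difference concrete, here is a minimal sketch of two table definitions: one that spells out Hive's default Ctrl-A delimiter, and one that switches to a tab so that tab-separated files (such as Pig's default output) can be read. The table and column names are hypothetical.

```sql
-- demo_ctrl_a and demo_tab are hypothetical names used only for illustration.

-- Writes out explicitly what Hive already assumes for a text table.
CREATE TABLE demo_ctrl_a (name STRING, code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;

-- Reads and writes tab-separated fields instead.
CREATE TABLE demo_tab (name STRING, code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```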

SerDe is short for serializer/deserializer.
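As an illustration of where a SerDe appears in a table definition, the sketch below names one explicitly. LazySimpleSerDe is the SerDe Hive uses for delimited text tables. The table name serde_demo and the choice of a tab delimiter are assumptions made for this example.

```sql
-- serde_demo is a hypothetical name used only for illustration.
CREATE TABLE serde_demo (name STRING, code STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '\t')
STORED AS TEXTFILE;
```

The SerDe decides how a record is split into columns, while the STORED AS clause decides how records are laid out in files.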

     

create table test(name string) stored as sequencefile;

create table test1(name string);

The STORED AS clause determines the table's input and output formats.

insert overwrite table test select * from prov;

insert overwrite table test1 select * from prov;

The storage formats of test and test1 differ as follows:

test1:

{STORED AS INPUTFORMAT

  'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'}

test:

{STORED AS INPUTFORMAT  

  'org.apache.hadoop.mapred.SequenceFileInputFormat'

OUTPUTFORMAT

  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'}
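The two listings above can be checked on a live system. The sketch below shows two standard ways to inspect a table's formats, followed by a definition that names the SequenceFile classes directly instead of using the STORED AS SEQUENCEFILE shorthand. The table name test2 is hypothetical.

```sql
-- Print the full DDL, including the INPUTFORMAT and OUTPUTFORMAT classes.
SHOW CREATE TABLE test1;

-- Show detailed metadata, including the storage information section.
DESCRIBE FORMATTED test;

-- test2 is a hypothetical name used only for illustration.
-- It names the same classes listed for test above.
CREATE TABLE test2 (name STRING)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat';
```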

Parsing XML

-- Returns ["foo","bar"]

SELECT xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>', '//@id')

FROM waibubiao1 LIMIT 1;

-- Returns ["b123"]

SELECT xpath('<a><b id="foo" class="bb">b123</b><b id="bar">b2</b></a>', 'a/*[@class="bb"]/text()')

FROM waibubiao1 LIMIT 1;

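-- waibubiao1 holds six records, so without LIMIT the result is returned once per record.

-- 'a/*[@class="bb"]/text()' selects the text content (text()) of every child of a whose class attribute is "bb".

SELECT xpath('<a><b id="foo" class="bb">b123</b><b id="bar">b2</b></a>', 'a/*[@class="bb"]/text()')

FROM waibubiao1;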

["b123"]

["b123"]

["b123"]

["b123"]

["b123"]

["b123"]
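xpath() always returns an array of strings. When a single typed value is wanted, Hive also provides scalar variants such as xpath_string, xpath_boolean, and xpath_int. The sketch below reuses the waibubiao1 table from above only as a row source. The XML literals are made up for illustration, and the values in the comments are what the XPath expressions should evaluate to, not captured output.

```sql
-- Text of the first node matching the expression: b2
SELECT xpath_string('<a><b id="foo" class="bb">b123</b><b id="bar">b2</b></a>', 'a/b[@id="bar"]')
FROM waibubiao1 LIMIT 1;

-- Whether any node matches the expression: true
SELECT xpath_boolean('<a><b id="foo" class="bb">b123</b><b id="bar">b2</b></a>', 'a/b[@class="bb"]')
FROM waibubiao1 LIMIT 1;

-- Numeric result of an XPath function: 3
SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)')
FROM waibubiao1 LIMIT 1;
```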