Querying Elasticsearch Data from Hive with the ES-Hadoop Plugin
阿新 • Published: 2019-01-05
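My original thought was: since Hive can query ES data through the es-hadoop plugin, why not use Impala for the analysis as well?
In the end, Hive could query ES but Impala could not. Disappointing, but I am still recording the problems I ran into along the way, especially the date format conversion, which took me a long time to sort out.
I picked a simple ES index with just two fields, detectTime and flow: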
"mappings": {
"flow_message_": {
"properties": {
"detectTime": {
"format": "YYYY-MM-dd HH:mm:ss",
"type" : "date"
},
"flow": {
"type": "integer"
}
}
}
},
Enter the Hive command line:
1. Add the JAR
hive> add jar file:///home/elasticsearch/es-hadoop/elasticsearch-hadoop-5.5.0.jar;
2. Create the table
CREATE EXTERNAL TABLE flow_message_xxx (
id string,
detect_time timestamp,
flow int
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '192.168.x.xx:9500,192.168.x.xx:9500',
'es.index.auto.create' = 'false',
'es.resource' = 'flow_message_201708/flow_message_',
'es.read.metadata' = 'true',
'es.mapping.names' = 'id:_metadata._id, detect_time:detectTime, flow:flow' );
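//es.nodes: the ES nodes to connect to
//es.resource: the ES index/type
//es.mapping.names: mapping between Hive columns and ES fields
The table was created successfully.
Unfortunately, the query failed: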
Failed with exception java.io.IOException:
org.elasticsearch.hadoop.rest.EsHadoopParsingException:
Cannot parse value [2017-08-24 14:15:52] for field [detectTime]
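Apparently a date in this ES format cannot be parsed correctly.
I imported the es-hadoop source code to take a look,
and found that HiveValueReader parses the date string with the following method: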
@Override
protected Object parseDate(String value, boolean richDate) {
return (richDate ? new TimestampWritable(new Timestamp(DatatypeConverter.parseDateTime(value).getTimeInMillis())) : parseString(value));
}
DatatypeConverter.parseDateTime(value) expects the ISO 8601 (xsd:dateTime) form, for example 2017-08-24T14:15:52,
not our custom format YYYY-MM-dd HH:mm:ss.
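To make the mismatch concrete, here is a minimal sketch. It assumes java.time's ISO parser as a stand-in for the xsd:dateTime parsing done by DatatypeConverter (both require the 'T' separator); the class and helper names are hypothetical and not part of es-hadoop.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class IsoVersusCustomFormat {

    // Hypothetical helper: true if the value is a valid ISO-8601 local date-time ('T' separator).
    static boolean parsesAsIso(String value) {
        try {
            LocalDateTime.parse(value); // uses DateTimeFormatter.ISO_LOCAL_DATE_TIME
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Hypothetical helper: parse with an explicit pattern matching the values stored in the index.
    static LocalDateTime parseWithPattern(String value) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
    }

    public static void main(String[] args) {
        String stored = "2017-08-24 14:15:52";                  // value from the error message
        System.out.println(parsesAsIso(stored));                // space separator: rejected
        System.out.println(parsesAsIso("2017-08-24T14:15:52")); // 'T' separator: accepted
        System.out.println(parseWithPattern(stored));           // explicit pattern: accepted
    }
}
```

The value from the error message is only accepted once the parser is told the exact pattern, which is what the custom reader has to do.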
So we can specify the date format in the Hive table properties and plug in our own value reader:
CREATE EXTERNAL TABLE flow_message_xxx (
id string,
detect_time timestamp,
flow int
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '192.168.x.xx:9500,192.168.x.xx:9500',
'es.index.auto.create' = 'false',
'es.resource' = 'flow_message_201708/flow_message_',
'es.read.metadata' = 'true',
'es.date.format' = 'yyyy-MM-dd HH:mm:ss',
'es.ser.reader.value.class' = 'com.eshadoop.EsValueReader',
'es.mapping.names' = 'id:_metadata._id,detect_time:detectTime');
//es.date.format: the date pattern (read by our custom reader)
//es.ser.reader.value.class: the value reader class to use
The custom EsValueReader:
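Note that the table property uses lowercase yyyy while the ES mapping uses YYYY. ES 5.x date formats follow Joda-Time, where Y means year of era, but the custom reader parses with SimpleDateFormat, where Y means week year. The sketch below shows how the two letters can diverge in SimpleDateFormat; the class name is hypothetical, and Locale.US is fixed here only to make the week rules deterministic.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

final class WeekYearPitfall {

    // Format a calendar date with the given SimpleDateFormat pattern under US week rules.
    static String format(String pattern, Calendar cal) {
        return new SimpleDateFormat(pattern, Locale.US).format(cal.getTime());
    }

    public static void main(String[] args) {
        // 2018-12-31 falls in the week that contains 2019-01-01 under US week rules.
        Calendar lastDay = new GregorianCalendar(2018, Calendar.DECEMBER, 31);
        System.out.println(format("yyyy-MM-dd", lastDay)); // calendar year
        System.out.println(format("YYYY-MM-dd", lastDay)); // week year, which is 2019 here
    }
}
```

For dates near the turn of the year the week-year pattern reports the wrong year, so lowercase yyyy is the safer choice for 'es.date.format'.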
package com.eshadoop;
/**
* @function
* @author meyao
* @create 2018-04-18 16:21
* @version v1.0
**/
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.elasticsearch.hadoop.cfg.Settings;
import org.elasticsearch.hadoop.hive.HiveValueReader;
import javax.xml.bind.DatatypeConverter;
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
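public class EsValueReader extends HiveValueReader {
    // Custom date pattern from the 'es.date.format' table property; null or empty if not set
    private String dateFormat;
    // Not static: SimpleDateFormat is not thread-safe, so each reader keeps its own instance
    private SimpleDateFormat formatter;

    @Override
    public void setSettings(Settings settings) {
        super.setSettings(settings);
        // Read our custom date format
        dateFormat = settings.getProperty("es.date.format");
        if (dateFormat != null && !dateFormat.isEmpty()) {
            formatter = new SimpleDateFormat(dateFormat);
        }
    }

    @Override
    protected Object parseDate(String value, boolean richDate) {
        if (!richDate) {
            return parseString(value);
        }
        Date d;
        if (formatter != null) {
            try {
                d = formatter.parse(value);
            } catch (ParseException e) {
                // Fail loudly instead of hitting a NullPointerException later
                throw new IllegalArgumentException(
                        "Cannot parse [" + value + "] with pattern [" + dateFormat + "]", e);
            }
        } else {
            // No custom format configured: fall back to the default behaviour
            d = DatatypeConverter.parseDateTime(value).getTime();
        }
        return new TimestampWritable(new Timestamp(d.getTime()));
    }
}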
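The reader above relies on SimpleDateFormat, which is not thread-safe. As an alternative, here is a minimal standalone sketch of the same parse-with-fallback logic using java.time, which has immutable, thread-safe formatters. It deliberately leaves out the es-hadoop and Hive classes; TimestampParser is a hypothetical name, and the ISO fallback is a simplification that does not accept the time zone offsets DatatypeConverter would.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class TimestampParser {
    private final DateTimeFormatter formatter;

    // pattern may be null or empty, in which case an ISO-8601 local date-time is expected
    TimestampParser(String pattern) {
        this.formatter = (pattern == null || pattern.isEmpty())
                ? DateTimeFormatter.ISO_LOCAL_DATE_TIME
                : DateTimeFormatter.ofPattern(pattern);
    }

    // Throws DateTimeParseException if the value does not match the configured format
    Timestamp parse(String value) {
        return Timestamp.valueOf(LocalDateTime.parse(value, formatter));
    }

    public static void main(String[] args) {
        TimestampParser custom = new TimestampParser("yyyy-MM-dd HH:mm:ss");
        System.out.println(custom.parse("2017-08-24 14:15:52"));

        TimestampParser iso = new TimestampParser(null);
        System.out.println(iso.parse("2017-08-24T14:15:52"));
    }
}
```

Inside a real reader, the resulting Timestamp would still need to be wrapped in a TimestampWritable, as in the parseDate method above.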
Package it as a JAR and put it in Hive's lib directory.
The table is created successfully, and the query now returns results.