Hand-Writing a Simple ElasticSearch SQL Converter (Part 1)
I. Preface
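A while ago I needed to let ElasticSearch answer simple queries written in SQL. Newer ES versions already support this (although it still appears to be experimental), and there is also an elasticsearch-sql plugin on GitHub. I decided to write my own anyway, for three reasons:
1. The ES version we currently run is fairly old.
2. elasticsearch-sql is good, but it is rather complex and its code is not easy to maintain.
3. For practice.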
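II. Technology Selection
Mainstream software usually relies on ANTLR for lexical and syntactic analysis; well-known projects such as Hibernate, Spark and Hive use it, and I have worked with it before. For parsing standard SQL only, though, there are more convenient options, such as the parser inside Hibernate or the one in Alibaba's Druid database connection pool. Druid uses a hand-written lexer and parser, an approach that generally performs much better than a parser generated by ANTLR. I chose the second option, Druid.
Before starting, here is the abstract syntax tree of a SQL statement as Druid exposes it:
Image: https://www.jianshu.com/p/437aa22ea3ca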
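III. Implementation
First we create a SqlParser class whose main flow lives in the parse method. The method parses a SQL string (Druid supports several SQL dialects; I chose MySQL) and returns a SearchSourceBuilder, the DSL builder provided by ElasticSearch. Passing that object to an ES client is enough to issue a search request to the ES server.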
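import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLExpr;
import com.alibaba.druid.sql.ast.SQLOrderBy;
import com.alibaba.druid.sql.ast.SQLOrderingSpecification;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.ast.expr.SQLBetweenExpr;
import com.alibaba.druid.sql.ast.expr.SQLBinaryOpExpr;
import com.alibaba.druid.sql.ast.expr.SQLBinaryOperator;
import com.alibaba.druid.sql.ast.expr.SQLInListExpr;
import com.alibaba.druid.sql.ast.statement.SQLSelectOrderByItem;
import com.alibaba.druid.sql.ast.statement.SQLSelectQuery;
import com.alibaba.druid.sql.ast.statement.SQLSelectQueryBlock;
import com.alibaba.druid.sql.ast.statement.SQLSelectStatement;
import com.alibaba.druid.util.JdbcConstants;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchPhraseQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RegexpQueryBuilder;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * @author fred
 */
public class SqlParser {
    private static final String dbType = JdbcConstants.MYSQL;
    private static final Logger logger = LoggerFactory.getLogger(SqlParser.class);
    private SearchSourceBuilder builder;

    public SqlParser(SearchSourceBuilder builder) {
        this.builder = builder;
    }

    /**
     * Parses a SQL statement into an ES query.
     */
    public SearchSourceBuilder parse(String sql) throws Exception {
        if (Objects.isNull(sql)) {
            throw new IllegalArgumentException("The input statement must not be null");
        }
        // Note: lower-casing the whole statement also lower-cases string literals.
        sql = sql.trim().toLowerCase();
        // Use the Druid parser to build the AST
        List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType);
        if (Objects.isNull(stmtList) || stmtList.size() != 1) {
            throw new IllegalArgumentException("Exactly one query statement is required");
        }
        SQLStatement stmt = stmtList.get(0);
        if (!(stmt instanceof SQLSelectStatement)) {
            throw new IllegalArgumentException("The input statement must be a SELECT statement");
        }
        SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) stmt;
        SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();
        if (!(sqlSelectQuery instanceof SQLSelectQueryBlock)) {
            throw new IllegalArgumentException("Only a plain SELECT block is supported");
        }
        SQLSelectQueryBlock sqlSelectQueryBlock = (SQLSelectQueryBlock) sqlSelectQuery;

        SQLExpr whereExpr = sqlSelectQueryBlock.getWhere();

        // Build the ES query conditions
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        if (Objects.nonNull(whereExpr)) { // a query without WHERE has nothing to translate
            QueryBuilder whereBuilder = whereHelper(whereExpr); // handle where
            bridge.must(whereBuilder);
        }
        SQLOrderBy orderByExpr = sqlSelectQueryBlock.getOrderBy(); // handle order by
        if (Objects.nonNull(orderByExpr)) {
            orderByHelper(orderByExpr, bridge);
        }
        builder.query(bridge);
        return builder;
    }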
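The main flow is simple: once we have the SQL string, we convert it straight into an abstract syntax tree through the Druid API, and we require the input to be a SELECT statement. After that come the where clause and the order by clause. The hard part for now is mapping the where clause onto an ES query.
Let's start with the easy one: how do we handle order by? SQL lets the user sort on several fields, so the sort fields form a List, and all we have to do is map that list onto the SearchSourceBuilder, as in the code below: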
    /**
     * Handles every ORDER BY column.
     *
     * @param orderByExpr the ORDER BY node of the AST
     * @param bridge      currently unused; sorting is applied to the SearchSourceBuilder
     */
    private void orderByHelper(SQLOrderBy orderByExpr, BoolQueryBuilder bridge) {
        List<SQLSelectOrderByItem> orderByList = orderByExpr.getItems(); // columns to sort by
        for (SQLSelectOrderByItem sqlSelectOrderByItem : orderByList) {
            if (sqlSelectOrderByItem.getType() == null) {
                sqlSelectOrderByItem.setType(SQLOrderingSpecification.ASC); // ascending by default
            }
            String orderByColumn = sqlSelectOrderByItem.getExpr().toString();
            builder.sort(orderByColumn,
                    sqlSelectOrderByItem.getType().equals(SQLOrderingSpecification.ASC) ? SortOrder.ASC
                            : SortOrder.DESC);
        }
    }
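Druid's API hands us every sort field in the SQL statement. We iterate over them, take the column name literal and the direction of each, and pass both to the sort method of SearchSourceBuilder. Note that when the original SQL does not specify a direction for a field, we default to ascending.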
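The default-to-ascending rule can be illustrated without Druid or ES on the classpath. The following is only a sketch: `OrderItem` and `OrderBySketch` are hypothetical stand-ins invented for this example, not Druid or ES classes, and the "column:ORDER" strings merely represent the arguments that would be passed to sort.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one ORDER BY item; not a Druid class. */
final class OrderItem {
    final String column;
    final String direction; // "asc", "desc", or null when the SQL omitted it

    OrderItem(String column, String direction) {
        this.column = column;
        this.direction = direction;
    }
}

final class OrderBySketch {
    /** Renders each item as "column:ORDER", treating a missing direction as ASC. */
    static List<String> toSortClauses(List<OrderItem> items) {
        List<String> clauses = new ArrayList<>();
        for (OrderItem item : items) {
            boolean descending = "desc".equalsIgnoreCase(item.direction);
            clauses.add(item.column + ":" + (descending ? "DESC" : "ASC"));
        }
        return clauses;
    }
}
```

For `order by time desc, host`, the sketch would yield the clauses for time descending followed by host ascending, in the same order as the SQL.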
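Next comes the slightly trickier where clause. Because the SQL statement has been parsed into a syntax tree, recursion is the natural way to process it. When I work on a recursive problem I like to start from the base case. According to the definitions in the Druid API, the operators in a where clause fall into three main kinds:
1. Simple binary operators: logical operators such as and and or, plus most relational operators (covered in detail later)
2. between and not between: these can be mapped to a Range Query in ES
3. in and not in: these can be mapped to a Terms Query in ES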
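One subtlety in the second case: `x NOT BETWEEN a AND b` means `x < a OR x > b`. A single range query combines its bounds with AND, so putting "less than a" and "greater than b" into one range matches nothing when a <= b; negating the inclusive range is one safe way to express it. The plain-Java check below only illustrates that logic; `NotBetweenSketch` is a name made up for this example and involves no ES API.

```java
final class NotBetweenSketch {
    /** What SQL means by "x NOT BETWEEN from AND to": outside the inclusive range. */
    static boolean notBetween(long x, long from, long to) {
        return !(x >= from && x <= to);
    }

    /** Both bounds ANDed together, as a single range with lt(from) and gt(to) would do. */
    static boolean singleRangeWithBothBounds(long x, long from, long to) {
        return x < from && x > to;
    }
}
```

With the range 10 to 20, the values 5 and 25 satisfy `notBetween` while 15 does not, whereas `singleRangeWithBothBounds` rejects all three.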
With Druid it is easy to obtain the operator and the operands of each kind of expression.
    /**
     * Recursively walks the "where" subtree.
     *
     * @return the ES query built for this node
     */
    private QueryBuilder whereHelper(SQLExpr expr) throws Exception {
        if (Objects.isNull(expr)) {
            throw new NullPointerException("The node must not be null!");
        }
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        if (expr instanceof SQLBinaryOpExpr) { // binary operation
            SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
            if (operator.isLogical()) { // and, or, xor
                return handleLogicalExpr(expr);
            } else if (operator.isRelational()) { // a concrete comparison, located at a leaf node
                return handleRelationalExpr(expr);
            }
        } else if (expr instanceof SQLBetweenExpr) { // between
            SQLBetweenExpr between = ((SQLBetweenExpr) expr);
            boolean isNotBetween = between.isNot(); // between or not between?
            String testExpr = between.getTestExpr().toString();
            String fromStr = formatSQLValue(between.getBeginExpr().toString());
            String toStr = formatSQLValue(between.getEndExpr().toString());
            if (isNotBetween) {
                // NOT BETWEEN is "< from OR > to", so negate the inclusive range;
                // a single range with lt(from) and gt(to) would AND the bounds and match nothing
                bridge.mustNot(QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr));
            } else {
                bridge.must(QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr));
            }
            return bridge;
        } else if (expr instanceof SQLInListExpr) { // SQL "in", which corresponds to terms in ES
            SQLInListExpr siExpr = (SQLInListExpr) expr;
            boolean isNotIn = siExpr.isNot(); // in or not in?
            String leftSide = siExpr.getExpr().toString();
            List<SQLExpr> inSQLList = siExpr.getTargetList();
            List<String> inList = new ArrayList<>();
            for (SQLExpr in : inSQLList) {
                String str = formatSQLValue(in.toString());
                inList.add(str);
            }
            if (isNotIn) {
                bridge.mustNot(QueryBuilders.termsQuery(leftSide, inList));
            } else {
                bridge.must(QueryBuilders.termsQuery(leftSide, inList));
            }
            return bridge;
        }
        // Unsupported expression types fall through and yield an empty bool query
        return bridge;
    }
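The code above calls a helper named formatSQLValue whose body is not shown in this article. Judging from the tests further down, where the SQL literal '%aaa' ends up as the plain string aaa once the percent sign is removed, it presumably strips the quotes around a SQL string literal. The following is a guess at such a helper, under that assumption only; the real implementation may differ.

```java
final class SqlValueSketch {
    /**
     * Hypothetical version of formatSQLValue: removes one pair of surrounding
     * single quotes and turns the SQL escape '' back into a single quote.
     * Values without surrounding quotes (numbers, for example) are returned trimmed.
     */
    static String formatSQLValue(String raw) {
        String value = raw.trim();
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1).replace("''", "'");
        }
        return value;
    }
}
```

Under this assumption, numeric literals such as 1 pass through unchanged and are still sent to ES as strings, which matches the `"from" : "1"` seen in the generated query later on.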
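The first case above is the most involved. Let's begin with logical operators.
As the code below shows, when the operator is a logical one we recurse into the left and right operands separately and then merge the results according to the operator: or maps to should in ES, and and maps to must.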
    /**
     * Logical operators; and and or are supported for now.
     *
     * @return a bool query combining both sides
     * @throws Exception if a child expression cannot be translated
     */
    private QueryBuilder handleLogicalExpr(SQLExpr expr) throws Exception {
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
        SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
        SQLExpr rightExpr = ((SQLBinaryOpExpr) expr).getRight();

        // Recurse into the left and right subtrees, then merge the results by operator
        QueryBuilder leftBridge = whereHelper(leftExpr);
        QueryBuilder rightBridge = whereHelper(rightExpr);
        if (operator.equals(SQLBinaryOperator.BooleanAnd)) {
            bridge.must(leftBridge).must(rightBridge);
        } else if (operator.equals(SQLBinaryOperator.BooleanOr)) {
            bridge.should(leftBridge).should(rightBridge);
        } else {
            // e.g. xor: fail loudly instead of returning an empty bool query
            throw new IllegalArgumentException("Unsupported logical operator: " + operator);
        }
        return bridge;
    }
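The recursive merge is easier to see on a toy tree. The sketch below is only an illustration: `ExprNode` and `BoolSketch` are hypothetical classes invented here, not the Druid AST or the ES builders, and the output is an abbreviated string rather than real query DSL.

```java
/** Hypothetical minimal expression tree; not the Druid AST. */
final class ExprNode {
    final String op;      // "and", "or", or null for a leaf
    final ExprNode left;
    final ExprNode right;
    final String leaf;    // leaf condition text such as "b=1"

    private ExprNode(String op, ExprNode left, ExprNode right, String leaf) {
        this.op = op;
        this.left = left;
        this.right = right;
        this.leaf = leaf;
    }

    static ExprNode leaf(String text) { return new ExprNode(null, null, null, text); }
    static ExprNode and(ExprNode l, ExprNode r) { return new ExprNode("and", l, r, null); }
    static ExprNode or(ExprNode l, ExprNode r) { return new ExprNode("or", l, r, null); }
}

final class BoolSketch {
    /** Recurses into both children, then merges them under must (and) or should (or). */
    static String toBool(ExprNode n) {
        if (n.op == null) {
            return n.leaf; // base case: a leaf condition
        }
        String clause = n.op.equals("and") ? "must" : "should";
        return "{bool:{" + clause + ":[" + toBool(n.left) + "," + toBool(n.right) + "]}}";
    }
}
```

For the tree `(a and b) or c` this produces a should clause whose first element is a nested must clause, which is the same nesting the real converter emits. A bool query that has only should clauses requires at least one of them to match by default, which is what gives should its OR meaning here.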
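Now for the other half of the first case: relational operators. Relational operations in SQL are mostly comparisons such as greater than, less than, equals and LIKE. I also added regular-expression search, although its performance seems poor and ES places many restrictions on regexp queries, so I do not recommend using it.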
    /**
     * Greater than, less than, equals, regexp and like.
     *
     * @param expr a relational binary expression
     * @return the matching ES query
     */
    private QueryBuilder handleRelationalExpr(SQLExpr expr) {
        SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
        if (Objects.isNull(leftExpr)) {
            throw new NullPointerException("The left side of the expression must not be null");
        }
        String leftExprStr = leftExpr.toString();
        // TODO: the right side could support function calls later
        String rightExprStr = formatSQLValue(((SQLBinaryOpExpr) expr).getRight().toString());
        SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
        QueryBuilder queryBuilder;
        switch (operator) {
        case GreaterThanOrEqual:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gte(rightExprStr);
            break;
        case LessThanOrEqual:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lte(rightExprStr);
            break;
        case Equality:
            queryBuilder = QueryBuilders.boolQuery();
            TermQueryBuilder eqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).must(eqCond);
            break;
        case GreaterThan:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gt(rightExprStr);
            break;
        case LessThan:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lt(rightExprStr);
            break;
        case NotEqual:
            queryBuilder = QueryBuilders.boolQuery();
            TermQueryBuilder notEqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).mustNot(notEqCond);
            break;
        case RegExp: // maps to the regexp query in ES
            queryBuilder = QueryBuilders.boolQuery();
            RegexpQueryBuilder regCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).must(regCond); // must, not mustNot: a positive match
            break;
        case NotRegExp:
            queryBuilder = QueryBuilders.boolQuery();
            RegexpQueryBuilder notRegCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).mustNot(notRegCond);
            break;
        case Like: // the % wildcards are dropped and the rest is matched as a phrase
            queryBuilder = QueryBuilders.boolQuery();
            MatchPhraseQueryBuilder likeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                    rightExprStr.replace("%", ""));
            ((BoolQueryBuilder) queryBuilder).must(likeCond);
            break;
        case NotLike:
            queryBuilder = QueryBuilders.boolQuery();
            MatchPhraseQueryBuilder notLikeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                    rightExprStr.replace("%", ""));
            ((BoolQueryBuilder) queryBuilder).mustNot(notLikeCond);
            break;
        default:
            throw new IllegalArgumentException("This operator is not supported yet: " + operator.toString());
        }
        return queryBuilder;
    }
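Because the Like branch simply deletes the percent signs and runs a phrase match, '%aaa' and 'aaa%' translate to the same query. If the position of the wildcard should matter, one alternative (not what this article does) would be to translate the LIKE pattern into the syntax of the ES wildcard query, where * matches any sequence of characters and ? matches a single character. The sketch below shows only the pattern translation; `LikePatternSketch` is a hypothetical name, and SQL escape sequences such as \% are deliberately not handled.

```java
final class LikePatternSketch {
    /** Translates a SQL LIKE pattern into wildcard syntax: % becomes *, _ becomes ?. */
    static String likeToWildcard(String like) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            switch (c) {
            case '%':
                out.append('*');
                break;
            case '_':
                out.append('?');
                break;
            case '*':
            case '?':
            case '\\':
                out.append('\\').append(c); // keep literal wildcard characters literal
                break;
            default:
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Whether a wildcard query is preferable depends on how the field is mapped: it works on indexed terms rather than on the original text, and leading wildcards tend to be slow, so this is a trade-off rather than a drop-in replacement.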
At this point the SQL-to-ES-DSL conversion is done (for simple queries only). Let's write a few JUnit tests.
First, a simple comparison:
    @Test
    public void normalSQLTest() throws Exception {
        String sql = "select * from test where time>= 1";
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        // Build the expected query by hand
        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        bridge.must(whereBuilder);
        builderToCompare.query(bridge);
        assertEquals(builderToCompare, searchSourceBuilder);
    }
Here is the ES query it prints:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "time" : {
              "from" : "1",
              "to" : null,
              "include_lower" : true,
              "include_upper" : true,
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}
Next, one with sorting:
    @Test
    public void normalSQLWithOrderByTest() throws Exception {
        String sql = "select * from test where time>= 1 order by time desc";
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        // Build the expected query by hand
        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        bridge.must(whereBuilder);
        builderToCompare.sort("time", SortOrder.DESC);
        builderToCompare.query(bridge);
        assertEquals(builderToCompare, searchSourceBuilder);
    }
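between and in work the same way, so I won't paste their code. Finally, let's look at a slightly more complex query with logical operators. Because and (&&) binds more tightly than or, the condition below is parsed as (raw_log not like '%aaa' && b=1) or c=0: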
    @Test
    public void sqlLogicTest() throws Exception {
        String sql = "select * from test where raw_log not like '%aaa' && b=1 or c=0";
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        // Build the expected query by hand
        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder builder = QueryBuilders.matchPhraseQuery("raw_log", "aaa");
        BoolQueryBuilder bridge1 = QueryBuilders.boolQuery(); // raw_log not like
        bridge1.mustNot(builder);
        BoolQueryBuilder bridge2 = QueryBuilders.boolQuery(); // b=1
        bridge2.must(QueryBuilders.termQuery("b", "1"));
        BoolQueryBuilder bridge3 = QueryBuilders.boolQuery(); // not like and b=1
        bridge3.must(bridge1).must(bridge2);
        BoolQueryBuilder bridge4 = QueryBuilders.boolQuery(); // c=0
        bridge4.must(QueryBuilders.termQuery("c", "0"));
        BoolQueryBuilder bridge5 = QueryBuilders.boolQuery(); // (not like and b=1) or c=0
        bridge5.should(bridge3).should(bridge4);
        BoolQueryBuilder bridge6 = QueryBuilders.boolQuery(); // outer wrapper added by parse
        bridge6.must(bridge5);
        builderToCompare.query(bridge6);
        assertEquals(builderToCompare, searchSourceBuilder);
    }
Here is the generated query:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "bool" : {
            "should" : [
              {
                "bool" : {
                  "must" : [
                    {
                      "bool" : {
                        "must_not" : [
                          {
                            "match_phrase" : {
                              "raw_log" : {
                                "query" : "aaa",
                                "slop" : 0,
                                "boost" : 1.0
                              }
                            }
                          }
                        ],
                        "disable_coord" : false,
                        "adjust_pure_negative" : true,
                        "boost" : 1.0
                      }
                    },
                    {
                      "bool" : {
                        "must" : [
                          {
                            "term" : {
                              "b" : {
                                "value" : "1",
                                "boost" : 1.0
                              }
                            }
                          }
                        ],
                        "disable_coord" : false,
                        "adjust_pure_negative" : true,
                        "boost" : 1.0
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "must" : [
                    {
                      "term" : {
                        "c" : {
                          "value" : "0",
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              }
            ],
            "disable_coord" : false,
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}
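IV. Summary
This article showed how to use Druid to convert SQL statements into ES DSL for searching. Later articles will extend the converter step by step with features such as aggregation queries and paginated queries.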