
Writing a Simple ElasticSearch SQL Converter by Hand (Part 1)

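   I. Preface

   A while ago I needed ElasticSearch to support simple queries written in SQL. Newer versions of ES already support this (although the feature still appears to be experimental), and there is also the elasticsearch-sql plugin on GitHub. I still decided to write a converter by hand, for three reasons:

      1. The ES version we currently use is fairly old.

      2. elasticsearch-sql is good, but it is rather complex and its code is not easy to maintain.

      3. It is good practice.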

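 II. Technology Selection

   Mainstream software commonly uses ANTLR for lexical and syntactic analysis. Well-known projects such as Hibernate, Spark, and Hive do so, and I have worked with it before. If the goal is only to parse standard SQL, however, there are more convenient options, such as reusing the SQL parser in Hibernate or the one in Druid, Alibaba's database connection pool. (Druid ships a hand-written lexer and parser, which should be considerably faster than a parser generated by ANTLR.) I chose the second option, Druid.

     Before we start, let's look at the abstract syntax tree of a SQL statement that we can obtain through Druid: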

    

 

     Image: https://www.jianshu.com/p/437aa22ea3ca

 

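 III. Implementation

     First we create a SqlParser class whose main flow lives in the parse method. This method parses a SQL string (incidentally, Druid supports several SQL dialects; I chose MySQL here) and returns a SearchSourceBuilder object, the DSL builder provided by ElasticSearch. With that object as the argument, the ES client can issue a search request to the ES server.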

    

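    import java.util.List;
    import java.util.Objects;

    import org.elasticsearch.index.query.BoolQueryBuilder;
    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import com.alibaba.druid.sql.SQLUtils;
    import com.alibaba.druid.sql.ast.SQLExpr;
    import com.alibaba.druid.sql.ast.SQLOrderBy;
    import com.alibaba.druid.sql.ast.SQLStatement;
    import com.alibaba.druid.sql.ast.statement.SQLSelectQuery;
    import com.alibaba.druid.sql.ast.statement.SQLSelectQueryBlock;
    import com.alibaba.druid.sql.ast.statement.SQLSelectStatement;
    import com.alibaba.druid.util.JdbcConstants;

    /**
     * Translates a simple SQL SELECT statement into an ElasticSearch query.
     *
     * @author fred
     */
    public class SqlParser {
        // Druid supports several SQL dialects; MySQL is used here
        private static final String dbType = JdbcConstants.MYSQL;
        private static final Logger logger = LoggerFactory.getLogger(SqlParser.class);
        private final SearchSourceBuilder builder;

        public SqlParser(SearchSourceBuilder builder) {
            this.builder = builder;
        }

        /**
         * Parses a SQL string into an ES query.
         */
        public SearchSourceBuilder parse(String sql) throws Exception {
            if (Objects.isNull(sql)) {
                throw new IllegalArgumentException("The input statement must not be null");
            }
            // Caution: this also lower-cases field names and string literals
            sql = sql.trim().toLowerCase();
            // Use the Druid parser to build the AST
            List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType);
            if (Objects.isNull(stmtList) || stmtList.size() != 1) {
                throw new IllegalArgumentException("Exactly one query statement is required");
            }
            SQLStatement stmt = stmtList.get(0);
            if (!(stmt instanceof SQLSelectStatement)) {
                throw new IllegalArgumentException("The input statement must be a SELECT statement");
            }
            SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) stmt;
            SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();
            SQLSelectQueryBlock sqlSelectQueryBlock = (SQLSelectQueryBlock) sqlSelectQuery;

            SQLExpr whereExpr = sqlSelectQueryBlock.getWhere();

            // Build the ES query condition
            BoolQueryBuilder bridge = QueryBuilders.boolQuery();
            if (Objects.nonNull(whereExpr)) { // without a WHERE clause the empty bool query matches all documents
                QueryBuilder whereBuilder = whereHelper(whereExpr); // handle where
                bridge.must(whereBuilder);
            }
            SQLOrderBy orderByExpr = sqlSelectQueryBlock.getOrderBy(); // handle order by
            if (Objects.nonNull(orderByExpr)) {
                orderByHelper(orderByExpr, bridge);
            }
            builder.query(bridge);
            return builder;
        }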

     

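    The main flow is simple: once we have the SQL string, we convert it directly into an abstract syntax tree through the Druid API, and we require the input to be a SELECT statement. Next comes the handling of the where clause and the order by clause. The main difficulty at this point is how to map the where clause onto an ES query.

     Let's start with the easy part: how do we handle order by? An order by clause lets the user sort by several fields, so the sort fields form a List of sort items, and our job is to map that List onto the SearchSourceBuilder object. See the code below: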

    

        /**
         * Applies every ORDER BY column to the SearchSourceBuilder.
         *
         * @param orderByExpr the ORDER BY clause taken from the AST
         * @param bridge      the top-level bool query (not used at the moment)
         */
        private void orderByHelper(SQLOrderBy orderByExpr, BoolQueryBuilder bridge) {
            List<SQLSelectOrderByItem> orderByList = orderByExpr.getItems(); // columns to sort by
            for (SQLSelectOrderByItem item : orderByList) {
                // Ascending is the default when the SQL gives no direction (getType() is null)
                SortOrder order = item.getType() == SQLOrderingSpecification.DESC ? SortOrder.DESC : SortOrder.ASC;
                String orderByColumn = item.getExpr().toString();
                builder.sort(orderByColumn, order);
            }
        }

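   Through the Druid API we can easily obtain all sort fields in the SQL statement. We iterate over them one by one, take the column name literal and the sort direction of each, and pass them to the sort method of SearchSourceBuilder. Note that if the original SQL does not specify a direction for a field, we default to ascending.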
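The default-to-ascending rule can be tried in isolation, without Druid or ES on the classpath. The following is only a minimal sketch: OrderItem and SortMapper are hypothetical stand-ins invented for this illustration (they are not Druid or ES classes), and the output strings merely imitate the shape of an ES sort clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one parsed ORDER BY item; direction is null when the SQL omits it. */
class OrderItem {
    final String column;
    final String direction; // "asc", "desc", or null

    OrderItem(String column, String direction) {
        this.column = column;
        this.direction = direction;
    }
}

class SortMapper {
    /** Maps each item to an ES-style sort clause, defaulting to ascending order. */
    static List<String> toSortClauses(List<OrderItem> items) {
        List<String> clauses = new ArrayList<>();
        for (OrderItem item : items) {
            // Anything other than an explicit "desc" (including null) is treated as ascending
            String order = "desc".equalsIgnoreCase(item.direction) ? "desc" : "asc";
            clauses.add("{\"" + item.column + "\":{\"order\":\"" + order + "\"}}");
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Corresponds to: order by time desc, id
        List<OrderItem> items = Arrays.asList(new OrderItem("time", "desc"), new OrderItem("id", null));
        System.out.println(toSortClauses(items));
    }
}
```

The order of the list is preserved, which matters because the first sort field takes precedence over the ones after it.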

   

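    Next we handle the slightly more troublesome where clause. Because the SQL statement has been parsed into a syntax tree, recursion is the natural way to process it. When I work on a recursive problem I usually start from the base case. According to the definitions in the Druid API, the operators in a where clause fall into three main kinds:

    1. Simple binary operators: logical operators such as and and or, plus most relational operators (covered in detail later)

    2. between and not between: these map directly to a Range Query in ES

    3. in and not in: these map directly to a Terms Query in ES

 

   Through Druid we can easily obtain the operator and the operands of each kind of operation: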

        /**
         * Recursively walks the "where" subtree.
         *
         * @return the ES query equivalent to the given expression
         */
        private QueryBuilder whereHelper(SQLExpr expr) throws Exception {
            if (Objects.isNull(expr)) {
                throw new NullPointerException("The node must not be null!");
            }
            BoolQueryBuilder bridge = QueryBuilders.boolQuery();
            if (expr instanceof SQLBinaryOpExpr) { // binary operation
                SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
                if (operator.isLogical()) { // and, or, xor
                    return handleLogicalExpr(expr);
                } else if (operator.isRelational()) { // a concrete comparison, located at a leaf node
                    return handleRelationalExpr(expr);
                }
            } else if (expr instanceof SQLBetweenExpr) { // between operation
                SQLBetweenExpr between = (SQLBetweenExpr) expr;
                boolean isNotBetween = between.isNot(); // between or not between?
                String testExpr = between.testExpr.toString();
                String fromStr = formatSQLValue(between.beginExpr.toString());
                String toStr = formatSQLValue(between.endExpr.toString());
                RangeQueryBuilder range = QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr);
                if (isNotBetween) {
                    // NOT BETWEEN is the negation of the inclusive range, not a range with swapped bounds
                    bridge.mustNot(range);
                } else {
                    bridge.must(range);
                }
                return bridge;
            } else if (expr instanceof SQLInListExpr) { // SQL "in" corresponds to "terms" in ES
                SQLInListExpr siExpr = (SQLInListExpr) expr;
                boolean isNotIn = siExpr.isNot(); // in or not in?
                String leftSide = siExpr.getExpr().toString();
                List<SQLExpr> inSQLList = siExpr.getTargetList();
                List<String> inList = new ArrayList<>();
                for (SQLExpr in : inSQLList) {
                    inList.add(formatSQLValue(in.toString()));
                }
                if (isNotIn) {
                    bridge.mustNot(QueryBuilders.termsQuery(leftSide, inList));
                } else {
                    bridge.must(QueryBuilders.termsQuery(leftSide, inList));
                }
                return bridge;
            }
            // Reject anything else instead of silently returning an empty query that matches everything
            throw new IllegalArgumentException("Unsupported expression: " + expr);
        }
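whereHelper relies on a formatSQLValue helper that this part of the article does not show. Judging from the tests further down, where the SQL literal '%aaa' ends up as aaa once the percent sign is removed, it presumably strips the quotes around a literal. The following is a minimal sketch under that assumption only; the class name SqlValues and the exact behavior are guesses, not the author's implementation.

```java
class SqlValues {
    /**
     * Hypothetical stand-in for formatSQLValue: removes one pair of matching
     * single or double quotes around a SQL literal and returns other text unchanged.
     */
    static String formatSQLValue(String literal) {
        String s = literal.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(formatSQLValue("'%aaa'")); // quoted string literal
        System.out.println(formatSQLValue("1"));      // numeric literal, returned unchanged
    }
}
```

Note that every value comes back as a String, which matches the tests later in the article comparing against gte("1") and termQuery("b", "1") rather than numeric arguments.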

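   The first case above is the more complex one. Let's first look at what happens when the operator is a logical operator.

    As the code below shows, if the operator is logical we recurse into the left and right operands separately and then merge the results according to the operator type: or maps to should in ES, and and maps to must.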

   

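    /**
     * Logical operators; and and or are supported at the moment.
     *
     * @return a bool query combining the queries of both operands
     * @throws Exception if either operand cannot be converted
     */
    private QueryBuilder handleLogicalExpr(SQLExpr expr) throws Exception {
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
        SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
        SQLExpr rightExpr = ((SQLBinaryOpExpr) expr).getRight();

        // Recurse into the left and right subtrees, then merge the results according to the operator
        QueryBuilder leftBridge = whereHelper(leftExpr);
        QueryBuilder rightBridge = whereHelper(rightExpr);
        if (operator.equals(SQLBinaryOperator.BooleanAnd)) {
            bridge.must(leftBridge).must(rightBridge);
        } else if (operator.equals(SQLBinaryOperator.BooleanOr)) {
            bridge.should(leftBridge).should(rightBridge);
        } else {
            // e.g. xor: fail loudly instead of returning an empty query
            throw new IllegalArgumentException("Unsupported logical operator: " + operator);
        }
        return bridge;
    }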
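To make the recursion concrete without the Druid and ES dependencies, here is a minimal sketch on a miniature tree. Node and BoolSketch are hypothetical names made up for this illustration, and leaf conditions are rendered as quoted placeholders rather than real ES clauses, so the output only shows how and/or nest into must/should.

```java
/** Hypothetical miniature WHERE tree: a node is either a leaf condition or a logical operator. */
class Node {
    final String op;   // "and", "or", or null for a leaf
    final String leaf; // leaf condition text such as "b=1", or null for an operator node
    final Node left;
    final Node right;

    Node(String leaf) {
        this.op = null;
        this.leaf = leaf;
        this.left = null;
        this.right = null;
    }

    Node(String op, Node left, Node right) {
        this.op = op;
        this.leaf = null;
        this.left = left;
        this.right = right;
    }
}

class BoolSketch {
    /** Recursively renders the tree: and becomes must, or becomes should. */
    static String toBool(Node n) {
        if (n.op == null) {
            return "\"" + n.leaf + "\""; // base case: a leaf condition
        }
        String clause;
        if ("and".equals(n.op)) {
            clause = "must";
        } else if ("or".equals(n.op)) {
            clause = "should";
        } else {
            throw new IllegalArgumentException("Unsupported operator: " + n.op);
        }
        // Recurse into both operands, then merge them under one bool clause
        return "{\"bool\":{\"" + clause + "\":[" + toBool(n.left) + "," + toBool(n.right) + "]}}";
    }

    public static void main(String[] args) {
        // "a=1 and b=1 or c=0" parses as (a=1 and b=1) or c=0 because and binds tighter than or
        Node tree = new Node("or",
                new Node("and", new Node("a=1"), new Node("b=1")),
                new Node("c=0"));
        System.out.println(toBool(tree));
    }
}
```

The mapping of or to should works because a bool query that contains only should clauses requires at least one of them to match.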

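   Now for the other half of the first case, where the operator is relational. The relational operators in SQL are mainly comparisons such as greater than, less than, equals, and like. I also added regular-expression search here (although its performance seems rather poor and ES places many restrictions on regexp queries, so I do not really recommend using it).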

  

    /**
     * Relational operators: greater than, less than, equals, regexp, like.
     *
     * @param expr a binary expression whose operator is relational
     * @return the ES query for that comparison
     */
    private QueryBuilder handleRelationalExpr(SQLExpr expr) {
        SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
        if (Objects.isNull(leftExpr)) {
            throw new NullPointerException("The left side of the expression must not be null");
        }
        String leftExprStr = leftExpr.toString();
        String rightExprStr = formatSQLValue(((SQLBinaryOpExpr) expr).getRight().toString()); // TODO: the right side could support function calls later
        SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // get the operator
        QueryBuilder queryBuilder;
        switch (operator) {
        case GreaterThanOrEqual:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gte(rightExprStr);
            break;
        case LessThanOrEqual:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lte(rightExprStr);
            break;
        case Equality:
            queryBuilder = QueryBuilders.boolQuery();
            TermQueryBuilder eqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).must(eqCond);
            break;
        case GreaterThan:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gt(rightExprStr);
            break;
        case LessThan:
            queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lt(rightExprStr);
            break;
        case NotEqual:
            queryBuilder = QueryBuilders.boolQuery();
            TermQueryBuilder notEqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).mustNot(notEqCond);
            break;
        case RegExp: // corresponds to the regexp query in ES
            queryBuilder = QueryBuilders.boolQuery();
            RegexpQueryBuilder regCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).must(regCond); // must, not mustNot: a positive regexp match
            break;
        case NotRegExp:
            queryBuilder = QueryBuilders.boolQuery();
            RegexpQueryBuilder notRegCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
            ((BoolQueryBuilder) queryBuilder).mustNot(notRegCond);
            break;
        case Like: // approximated by a phrase match with the % wildcards removed
            queryBuilder = QueryBuilders.boolQuery();
            MatchPhraseQueryBuilder likeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                    rightExprStr.replace("%", ""));
            ((BoolQueryBuilder) queryBuilder).must(likeCond);
            break;
        case NotLike:
            queryBuilder = QueryBuilders.boolQuery();
            MatchPhraseQueryBuilder notLikeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                    rightExprStr.replace("%", ""));
            ((BoolQueryBuilder) queryBuilder).mustNot(notLikeCond);
            break;
        default:
            throw new IllegalArgumentException("This operator is not supported yet: " + operator.toString());
        }
        return queryBuilder;
    }
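The Like branch above drops the % signs and uses a phrase match, so the position of the wildcard is lost ('%aaa' and 'aaa%' produce the same query). If that matters, one alternative, which is not what the code above does, is to translate the LIKE pattern into the wildcard syntax used by the ES wildcard query, where * matches any sequence of characters and ? matches a single character. The following is a minimal sketch of that translation only; LikePatterns and likeToWildcard are hypothetical names, and SQL escape sequences are deliberately not handled.

```java
class LikePatterns {
    /**
     * Converts a SQL LIKE pattern to wildcard-query syntax:
     * % becomes *, _ becomes ?, and literal * ? \ are escaped with a backslash.
     * SQL escape sequences such as \% are not handled in this sketch.
     */
    static String likeToWildcard(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            switch (c) {
            case '%':
                sb.append('*');
                break;
            case '_':
                sb.append('?');
                break;
            case '*':
            case '?':
            case '\\':
                sb.append('\\').append(c); // keep wildcard metacharacters literal
                break;
            default:
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(likeToWildcard("%aaa")); // *aaa
        System.out.println(likeToWildcard("a_c%")); // a?c*
    }
}
```

The result could then be passed to QueryBuilders.wildcardQuery. Keep in mind that wildcard queries match against indexed terms rather than analyzed phrases, and that patterns starting with a wildcard can be slow, so this is a trade-off rather than a drop-in improvement.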

 

    At this point we have finished converting SQL to the ES DSL (really only for simple queries). Let's write a few JUnit tests to try it out.

    First, a simple comparison:

    @Test
    public void normalSQLTest() throws Exception {
        String sql = "select * from test where time>= 1";
        // Let parse failures fail the test instead of swallowing the exception
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        // Build the expected query by hand
        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        bridge.must(whereBuilder);
        builderToCompare.query(bridge);
        assertEquals(builderToCompare, searchSourceBuilder);
    }

  Below is the ES query that is printed:

{
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "time" : {
              "from" : "1",
              "to" : null,
              "include_lower" : true,
              "include_upper" : true,
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

  Now one with sorting:

   

    @Test
    public void normalSQLWithOrderByTest() throws Exception {
        String sql = "select * from test where time>= 1 order by time desc";
        // Let parse failures fail the test instead of swallowing the exception
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        // Build the expected query by hand
        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
        BoolQueryBuilder bridge = QueryBuilders.boolQuery();
        bridge.must(whereBuilder);
        builderToCompare.sort("time", SortOrder.DESC);
        builderToCompare.query(bridge);
        assertEquals(builderToCompare, searchSourceBuilder);
    }

   between and in are not much different, so I won't paste their code. Finally, let's look at a slightly more complex query with logical operators:

  

    @Test
    public void sqlLogicTest() throws Exception {
        String sql = "select * from test where raw_log not like '%aaa' && b=1 or c=0";
        // Let parse failures fail the test instead of swallowing the exception
        SearchSourceBuilder searchSourceBuilder = new SqlParser(new SearchSourceBuilder()).parse(sql);
        System.out.println(searchSourceBuilder);

        SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
        QueryBuilder builder = QueryBuilders.matchPhraseQuery("raw_log", "aaa");

        BoolQueryBuilder bridge1 = QueryBuilders.boolQuery(); // raw_log not like '%aaa'
        bridge1.mustNot(builder);

        BoolQueryBuilder bridge2 = QueryBuilders.boolQuery(); // b = 1
        bridge2.must(QueryBuilders.termQuery("b", "1"));

        BoolQueryBuilder bridge3 = QueryBuilders.boolQuery(); // not like and b = 1
        bridge3.must(bridge1).must(bridge2);

        BoolQueryBuilder bridge4 = QueryBuilders.boolQuery(); // c = 0
        bridge4.must(QueryBuilders.termQuery("c", "0"));

        BoolQueryBuilder bridge5 = QueryBuilders.boolQuery(); // (not like and b = 1) or c = 0
        bridge5.should(bridge3).should(bridge4);

        BoolQueryBuilder bridge6 = QueryBuilders.boolQuery(); // top-level wrapper added by parse
        bridge6.must(bridge5);
        builderToCompare.query(bridge6);
        assertEquals(builderToCompare, searchSourceBuilder);
    }

 Below is the generated query:

   

{
  "query" : {
    "bool" : {
      "must" : [
        {
          "bool" : {
            "should" : [
              {
                "bool" : {
                  "must" : [
                    {
                      "bool" : {
                        "must_not" : [
                          {
                            "match_phrase" : {
                              "raw_log" : {
                                "query" : "aaa",
                                "slop" : 0,
                                "boost" : 1.0
                              }
                            }
                          }
                        ],
                        "disable_coord" : false,
                        "adjust_pure_negative" : true,
                        "boost" : 1.0
                      }
                    },
                    {
                      "bool" : {
                        "must" : [
                          {
                            "term" : {
                              "b" : {
                                "value" : "1",
                                "boost" : 1.0
                              }
                            }
                          }
                        ],
                        "disable_coord" : false,
                        "adjust_pure_negative" : true,
                        "boost" : 1.0
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              },
              {
                "bool" : {
                  "must" : [
                    {
                      "term" : {
                        "c" : {
                          "value" : "0",
                          "boost" : 1.0
                        }
                      }
                    }
                  ],
                  "disable_coord" : false,
                  "adjust_pure_negative" : true,
                  "boost" : 1.0
                }
              }
            ],
            "disable_coord" : false,
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

 

     

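   IV. Summary

     This article described how to use Druid to convert SQL statements into the ES DSL for searching. Follow-up articles will gradually round out this feature with things such as aggregation queries and paginated queries.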
