
Splitting an Elasticsearch string field into an array with Painless

# I. Scenario

In some historical Elasticsearch documents, the string field `imgs` holds a comma-separated list of image URLs. These values need to be split into arrays.

# Example

## 1. Build the test data

Create an index and add a few typical historical documents covering the following cases:

* a comma-separated string;
* an array;
* a zero-length string;
* an empty array.

```
PUT test_cj/test/id_1
{
  "imgs": "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
}

PUT test_cj/test/id_2
{
  "imgs": [
    "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
  ]
}

PUT test_cj/test/id_3
{
  "imgs": ""
}

PUT test_cj/test/id_4
{
  "imgs": []
}
```

## 2. Check the data

```
GET test_cj/_search
```

The `hits.hits` part of the response:

```
[
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_1",
    "_score" : 1.0,
    "_source" : {
      "imgs" : "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_2",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_3",
    "_score" : 1.0,
    "_source" : {
      "imgs" : ""
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_4",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  }
]
```

## 3. Run the Painless script

Update the historical data with a Painless script. A few points to note:

* To update only documents that match certain conditions, add a `query` to the `_update_by_query` request. This example is simple, so no query is set and every document in the index is processed.
* Version conflicts: `conflicts=proceed` tells the operation to keep going (and count the conflicts) instead of aborting when a conflict occurs;
* Painless checks an object's type with the `instanceof` keyword;
* To split the string, the script deliberately avoids regular expressions (which Painless controls through the `script.painless.regex.enabled` setting) and uses `StringTokenizer` instead.

```
POST test_cj/_update_by_query?conflicts=proceed
{
  "script": {
    "source": """
      // Only convert values still stored as a string;
      // documents whose imgs is already an array are left as they are
      if (ctx._source['imgs'] instanceof String) {
        String s = ctx._source['imgs'];
        ArrayList array = new ArrayList();
        // A zero-length string becomes an empty array
        if (!s.isEmpty()) {
          String splitter = ",";
          StringTokenizer tokenValue = new StringTokenizer(s, splitter);
          // Each comma-separated segment becomes one element
          // (surrounding whitespace is not trimmed)
          while (tokenValue.hasMoreTokens()) {
            array.add(tokenValue.nextToken());
          }
        }
        ctx._source.imgs = array;
      }
    """
  }
}
```

## 4. Check the progress

If a large amount of data is being updated, the operation takes some time. Check its progress while it runs:

```
GET _tasks?detailed=true&actions=*byquery
```

## 5. Check the result

```
GET test_cj/_search
```

The `hits.hits` part of the response:

```
[
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_1",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg",
        "https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg",
        "https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_2",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_3",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_4",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  }
]
```
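Because Painless is Java-based and `StringTokenizer` is the standard `java.util` class, the splitting logic of the step 3 script can be tried out locally in plain Java before running `_update_by_query` against real data. The sketch below is a minimal mirror of that script, not part of the original setup: the class `SplitImgs` and method `normalizeImgs` are hypothetical names, and a `java.util.List` stands in for a JSON array in `_source`. It also contrasts `StringTokenizer` with the regex-based `String.split` on input containing empty segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitImgs {
    // Hypothetical helper mirroring the Painless script from step 3:
    // a String value is split on commas into a List; any other value
    // (e.g. an existing List standing in for a JSON array) is returned unchanged.
    public static Object normalizeImgs(Object imgs) {
        if (imgs instanceof String) {
            String s = (String) imgs;
            List<String> array = new ArrayList<>();
            // Same guard as the script: a zero-length string yields an empty list
            if (!s.isEmpty()) {
                StringTokenizer tokenValue = new StringTokenizer(s, ",");
                while (tokenValue.hasMoreTokens()) {
                    array.add(tokenValue.nextToken());
                }
            }
            return array;
        }
        return imgs;
    }

    public static void main(String[] args) {
        System.out.println(normalizeImgs("a.jpg,b.jpg,c.jpg"));     // [a.jpg, b.jpg, c.jpg]
        System.out.println(normalizeImgs(""));                      // []
        System.out.println(normalizeImgs(Arrays.asList("x.jpg")));  // [x.jpg]
        // StringTokenizer skips empty segments between consecutive commas...
        System.out.println(normalizeImgs("a.jpg,,b.jpg,"));         // [a.jpg, b.jpg]
        // ...while the regex-based String.split keeps them as empty strings
        System.out.println(Arrays.toString("a.jpg,,b.jpg,".split(","))); // [a.jpg, , b.jpg]
    }
}
```

Since `StringTokenizer` already returns no tokens for an empty string, the `isEmpty()` check in the script acts as a defensive guard rather than a necessity. The difference on `,,` is worth knowing if historical values may contain stray commas: the tokenizer silently drops the empty entries instead of producing empty array elements.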