Splitting an Elasticsearch String Field into an Array with Painless
阿新 • Published: 2020-11-02
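# 1. Scenario
The Elasticsearch field imgs holds string values, but some historical documents store it as a single comma-separated string. These historical documents need to be converted so that imgs is an array.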
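# Example
## 1. Create test data
Create an index and index a few representative historical documents covering the following cases:
* a comma-separated string;
* an array;
* a string of length 0;
* an empty array.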
```
PUT test_cj/test/id_1
{
  "imgs": "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
}
PUT test_cj/test/id_2
{
  "imgs": [
    "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
    "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
  ]
}
PUT test_cj/test/id_3
{
  "imgs": ""
}
PUT test_cj/test/id_4
{
  "imgs": []
}
```
## 2. Verify the data
```
GET test_cj/_search
```
```
[
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_1",
    "_score" : 1.0,
    "_source" : {
      "imgs" : "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg,https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg,https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_2",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_3",
    "_score" : 1.0,
    "_source" : {
      "imgs" : ""
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_4",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  }
]
```
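## 3. Run the Painless script
Use a Painless script to update the historical data. A few points to note:
* To update only the documents that match certain conditions, use the _update_by_query operation with a query. This example is simple, so no query is set and every document in the index is processed.
* Conflicts during execution are handled with conflicts=proceed, which tells the operation to continue instead of aborting.
* Painless checks an object's type with the instanceof keyword.
* To avoid regular expressions, the script splits the string with StringTokenizer.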
```
POST test_cj/_update_by_query?conflicts=proceed
{
  "script": {
    "source": """
      // Only convert documents whose imgs is still a plain string.
      if (ctx._source['imgs'] instanceof String) {
        String s = ctx._source['imgs'];
        ArrayList array = new ArrayList();
        // An empty string becomes an empty array.
        if (!s.isEmpty()) {
          String splitter = ",";
          StringTokenizer tokenValue = new StringTokenizer(s, splitter);
          while (tokenValue.hasMoreTokens()) {
            array.add(tokenValue.nextToken());
          }
        }
        ctx._source.imgs = array;
      }
    """
  }
}
```
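Since Painless syntax is close to Java, the logic of the script above can be tried out locally before it touches real data. The following is only a sketch in plain Java, not part of the original procedure: the class `ImgsNormalizer` and the method `normalize` are hypothetical names, and the short file names stand in for the real image URLs. It also shows one behavior worth knowing: `StringTokenizer` skips empty tokens, whereas `String.split` keeps empty strings between consecutive commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper that mirrors the Painless script in plain Java.
class ImgsNormalizer {

    // Only String values are converted; any other value
    // (for example an existing list) is returned unchanged.
    static Object normalize(Object imgs) {
        if (!(imgs instanceof String)) {
            return imgs;
        }
        String s = (String) imgs;
        List<String> array = new ArrayList<>();
        if (!s.isEmpty()) {
            StringTokenizer tokens = new StringTokenizer(s, ",");
            while (tokens.hasMoreTokens()) {
                array.add(tokens.nextToken());
            }
        }
        return array;
    }

    public static void main(String[] args) {
        // The four cases from the test data, with shortened values.
        System.out.println(normalize("a.jpg,b.jpg"));            // [a.jpg, b.jpg]
        System.out.println(normalize(Arrays.asList("a.jpg")));   // [a.jpg]
        System.out.println(normalize(""));                       // []
        System.out.println(normalize(new ArrayList<String>()));  // []

        // StringTokenizer drops empty tokens; String.split keeps inner ones.
        System.out.println(normalize("a.jpg,,b.jpg,"));                  // [a.jpg, b.jpg]
        System.out.println(Arrays.toString("a.jpg,,b.jpg".split(","))); // [a.jpg, , b.jpg]
    }
}
```

If the historical strings may contain doubled or trailing commas, dropping the empty tokens is probably what is wanted for a list of image URLs, but that is worth confirming against the real data.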
## 4. If a large number of documents is being updated, the operation takes some time; check its progress while it runs:
```
GET _tasks?detailed=true&actions=*byquery
```
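The task status returned by the request above includes counters such as `total`, `updated`, `created` and `deleted`. As far as I understand the Elasticsearch documentation, progress can be estimated by adding `updated`, `created` and `deleted` and comparing the sum with `total`. The sketch below only illustrates that arithmetic; the class `ByQueryProgress` and the method `fraction` are hypothetical names, and reading the counters out of the JSON response is left out.

```java
// Hypothetical helper: estimates the progress of a by-query task
// from the counters found in its status object.
class ByQueryProgress {

    // Returns the estimated fraction of processed documents, in [0, 1].
    static double fraction(long total, long updated, long created, long deleted) {
        if (total <= 0) {
            return 0.0; // nothing to do, or total not known yet
        }
        long done = updated + created + deleted;
        return Math.min(1.0, (double) done / total);
    }

    public static void main(String[] args) {
        // Example: 3 of 4 documents have been updated so far.
        System.out.printf("%.0f%%%n", 100 * fraction(4, 3, 0, 0)); // 75%
    }
}
```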
## 5. Check the result
```
GET test_cj/_search
```
```
[
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_1",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g27/M08/C8/C9/autohomecar__ChcCQF2tFp-AVbd1AABUAEDjxME398.jpg",
        "https://img2.autoimg.cn/hscdfs/g27/M00/C5/41/autohomecar__ChsEfF2tFp-AUNE9AABAAMdcvmc812.jpg",
        "https://img2.autoimg.cn/hscdfs/g27/M06/C5/41/autohomecar__ChsEfF2tFp-AaGesAABUABSmyrM852.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_2",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [
        "https://img2.autoimg.cn/hscdfs/g1/M08/83/34/autohomecar__ChcCQ1wGPV6AMsb0AAD8AKsOcww068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M03/B4/5D/autohomecar__ChsEmVwGPV-AQmnZAADMAMSUUHU068.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M00/83/34/autohomecar__ChcCQ1wGPV-ABZk0AACcAItlOsc793.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M07/B3/D1/autohomecar__ChsEj1wGPV-APTZEAABcACQZNGk338.jpg",
        "https://img2.autoimg.cn/hscdfs/g1/M0B/83/34/autohomecar__ChcCQ1wGPV-ASLK_AACgAO-S6mU461.jpg"
      ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_3",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  },
  {
    "_index" : "test_cj",
    "_type" : "test",
    "_id" : "id_4",
    "_score" : 1.0,
    "_source" : {
      "imgs" : [ ]
    }
  }
]
```