Java 嵌入 SPL 輕鬆實現資料分組
要在 Java 程式碼中實現類似 SQL 中的 GroupBy 分組聚合運算,是比較繁瑣的,通常先要宣告資料結構(Java 實體類),然後用 Java 集合進行迴圈遍歷,最後根據分組條件新增到某個子集合中。Java 8 有了 Lambda(stream)程式碼簡潔了許多,分組後往往還要跟著聚合操作,仍然需要單寫聚合函式 sum(),count(*),topN()等。這些還都是最常規的分組和聚合運算,遇到對位分組、列舉分組、多重分組等非常規分組加上其他聚集函式 (FIRST,LAST…),程式碼就變得非常冗長且不通用。如果能有一箇中間件專門負責這類計算,採用類似 SQL 指令碼做演算法描述,在 Java 中直接呼叫指令碼並返回結果集就好了。Java 版集算器和 SPL 指令碼,就是這樣的機制,下面舉例說明如何使用。
SPL 實現
duty.xlsx 檔案中儲存著每個人的加班記錄:
workday |
name |
2016-02-05 |
Ashley |
2016-02-08 |
Ashley |
2016-02-09 |
Ashley |
2016-02-10 |
Johnson |
2016-02-11 |
Johnson |
2016-02-12 |
Johnson |
2016-02-15 |
Ashley |
2016-02-16 |
Ashley |
2016-02-17 |
Ashley |
彙總每個人的值班天數:
A |
|
1 |
=file("/Users/test/duty.xlsx")[email protected]() |
2 |
=A1.groups(name;count(name):count) |
儲存指令碼檔案CountName.dfx(嵌入 Java 會用到)
取每個月、每個人、頭三天的加班記錄
A |
|
1 |
=file("/Users/test/duty.xlsx")[email protected]() |
2 |
=A1.group(month(workday):mon,name;~.top(3):top3) |
儲存指令碼檔案RecMonTop3.dfx(嵌入 Java 會用到)
Java 呼叫
SPL 嵌入到 Java 應用程式十分方便,通過 JDBC 呼叫儲存過程方法載入,用常規分組儲存的檔案CountName.dfx,示例呼叫如下:
... Connection con = null; Class.forName("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection("jdbc:esproc:local://"); //呼叫儲存過程,其中CountName是dfx的檔名 st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call CountName()"); //執行儲存過程 st.execute(); //獲取結果集 ResultSet rs = st.getResultSet(); ...... Connection con = null; Class.forName("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection("jdbc:esproc:local://"); //呼叫儲存過程,其中CountName是dfx的檔名 st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call CountName()"); //執行儲存過程 st.execute(); //獲取結果集 ResultSet rs = st.getResultSet(); ...
替換成 RecMonTop3.dfx 是同樣的道理,只需 call RecMonTop3() 即可,也可同時返回兩個結果集。這裡只用 Java 片段粗略解釋瞭如何嵌入 SPL,詳細步驟請參閱 Java 如何呼叫 SPL 指令碼 ,也非常簡單,不再贅述。同時,SPL 也支援 ODBC 驅動,整合到支援 ODBC 的語言,嵌入過程類似。
拓展節選
之前沒有相關的總結,其實關於資料分組,細分起來其實還有很多種,對位分組、列舉分組、多重分組…,在乾學院 SPL 官方論壇都有總結和示例,這裡節選其中兩種。
示例 1:按順序分別列出使用 Chinese、English、French 作為官方語言的國家數量
MySQL8: with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3) select t.name, count(countrycode) cnt from t left join world.countrylanguage s on t.name=s.language where s.isofficial='T' group by name,ord order by ord;MySQL8: with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3) select t.name, count(countrycode) cnt from t left join world.countrylanguage s on t.name=s.language where s.isofficial='T' group by name,ord order by ord;
注意:表的字符集和資料庫會話的字符集要保持一致。
(1) show variables like ’character_set_connection’檢視當前會話字符集
(2) show create table world.countrylanguage 查看錶的字符集
(3) set character_set_connection=[字符集] 更新當前會話字符集
集算器 SPL:
A |
|
1 |
=connect("mysql") |
2 |
[email protected]("select * from world.countrylanguage where isofficial='T'") |
3 |
[Chinese,English,French] |
4 |
[email protected](A3,Language) |
5 |
=A4.new(A3(#):name, ~.len():cnt) |
A1: 連線資料庫
A2: 查詢出所有官方語言的記錄
A3: 需要列出的語言
A4: 將所有記錄按 Language 對位到 A3 相應位置
A5: 構造以語言和使用此語言為官方語言的國家數量的序表
示例 2:按順序分別列出使用 Chinese、English、French 及其它語言作為官方語言的國家數量
MySQL8: with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3 union all select 'Other', 4), s(name, cnt) as ( select language, count(countrycode) cnt from world.countrylanguage s where s.isofficial='T' and language in ('Chinese','English','French') group by language union all select 'Other', count(distinct countrycode) cnt from world.countrylanguage s where isofficial='T' and language not in ('Chinese','English','French') ) select t.name, s.cnt from t left join s using (name) order by t.ord;MySQL8: with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3 union all select 'Other', 4), s(name, cnt) as ( select language, count(countrycode) cnt from world.countrylanguage s where s.isofficial='T' and language in ('Chinese','English','French') group by language union all select 'Other', count(distinct countrycode) cnt from world.countrylanguage s where isofficial='T' and language not in ('Chinese','English','French') ) select t.name, s.cnt from t left join s using (name) order by t.ord;
集算器 SPL:
A |
|
1 |
=connect("mysql") |
2 |
[email protected]("select * from world.countrylanguage where isofficial='T'") |
3 |
[Chinese,English,French,Other] |
4 |
[email protected](A3.to(3),Language) |
5 |
=A4.new(A3(#):name, if(#<=3,~.len(), ~.icount(CountryCode)):cnt) |
A4: 將所有記錄按 Language 對位到 A3.to(3) 相應位置,並追加一組用於存放不能對位的記錄
A5: 第 4 組計算不同 CountryCode 的數量
示例 1:按順序列出各型別城市的數量
MySQL8: with t as (select * from world.city where CountryCode='CHN'), segment(class,start,end) as (select 'tiny', 0, 200000 union all select 'small', 200000, 1000000 union all select 'medium', 1000000, 2000000 union all select 'big', 2000000, 100000000 ) select class, count(1) cnt from segment s join t on t.population>=s.start and t.population<s.end group by class, start order by start;MySQL8: with t as (select * from world.city where CountryCode='CHN'), segment(class,start,end) as (select 'tiny', 0, 200000 union all select 'small', 200000, 1000000 union all select 'medium', 1000000, 2000000 union all select 'big', 2000000, 100000000 ) select class, count(1) cnt from segment s join t on t.population>=s.start and t.population<s.end group by class, start order by start;
集算器 SPL:
A |
|
1 |
=connect("mysql") |
2 |
[email protected]("select * from world.city where CountryCode='CHN'") |
3 |
=${string([20,100,200,10000].(~*10000).("?<"/~))} |
4 |
[tiny,small,medium,big] |
5 |
=A2.enum(A3,Population) |
6 |
=A5.new(A4(#):class, ~.len():cnt) |
A3: ${…} 巨集替換,以大括號內表示式的結果作為新表示式進行計算,結果為序列 [“?<200000”,“?<1000000”,“?<2000000”,“?<100000000”]
A5: 針對 A2 中每條記錄,尋找 A3 中第 1 個成立的條件,並追加到對應的組中
示例 2:列出華東地區大型城市數量、其它地區大型城市數量、非大型城市數量
MySQL8: with t as (select * from world.city where CountryCode='CHN') select 'East&Big' class, count(*) cnt from t where population>=2000000 and district in ('Shanghai','Jiangshu', 'Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Other&Big', count(*) from t where population>=2000000 and district not in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Not Big', count(*) from t where population<2000000;MySQL8: with t as (select * from world.city where CountryCode='CHN') select 'East&Big' class, count(*) cnt from t where population>=2000000 and district in ('Shanghai','Jiangshu', 'Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Other&Big', count(*) from t where population>=2000000 and district not in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Not Big', count(*) from t where population<2000000;
集算器 SPL:
A |
|
1 |
=connect("mysql") |
2 |
[email protected]("select * from world.city where CountryCode='CHN'") |
3 |
[Shanghai,Jiangshu, Shandong,Zhejiang,Anhui,Jiangxi] |
4 |
[?(1)>=2000000 && A3.contain(?(2)), ?(1)>=2000000 && !A3.contain(?(2))] |
5 |
[East&Big,Other&Big, Not Big] |
6 |
[email protected](A4, [Population,District]) |
7 |
=A6.new(A5(#):class, A6(#).len():cnt) |
A5: [email protected] 將不滿足 A4 中所有條件的記錄存放到追加的最後一組中
示例 3:列出所有地區大型城市數量、華東地區大型城市數量、非大型城市數量
MySQL8: with t as (select * from world.city where CountryCode='CHN') select 'Big' class, count(*) cnt from t where population>=2000000 union all select 'East&Big' class, count(*) cnt from t where population>=2000000 and district in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Not Big' class, count(*) cnt from t where population<2000000;MySQL8: with t as (select * from world.city where CountryCode='CHN') select 'Big' class, count(*) cnt from t where population>=2000000 union all select 'East&Big' class, count(*) cnt from t where population>=2000000 and district in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi') union all select 'Not Big' class, count(*) cnt from t where population<2000000;
集算器 SPL:
A |
|
1 |
=connect("mysql") |
2 |
[email protected]("select * from world.city where CountryCode='CHN'") |
3 |
[Shanghai,Jiangshu, Shandong,Zhejiang,Anhui,Jiangxi] |
4 |
[?(1)>=2000000, ?(1)>=2000000 && A3.contain(?(2))] |
5 |
[Big, East&Big, Not Big] |
6 |
[email protected](A4, [Population,District]) |
7 |
=A6.new(A5(#):class, A6(#).len():cnt) |
A6: 若 A2 中記錄滿足 A4 中多個條件時,[email protected] 會將其追加到對應的每個組中
優勢總結
有庫寫 SQL,沒庫寫 SPL
用 Java 程式直接彙總計算資料,還是比較累的,程式碼很長,並且不可複用,很多情況資料也不在資料庫裡,有了 SPL,就能像在 Java 中用 SQL 一樣了,十分方便。常用無憂,不花錢就能取得終身使用權的入門版
如果要分析的資料是一次性或臨時性的,潤乾集算器每個月都提供免費試用授權,可以迴圈免費使用。但要和 Java 應用程式整合起來部署到伺服器上長期使用,定期更換試用授權還是比較麻煩,潤乾提供了有終身使用權的入門版,解決了這個後顧之憂,獲得方式參考 如何免費使用潤乾集算器?技術文件和社群支援
官方提供的集算器技術文件本身就有很多現成的例子,常規問題從文件裡都能找到解決方法。如果獲得了入門版,不僅能夠使用 SPL 的常規功能,碰到任何問題都可以去乾學院上去諮詢,官方通過該社群對入門版使用者提供免費的技術支援。