Embedding SPL in Java for Easy Data Grouping

阿新 • Published: 2018-12-10

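Implementing SQL-style GROUP BY aggregation in Java code is fairly tedious. You typically declare a data structure (a Java entity class) first, then loop over a Java collection, and finally add each element to a sub-collection according to the grouping condition. Java 8 lambdas (the Stream API) make the code much more concise, but grouping is usually followed by aggregation, and aggregations such as sum(), count(*), and topN() still have to be coded separately. These are only the most common grouping and aggregation operations. With less common kinds of grouping, such as alignment grouping, enumeration grouping, and multiple grouping, combined with other aggregate functions (FIRST, LAST, ...), the code becomes very long and hard to reuse. It would help to have a middleware component dedicated to this kind of computation, where the algorithm is described in an SQL-like script that Java calls directly to get a result set back. The Java edition of esProc (集算器) together with SPL scripts provides exactly this mechanism. The examples below show how to use it.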

SPL implementation

Regular grouping

The file duty.xlsx stores each person's overtime records:

workday	name
2016-02-05	Ashley
2016-02-08	Ashley
2016-02-09	Ashley
2016-02-10	Johnson
2016-02-11	Johnson
2016-02-12	Johnson
2016-02-15	Ashley
2016-02-16	Ashley
2016-02-17	Ashley

Count the number of duty days per person:

	A
1	=file("/Users/test/duty.xlsx").xlsimport@t()
2	=A1.groups(name;count(name):count)

Save the script file as CountName.dfx (it is used later when embedding in Java).
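For comparison, here is a minimal sketch of the same count written with plain Java streams, the approach the introduction describes as workable but verbose. The class `DutyCount`, the row type `Duty`, and the helper `sample()` are hypothetical names introduced only for this illustration; the rows are the nine records from the table above, typed in by hand rather than read from the Excel file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DutyCount {
    // Hypothetical row type mirroring the two columns of duty.xlsx.
    static final class Duty {
        final String workday;
        final String name;
        Duty(String workday, String name) { this.workday = workday; this.name = name; }
    }

    // Counts rows per name; the TreeMap keeps the names in sorted order.
    static Map<String, Long> countByName(List<Duty> rows) {
        return rows.stream().collect(
                Collectors.groupingBy((Duty d) -> d.name, TreeMap::new, Collectors.counting()));
    }

    // The nine sample rows from the table above, entered by hand.
    static List<Duty> sample() {
        String[][] raw = {
            {"2016-02-05", "Ashley"}, {"2016-02-08", "Ashley"}, {"2016-02-09", "Ashley"},
            {"2016-02-10", "Johnson"}, {"2016-02-11", "Johnson"}, {"2016-02-12", "Johnson"},
            {"2016-02-15", "Ashley"}, {"2016-02-16", "Ashley"}, {"2016-02-17", "Ashley"}
        };
        List<Duty> rows = new ArrayList<>();
        for (String[] r : raw) rows.add(new Duty(r[0], r[1]));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(countByName(sample())); // prints {Ashley=6, Johnson=3}
    }
}
```

Even in this small case the entity class and the data loading take more lines than the grouping itself, which is the overhead the SPL script avoids.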

TopN per group

Get the first three days of overtime records for each person in each month:

	A
1	=file("/Users/test/duty.xlsx").xlsimport@t()
2	=A1.group(month(workday):mon,name;~.top(3):top3)

Save the script file as RecMonTop3.dfx (it is used later when embedding in Java).
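A rough Java equivalent of "first three days per month and person" might look like the sketch below. It is an illustration under assumptions, not a translation of the SPL semantics: the class `MonthlyFirstThree`, the row type `Duty`, and the `"MM/name"` string key are all made up for this example, and "first three" is taken to mean the three earliest dates in each group.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MonthlyFirstThree {
    // Hypothetical row type; workday is parsed from an ISO date string.
    static final class Duty {
        final LocalDate workday;
        final String name;
        Duty(String workday, String name) {
            this.workday = LocalDate.parse(workday);
            this.name = name;
        }
    }

    // Groups by (month, name), then keeps the three earliest workdays of each group.
    static Map<String, List<LocalDate>> firstThreeDays(List<Duty> rows) {
        Map<String, List<LocalDate>> groups = rows.stream().collect(Collectors.groupingBy(
                (Duty d) -> String.format("%02d/%s", d.workday.getMonthValue(), d.name),
                TreeMap::new,
                Collectors.mapping((Duty d) -> d.workday, Collectors.toList())));
        groups.replaceAll((key, days) ->
                days.stream().sorted().limit(3).collect(Collectors.toList()));
        return groups;
    }

    // The nine sample rows of duty.xlsx, entered by hand.
    static List<Duty> sample() {
        String[][] raw = {
            {"2016-02-05", "Ashley"}, {"2016-02-08", "Ashley"}, {"2016-02-09", "Ashley"},
            {"2016-02-10", "Johnson"}, {"2016-02-11", "Johnson"}, {"2016-02-12", "Johnson"},
            {"2016-02-15", "Ashley"}, {"2016-02-16", "Ashley"}, {"2016-02-17", "Ashley"}
        };
        List<Duty> rows = new ArrayList<>();
        for (String[] r : raw) rows.add(new Duty(r[0], r[1]));
        return rows;
    }

    public static void main(String[] args) {
        firstThreeDays(sample()).forEach((key, days) -> System.out.println(key + " -> " + days));
    }
}
```

The grouping key has to be assembled by hand here, and the top-N step is a second pass over the grouped map.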

Calling from Java

Embedding SPL in a Java application is straightforward: the script is loaded through JDBC using the stored-procedure call syntax. Using CountName.dfx, saved in the regular grouping example, a sample call looks like this:

...
Connection con = null;
com.esproc.jdbc.InternalCStatement st = null;
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// Call the stored procedure; CountName is the name of the dfx file
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call CountName()");
// Execute the stored procedure
st.execute();
// Get the result set
ResultSet rs = st.getResultSet();
...

Switching to RecMonTop3.dfx works the same way: just use call RecMonTop3(). Both result sets can also be returned at the same time. The Java fragment here only roughly shows how to embed SPL. For the detailed steps, see the article "How Java calls SPL scripts" (Java 如何呼叫 SPL 指令碼); they are simple and are not repeated here. SPL also provides an ODBC driver, so it can be integrated with any language that supports ODBC, and the embedding process is similar.

Further excerpts

Data grouping actually breaks down into many kinds: alignment grouping, enumeration grouping, multiple grouping, and so on. Summaries and examples of these are available on the official SPL forum at 乾學院 (Raqsoft's community site); two of them are excerpted here.

SPL alignment grouping

Example 1: List, in the given order, the number of countries that use Chinese, English, or French as an official language

MySQL8:
with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3)
select t.name, count(countrycode) cnt
from t left join world.countrylanguage s on t.name=s.language
where s.isofficial='T'
group by name,ord
order by ord;

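Note: the character set of the table and the character set of the database session must match.

(1) show variables like 'character_set_connection' shows the character set of the current session

(2) show create table world.countrylanguage shows the character set of the table

(3) set character_set_connection=[character set] changes the character set of the current session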

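esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.countrylanguage where isofficial='T'")
3	[Chinese,English,French]
4	=A2.align@a(A3,Language)
5	=A4.new(A3(#):name, ~.len():cnt)

A1: Connects to the database

A2: Queries all official-language records

A3: The languages to list

A4: Aligns all records by Language to the corresponding positions in A3

A5: Builds a table sequence of each language and the number of countries that use it as an official language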
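The idea behind alignment grouping, where the output follows a caller-supplied key order rather than the order found in the data, can be sketched in plain Java as follows. This is only an illustration of the concept: `AlignCount` and `countAligned` are hypothetical names, and the language values in `main` are made up rather than taken from the world database.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AlignCount {
    // Counts how often each key in `order` occurs in `values`.
    // The result keeps the order of `order`; keys with no match stay at 0,
    // and values that are not in `order` are ignored.
    static Map<String, Integer> countAligned(List<String> order, List<String> values) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String key : order) out.put(key, 0);
        for (String v : values) out.computeIfPresent(v, (k, c) -> c + 1);
        return out;
    }

    public static void main(String[] args) {
        // Made-up official-language values, one per (country, language) record.
        List<String> languages = Arrays.asList("English", "French", "English", "German", "Chinese");
        System.out.println(countAligned(Arrays.asList("Chinese", "English", "French"), languages));
        // prints {Chinese=1, English=2, French=1}
    }
}
```

Pre-filling the map with the requested keys is what fixes the output order and guarantees a row for every requested language, including those with a count of zero.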

Example 2: List, in the given order, the number of countries that use Chinese, English, French, or any other language as an official language

MySQL8:
with t(name,ord) as (select 'Chinese',1 union all select 'English',2 union all select 'French',3 union all select 'Other', 4),
s(name, cnt) as (
  select language, count(countrycode) cnt from world.countrylanguage s
  where s.isofficial='T' and language in ('Chinese','English','French')
  group by language
  union all
  select 'Other', count(distinct countrycode) cnt from world.countrylanguage s
  where isofficial='T' and language not in ('Chinese','English','French')
)
select t.name, s.cnt from t left join s using (name) order by t.ord;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.countrylanguage where isofficial='T'")
3	[Chinese,English,French,Other]
4	=A2.align@an(A3.to(3),Language)
5	=A4.new(A3(#):name, if(#<=3,~.len(), ~.icount(CountryCode)):cnt)

A4: Aligns all records by Language to the corresponding positions in A3.to(3), and appends one extra group that holds the records that cannot be aligned

A5: For the 4th group, counts the number of distinct CountryCode values
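The "extra group for everything that does not align" step can be sketched in Java like this. It is an illustration under assumptions: `AlignWithOther` and `count` are hypothetical names, each row is a two-element array `{countryCode, language}`, and the sample rows are invented for the example rather than taken from the world database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AlignWithOther {
    // For each known language: number of rows using it.
    // For "Other": number of DISTINCT country codes among the remaining rows,
    // because one country may have several other official languages.
    static Map<String, Integer> count(List<String> known, List<String[]> rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String language : known) out.put(language, 0);
        Set<String> otherCountries = new HashSet<>();
        for (String[] row : rows) {            // row = {countryCode, language}
            if (out.containsKey(row[1])) {
                out.merge(row[1], 1, Integer::sum);
            } else {
                otherCountries.add(row[0]);
            }
        }
        out.put("Other", otherCountries.size());
        return out;
    }

    public static void main(String[] args) {
        // Invented sample rows for illustration only.
        List<String[]> rows = Arrays.asList(
            new String[]{"AAA", "Chinese"}, new String[]{"BBB", "Chinese"},
            new String[]{"CCC", "English"}, new String[]{"DDD", "English"},
            new String[]{"DDD", "French"},
            new String[]{"EEE", "German"}, new String[]{"EEE", "Italian"},
            new String[]{"FFF", "German"});
        System.out.println(count(Arrays.asList("Chinese", "English", "French"), rows));
        // prints {Chinese=2, English=2, French=1, Other=2}
    }
}
```

Country EEE appears twice among the unmatched rows but is counted once, which is why the last group needs a distinct count while the first three use a plain row count.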

SPL enumeration grouping

Example 1: List, in order, the number of cities of each size class

MySQL8:
with t as (select * from world.city where CountryCode='CHN'),
segment(class,start,end) as (
  select 'tiny', 0, 200000
  union all select 'small', 200000, 1000000
  union all select 'medium', 1000000, 2000000
  union all select 'big', 2000000, 100000000
)
select class, count(1) cnt
from segment s join t on t.population>=s.start and t.population<s.end
group by class, start
order by start;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	=${string([20,100,200,10000].(~*10000).("?<"/~))}
4	[tiny,small,medium,big]
5	=A2.enum(A3,Population)
6	=A5.new(A4(#):class, ~.len():cnt)

A3: ${…} is macro substitution: the result of the expression inside the braces is evaluated as a new expression. The result is the sequence ["?<200000","?<1000000","?<2000000","?<100000000"]

A5: For each record in A2, finds the first condition in A3 that holds and appends the record to the corresponding group
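The "first condition that holds" rule can be sketched in Java as follows. The class `EnumFirstMatch` and method `countByClass` are hypothetical names, and the population figures in `main` are invented for illustration; only the four upper bounds and class names come from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EnumFirstMatch {
    // Assigns each population to the FIRST class whose upper bound it is below,
    // so the bounds must be listed in ascending order. Values that are not below
    // any bound are dropped.
    static Map<String, Integer> countByClass(String[] names, long[] upperBounds, long[] populations) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String name : names) out.put(name, 0);
        for (long p : populations) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (p < upperBounds[i]) {
                    out.merge(names[i], 1, Integer::sum);
                    break; // stop at the first matching condition
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"tiny", "small", "medium", "big"};
        long[] bounds = {200_000L, 1_000_000L, 2_000_000L, 100_000_000L};
        long[] populations = {150_000L, 199_999L, 500_000L, 1_500_000L, 3_000_000L, 9_000_000L};
        System.out.println(countByClass(names, bounds, populations));
        // prints {tiny=2, small=1, medium=1, big=2}
    }
}
```

Because only the first match counts, conditions written as simple upper bounds behave like half-open ranges without spelling out each lower bound.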

Example 2: List the number of big cities in East China, the number of big cities in other regions, and the number of cities that are not big

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'East&Big' class, count(*) cnt from t
where population>=2000000 and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Other&Big', count(*) from t
where population>=2000000 and district not in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big', count(*) from t where population<2000000;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	[Shanghai,Jiangsu,Shandong,Zhejiang,Anhui,Jiangxi]
4	[?(1)>=2000000 && A3.contain(?(2)), ?(1)>=2000000 && !A3.contain(?(2))]
5	[East&Big,Other&Big,Not Big]
6	=A2.enum@n(A4, [Population,District])
7	=A6.new(A5(#):class, A6(#).len():cnt)

A6: enum@n puts the records that satisfy none of the conditions in A4 into an extra group appended at the end

Example 3: List the number of big cities in all regions, the number of big cities in East China, and the number of cities that are not big

MySQL8:
with t as (select * from world.city where CountryCode='CHN')
select 'Big' class, count(*) cnt from t where population>=2000000
union all
select 'East&Big' class, count(*) cnt from t
where population>=2000000 and district in ('Shanghai','Jiangsu','Shandong','Zhejiang','Anhui','Jiangxi')
union all
select 'Not Big' class, count(*) cnt from t where population<2000000;

esProc SPL:

	A
1	=connect("mysql")
2	=A1.query@x("select * from world.city where CountryCode='CHN'")
3	[Shanghai,Jiangsu,Shandong,Zhejiang,Anhui,Jiangxi]
4	[?(1)>=2000000, ?(1)>=2000000 && A3.contain(?(2))]
5	[Big,East&Big,Not Big]
6	=A2.enum@nr(A4, [Population,District])
7	=A6.new(A5(#):class, A6(#).len():cnt)

A6: When a record in A2 satisfies more than one condition in A4, enum@r appends it to every corresponding group
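The two behaviors described in the last two examples, a trailing group for records that satisfy no condition and adding a record to every group whose condition it satisfies, can be sketched in Java with a list of predicates. Everything here is an assumption made for illustration: the class `EnumGroupOptions`, the row type `City`, the method `enumGroup`, its two boolean flags, and the four sample cities are all invented and do not come from esProc or the world database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class EnumGroupOptions {
    // Hypothetical row type with the two fields the conditions use.
    static final class City {
        final String district;
        final long population;
        City(String district, long population) { this.district = district; this.population = population; }
    }

    // Returns one group per condition, in condition order.
    // repeat    = true: a row joins EVERY group whose condition it satisfies;
    //             false: only the first such group.
    // remainder = true: an extra last group collects rows that satisfy no condition.
    static <T> List<List<T>> enumGroup(List<T> rows, List<Predicate<T>> conditions,
                                       boolean repeat, boolean remainder) {
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < conditions.size() + (remainder ? 1 : 0); i++) groups.add(new ArrayList<>());
        for (T row : rows) {
            boolean matched = false;
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(row)) {
                    groups.get(i).add(row);
                    matched = true;
                    if (!repeat) break;
                }
            }
            if (!matched && remainder) groups.get(conditions.size()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        Set<String> east = new HashSet<>(Arrays.asList(
            "Shanghai", "Jiangsu", "Shandong", "Zhejiang", "Anhui", "Jiangxi"));
        // Invented sample rows.
        List<City> cities = Arrays.asList(
            new City("Shanghai", 9_000_000L), new City("Jiangsu", 2_500_000L),
            new City("Sichuan", 3_000_000L), new City("Anhui", 500_000L));

        List<Predicate<City>> conditions = new ArrayList<>();
        conditions.add(c -> c.population >= 2_000_000L);                              // Big
        conditions.add(c -> c.population >= 2_000_000L && east.contains(c.district)); // East&Big

        String[] labels = {"Big", "East&Big", "Not Big"};
        List<List<City>> groups = enumGroup(cities, conditions, true, true);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println(labels[i] + ": " + groups.get(i).size());
        }
        // prints Big: 3, East&Big: 2, Not Big: 1 (one per line)
    }
}
```

With `repeat` set to false the same data would put every big city only in the first group, so overlapping categories such as "Big" and "East&Big" need the repeat behavior to be counted independently.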

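Summary of advantages

SQL when there is a database, SPL when there is not
Aggregating data directly in Java is laborious: the code is long and not reusable, and in many cases the data is not in a database at all. With SPL, you can work with that data from Java much as you would with SQL.
Worry-free everyday use: an entry edition with a lifetime license at no cost
If the data to analyze is one-off or temporary, Raqsoft (潤乾) offers a free esProc trial license every month, and it can be renewed for free repeatedly. For long-term use, where esProc is integrated with a Java application and deployed on a server, regularly replacing the trial license is inconvenient. Raqsoft therefore offers an entry edition with a lifetime license, which removes that concern. For how to obtain it, see the article "How to use Raqsoft esProc for free?" (如何免費使用潤乾集算器？).
Technical documentation and community support
The official esProc documentation already contains many ready-made examples, and solutions to common problems can be found there. With the entry edition you can use the regular features of SPL and also ask questions on 乾學院, the community through which Raqsoft provides free technical support to entry-edition users.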
