Rowsets and Set Operations (SCOPE)
Suppose we have the following two rowsets:
A
id:int Name
1 Smith
1 Smith
2 Brown
3 Case
B
id:int Name
1 Smith
1 Smith
1 Smith
2 Brown
4 Dey
4 Dey
We load both rowsets:
a = EXTRACT
        Id:int, Name:string
    FROM @"/my/SampleData/SetOps_A.txt"
    USING DefaultTextExtractor();
b = EXTRACT
        Id:int, Name:string
    FROM @"/my/SampleData/SetOps_B.txt"
    USING DefaultTextExtractor();
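Example (combining rowsets):
There are two variants: UNION ALL keeps every row from both inputs, while UNION (written explicitly as UNION DISTINCT in the script below) removes duplicate rows.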
union_distinct = SELECT * FROM a
                 UNION DISTINCT
                 SELECT * FROM b;
OUTPUT union_distinct
    TO @"/my/Outputs/union_distinct.txt";
Result:
UNION DISTINCT
id:int Name
1 Smith
2 Brown
3 Case
4 Dey
union_all = SELECT * FROM a
            UNION ALL
            SELECT * FROM b;
OUTPUT union_all
    TO @"/my/Outputs/union_all.txt";
Result:
UNION ALL
id:int Name
1 Smith
1 Smith
2 Brown
3 Case
1 Smith
1 Smith
1 Smith
2 Brown
4 Dey
4 Dey
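The difference between the two UNION variants can be checked outside SCOPE with a small model. The following is only a sketch in Python, not SCOPE code: it assumes each rowset can be represented as a list of (Id, Name) tuples, and the helper names `union_all` and `union_distinct` are made up for illustration. Sorting is only there to make the printed result stable.

```python
# Rowsets A and B from the example, modeled as lists of (Id, Name) tuples.
a = [(1, "Smith"), (1, "Smith"), (2, "Brown"), (3, "Case")]
b = [(1, "Smith"), (1, "Smith"), (1, "Smith"),
     (2, "Brown"), (4, "Dey"), (4, "Dey")]


def union_all(left, right):
    """Model of UNION ALL: concatenate the inputs, keeping every row."""
    return left + right


def union_distinct(left, right):
    """Model of UNION DISTINCT: one copy of each row found in either input."""
    return sorted(set(left) | set(right))


print(len(union_all(a, b)))   # 10 rows: 4 from A plus 6 from B
print(union_distinct(a, b))   # the 4 distinct rows
```

The row counts match the two result listings above: ten rows for UNION ALL and four for UNION DISTINCT.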
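Example (rows common to both rowsets):
Use INTERSECT for this.
Again there are two variants: INTERSECT ALL keeps duplicates, so a common row appears as many times as the smaller of its two occurrence counts, while INTERSECT (written INTERSECT DISTINCT below) returns each common row once.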
rs1 = SELECT * FROM a
      INTERSECT DISTINCT
      SELECT * FROM b;
rs2 = SELECT * FROM a
      INTERSECT ALL
      SELECT * FROM b;
OUTPUT rs1
    TO @"/my/Outputs/intersect.txt";
OUTPUT rs2
    TO @"/my/Outputs/intersect-all.txt";
Result:
INTERSECT DISTINCT
id:int name
1 Smith
2 Brown
INTERSECT ALL
id:int name
1 Smith
1 Smith
2 Brown
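Why does INTERSECT ALL return Smith twice rather than three times? A minimal sketch in Python (again a model, not SCOPE code) treats each rowset as a bag of rows using `collections.Counter`. The helper names `intersect_all` and `intersect_distinct` are hypothetical.

```python
from collections import Counter

# Rowsets A and B from the example, modeled as lists of (Id, Name) tuples.
a = [(1, "Smith"), (1, "Smith"), (2, "Brown"), (3, "Case")]
b = [(1, "Smith"), (1, "Smith"), (1, "Smith"),
     (2, "Brown"), (4, "Dey"), (4, "Dey")]


def intersect_all(left, right):
    """Model of INTERSECT ALL: keep min(count in left, count in right) copies."""
    common = Counter(left) & Counter(right)  # & takes the minimum of each count
    return sorted(common.elements())


def intersect_distinct(left, right):
    """Model of INTERSECT DISTINCT: each common row exactly once."""
    return sorted(set(left) & set(right))


print(intersect_all(a, b))
print(intersect_distinct(a, b))
```

Smith occurs twice in A and three times in B, so the bag intersection keeps two copies. Case and Dey each occur in only one input and drop out.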
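Example (all rows that are in the left rowset but not in the right rowset):
Use EXCEPT for this. Again there are two variants: EXCEPT ALL subtracts occurrence counts, so a row that occurs m times on the left and n times on the right appears max(m - n, 0) times, while EXCEPT (written EXCEPT DISTINCT below) returns, once each, the rows of the left rowset that do not occur on the right at all.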
rs0 = SELECT * FROM a
      EXCEPT DISTINCT
      SELECT * FROM b;
rs1 = SELECT * FROM a
      EXCEPT ALL
      SELECT * FROM b;
rs2 = SELECT * FROM b
      EXCEPT DISTINCT
      SELECT * FROM a;
rs3 = SELECT * FROM b
      EXCEPT ALL
      SELECT * FROM a;
OUTPUT rs0 TO @"/my/Outputs/except_distinct_a_b.txt";
OUTPUT rs1 TO @"/my/Outputs/except-all_a_b.txt";
OUTPUT rs2 TO @"/my/Outputs/except_distinct_b_a.txt";
OUTPUT rs3 TO @"/my/Outputs/except-all_b_a.txt";
Result:
EXCEPT ALL (A,B)
id:int name
3 Case
EXCEPT DISTINCT (A,B)
id:int name
3 Case
EXCEPT ALL (B,A)
id:int name
1 Smith
4 Dey
4 Dey
EXCEPT DISTINCT (B,A)
id:int name
4 Dey
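The four EXCEPT results can be reproduced with the same kind of bag model. This is a sketch in Python under the assumption that rows are (Id, Name) tuples. `except_all` and `except_distinct` are illustrative names, not SCOPE functions.

```python
from collections import Counter

# Rowsets A and B from the example, modeled as lists of (Id, Name) tuples.
a = [(1, "Smith"), (1, "Smith"), (2, "Brown"), (3, "Case")]
b = [(1, "Smith"), (1, "Smith"), (1, "Smith"),
     (2, "Brown"), (4, "Dey"), (4, "Dey")]


def except_all(left, right):
    """Model of EXCEPT ALL: subtract occurrence counts row by row."""
    remaining = Counter(left) - Counter(right)  # counts at or below zero are dropped
    return sorted(remaining.elements())


def except_distinct(left, right):
    """Model of EXCEPT DISTINCT: left rows that never occur on the right, once each."""
    return sorted(set(left) - set(right))


print(except_all(a, b))       # only Case survives
print(except_all(b, a))       # one Smith (3 - 2) and both Dey rows
print(except_distinct(a, b))
print(except_distinct(b, a))
```

The operation is not symmetric: swapping the operands changes the result. EXCEPT ALL (B,A) keeps one Smith because B has one more copy than A, whereas EXCEPT DISTINCT (B,A) drops Smith entirely because the row occurs in A.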