PostgreSQL: finding duplicate rows, including rows that match on NULL values
Finding duplicate rows across several columns is simple: SELECT A,B,C FROM TABLE WHERE CONDITION GROUP BY A,B,C HAVING COUNT(*)>1 is enough. My requirement this time was a bit different, though.
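For reference, here is that basic query on a tiny made-up data set. The table name t, its columns and the sample rows are assumptions for illustration only. It is worth noticing that GROUP BY itself already puts NULLs into the same group, so the NULL problem only appears once the groups have to be matched back to full rows with "=".

```sql
-- Hypothetical sample data supplied inline, so no real table is needed.
WITH t(id, a, b, c) AS (
    VALUES (1, 'x', 'y', NULL),
           (2, 'x', 'y', NULL),
           (3, 'x', 'z', 'w')
)
SELECT a, b, c, COUNT(*) AS n
FROM t
GROUP BY a, b, c          -- rows 1 and 2 land in one group even though c is NULL
HAVING COUNT(*) > 1;      -- returns a single group: ('x', 'y', NULL) with n = 2
```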
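The final query has to return more columns than the ones it groups by, and NULLs in the same column also have to count as duplicates of each other. After a lot of searching online and asking colleagues, I finally arrived at the following SQL: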
SELECT T1.A, T1.B, T1.C, T1.D, T1.E FROM TABLE T1 WHERE EXISTS
(SELECT 1 FROM TABLE T2 WHERE CONDITION AND COALESCE(T1.A,'0')=COALESCE(T2.A,'0') AND COALESCE(T1.B,'0')=COALESCE(T2.B,'0') AND COALESCE(T1.C,'0')=COALESCE(T2.C,'0') GROUP BY T2.A, T2.B, T2.C HAVING COUNT(*)>1);
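Note: the second argument of coalesce() in the SQL above is a stand-in value you choose yourself, but its type must match the type of the first argument. It should also be a value that never occurs in that column, otherwise rows that really contain it are treated as if they were NULL.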
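If picking a safe stand-in value is awkward, PostgreSQL's IS NOT DISTINCT FROM comparison treats two NULLs as equal and needs no stand-in at all. The following is only a sketch of that alternative, not the query I actually used: the table t, its id column and the sample rows are made up, and it assumes every row has a unique id so that a row does not match itself. Any extra filter (CONDITION above) would go into the inner WHERE.

```sql
-- Hypothetical sample data; id is assumed to be unique per row.
WITH t(id, a, b, c, d) AS (
    VALUES (1, 'x', 'y', NULL, 'first'),
           (2, 'x', 'y', NULL, 'second'),
           (3, 'x', 'z', 'w',  'unique')
)
SELECT t1.id, t1.a, t1.b, t1.c, t1.d
FROM t AS t1
WHERE EXISTS (
    SELECT 1
    FROM t AS t2
    WHERE t2.id <> t1.id                       -- some other row...
      AND t2.a IS NOT DISTINCT FROM t1.a       -- ...with the same key,
      AND t2.b IS NOT DISTINCT FROM t1.b       --    where NULL matches NULL
      AND t2.c IS NOT DISTINCT FROM t1.c
)
ORDER BY t1.id;                                -- returns the rows with id 1 and 2
```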
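To clean up the rows found under the same condition, keeping only the row with the smallest ID in each duplicate group, the following SQL can be used: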
DELETE FROM TABLE WHERE ID NOT IN (SELECT ID FROM
(SELECT MIN(ID) AS ID, A, B, C FROM TABLE WHERE CONDITION GROUP BY A, B, C HAVING COUNT(*)>1) KEPT)
AND ID IN (SELECT T1.ID FROM TABLE T1 WHERE EXISTS
(SELECT 1 FROM TABLE T2 WHERE CONDITION AND COALESCE(T1.A,'0')=COALESCE(T2.A,'0') AND COALESCE(T1.B,'0')=COALESCE(T2.B,'0') AND COALESCE(T1.C,'0')=COALESCE(T2.C,'0') GROUP BY T2.A, T2.B, T2.C HAVING COUNT(*)>1));
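A shorter way to express "keep the smallest ID in each group, delete the rest" is to number the rows inside each group with ROW_NUMBER(). This is a sketch of a different technique, not the statement above: the temp table t and its data are invented for the example, and everything runs inside a transaction that is rolled back, so it can be tried without changing anything.

```sql
BEGIN;

-- Hypothetical table and data, created only for this demonstration.
CREATE TEMP TABLE t (id int PRIMARY KEY, a text, b text, c text);
INSERT INTO t VALUES
    (1, 'x', 'y', NULL),
    (2, 'x', 'y', NULL),
    (3, 'x', 'z', 'w');

DELETE FROM t
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               -- PARTITION BY groups NULLs together, like GROUP BY does
               ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY id) AS rn
        FROM t
    ) AS ranked
    WHERE rn > 1              -- everything after the smallest id in its group
)
RETURNING id;                 -- reports the deleted row: id 2

ROLLBACK;                     -- undo the delete and drop the temp table
```

Looking at the RETURNING output before swapping ROLLBACK for COMMIT is a cheap way to check which rows would actually go.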
This touches on the differences between IN and EXISTS, and between NOT IN and NOT EXISTS, which are worth looking up if you are interested.
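The difference that matters most here is how NOT IN behaves when the list contains a NULL. The two small queries below use made-up literal values and need no table.

```sql
-- NOT IN: "3 <> NULL" is unknown, so the whole predicate is NULL
-- and the row is filtered out. This query returns no rows.
SELECT 'kept' AS result
WHERE 3 NOT IN (1, 2, NULL);

-- NOT EXISTS: the subquery simply finds no row with x = 3,
-- so the predicate is true. This query returns one row.
SELECT 'kept' AS result
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1), (2), (NULL)) AS v(x)
    WHERE v.x = 3
);
```

In the DELETE above, the NOT IN list is built from MIN(ID), so it stays safe as long as ID is never NULL.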
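This does find and remove the duplicates, but on large data volumes the model that runs this SQL becomes very slow; how slow also depends partly on the database.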
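One thing that may be worth trying for the lookup is a window function, which counts each group in a single pass instead of running a grouped subquery for every outer row. I have not benchmarked it against the EXISTS version, so treat it as a sketch; the table t and its rows are again made up.

```sql
-- Hypothetical sample data supplied inline.
WITH t(id, a, b, c, d) AS (
    VALUES (1, 'x', 'y', NULL, 'first'),
           (2, 'x', 'y', NULL, 'second'),
           (3, 'x', 'z', 'w',  'unique')
)
SELECT id, a, b, c, d
FROM (
    SELECT t.*,
           -- size of the group this row belongs to; NULLs share a partition
           COUNT(*) OVER (PARTITION BY a, b, c) AS n
    FROM t
) AS counted
WHERE n > 1
ORDER BY id;                  -- returns the rows with id 1 and 2
```

Running both versions under EXPLAIN ANALYZE on real data is the only reliable way to tell which one is faster in a given setup.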