
Finding duplicate rows in a PostgreSQL database, with NULL columns also treated as duplicates

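To find rows that are duplicated on several columns, SELECT A,B,C FROM TABLE WHERE CONDITION GROUP BY A,B,C HAVING COUNT(*)>1 is enough (TABLE and CONDITION are placeholders). The requirement here, however, goes further: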

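The final query has to return more columns than the grouping columns, and NULL values in the same column must also count as duplicates of each other. After a lot of searching online and asking colleagues, I ended up with the following SQL: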

SELECT T1.A, T1.B, T1.C, T1.D, T1.E
FROM TABLE T1
WHERE EXISTS (
    SELECT T2.A, T2.B, T2.C
    FROM TABLE T2
    WHERE CONDITION
      AND COALESCE(T1.A,'0') = COALESCE(T2.A,'0')
      AND COALESCE(T1.B,'0') = COALESCE(T2.B,'0')
      AND COALESCE(T1.C,'0') = COALESCE(T2.C,'0')
    GROUP BY T2.A, T2.B, T2.C
    HAVING COUNT(*) > 1
);

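Note: the second argument of coalesce() in the SQL above is a value you pick yourself, but its type must match the type of the first argument. It should also be a value that never occurs in the column; otherwise a real '0' would be treated as equal to NULL.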
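As an aside on the sentinel value: PostgreSQL also has the null-safe comparison IS NOT DISTINCT FROM, which treats two NULLs as equal without any substitute value. The following is only a sketch under assumptions: the inline demo data, its column names and the unique id column are invented for illustration, and the self-join on differing ids is a different formulation from the GROUP BY ... HAVING subquery used above.

```sql
-- Hypothetical inline data; names and rows are assumptions for illustration.
WITH demo(id, a, b, c, d) AS (
    VALUES
        (1, 'x', 'y',  'z', 'first'),
        (2, 'x', 'y',  'z', 'second'),  -- duplicate of id 1 on (a, b, c)
        (3, 'x', NULL, 'z', 'third'),
        (4, 'x', NULL, 'z', 'fourth'),  -- duplicate of id 3; b is NULL in both
        (5, 'x', '0',  'z', 'fifth')    -- b is a real '0', not NULL
)
SELECT t.id, t.a, t.b, t.c, t.d
FROM demo AS t
WHERE EXISTS (
    SELECT 1
    FROM demo AS u
    WHERE u.id <> t.id                      -- a different row ...
      AND u.a IS NOT DISTINCT FROM t.a      -- ... with the same values,
      AND u.b IS NOT DISTINCT FROM t.b      --     where NULL matches NULL
      AND u.c IS NOT DISTINCT FROM t.c
)
ORDER BY t.id;
```

On this sample data the query reports ids 1 to 4. Row 5, whose b is the literal '0', is not reported, whereas a COALESCE(b,'0') comparison would pair it with rows 3 and 4.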


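To delete the duplicates found under the same condition, keeping only the row with the smallest ID in each group, the following SQL can be used: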

DELETE FROM TABLE
WHERE ID NOT IN (
    SELECT ID FROM (
        SELECT MIN(ID) ID, A, B, C
        FROM TABLE
        WHERE CONDITION
        GROUP BY A, B, C
        HAVING COUNT(*) > 1
    ) AS KEPT
)
AND ID IN (
    SELECT ID FROM TABLE T1
    WHERE EXISTS (
        SELECT T2.A, T2.B, T2.C
        FROM TABLE T2
        WHERE CONDITION
          AND COALESCE(T1.A,'0') = COALESCE(T2.A,'0')
          AND COALESCE(T1.B,'0') = COALESCE(T2.B,'0')
          AND COALESCE(T1.C,'0') = COALESCE(T2.C,'0')
        GROUP BY T2.A, T2.B, T2.C
        HAVING COUNT(*) > 1
    )
);
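For comparison, the same "keep the smallest ID per group" cleanup can also be sketched with the ROW_NUMBER() window function. This is a different technique from the statement above, not a rewrite of it. The table demo_dedup and its rows are hypothetical, and the whole sketch runs inside a transaction that is rolled back, so nothing persists. PARTITION BY, like GROUP BY, places NULLs in the same group, so no coalesce() is needed.

```sql
BEGIN;

-- Hypothetical sample table; names and rows are assumptions for illustration.
CREATE TEMP TABLE demo_dedup (
    id integer PRIMARY KEY,
    a  text,
    b  text,
    c  text
);

INSERT INTO demo_dedup (id, a, b, c) VALUES
    (1, 'x', 'y',  'z'),
    (2, 'x', 'y',  'z'),   -- duplicate of id 1
    (3, 'x', NULL, 'z'),
    (4, 'x', NULL, 'z'),   -- duplicate of id 3; b is NULL in both
    (5, 'p', 'q',  'r');   -- unique

-- Number the rows inside each (a, b, c) group by ascending id
-- and delete everything except the first row of each group.
DELETE FROM demo_dedup
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY id) AS rn
        FROM demo_dedup
    ) AS ranked
    WHERE rn > 1
);

-- Inspect what is left before deciding whether to keep the change.
SELECT id, a, b, c FROM demo_dedup ORDER BY id;

ROLLBACK;
```

With this sample data the final SELECT shows ids 1, 3 and 5. On a real table, replacing ROLLBACK with COMMIT only after checking the remaining rows is the safer order of operations.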

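The statements here rely on the differences between IN and EXISTS, and between NOT IN and NOT EXISTS. In particular, NOT IN matches nothing once its subquery returns a NULL, which is not a problem above only because MIN(ID) is never NULL for a non-null ID column. Interested readers can look up the details.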
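A minimal sketch of the NULL behaviour, using inline VALUES lists so that no table is required (the aliases v, s, x and y are made up for the example):

```sql
-- NOT IN: "1 NOT IN (2, NULL)" evaluates to NULL rather than true,
-- so the WHERE clause rejects the row and the query returns no rows.
SELECT v.x
FROM (VALUES (1)) AS v(x)
WHERE v.x NOT IN (
    SELECT s.y FROM (VALUES (2), (NULL)) AS s(y)
);

-- NOT EXISTS: the subquery finds no row with y = 1,
-- so NOT EXISTS is true and the query returns the single row x = 1.
SELECT v.x
FROM (VALUES (1)) AS v(x)
WHERE NOT EXISTS (
    SELECT 1 FROM (VALUES (2), (NULL)) AS s(y) WHERE s.y = v.x
);
```

This is why NOT EXISTS is usually the safer choice when the compared column can contain NULLs.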

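Although this achieves both duplicate detection and duplicate removal, it runs very slowly on large data volumes; how slowly also depends on the database itself.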
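One thing that may be worth trying on large tables is counting the copies with a window function, which avoids re-running a correlated subquery for every outer row. This is only a sketch with hypothetical inline data and column names. Whether it is actually faster depends on the data and the available indexes, so comparing both forms with EXPLAIN ANALYZE on the real table is advisable.

```sql
-- Hypothetical inline data; names and rows are assumptions for illustration.
WITH demo(id, a, b, c, d) AS (
    VALUES
        (1, 'x', 'y',  'z', 'first'),
        (2, 'x', 'y',  'z', 'second'),  -- duplicate of id 1 on (a, b, c)
        (3, 'x', NULL, 'z', 'third'),
        (4, 'x', NULL, 'z', 'fourth'),  -- duplicate of id 3; b is NULL in both
        (5, 'p', 'q',  'r', 'fifth')    -- unique
)
SELECT id, a, b, c, d
FROM (
    -- Attach to every row the size of its (a, b, c) group.
    -- PARTITION BY puts NULLs in the same group, so no coalesce() is needed.
    SELECT demo.*,
           COUNT(*) OVER (PARTITION BY a, b, c) AS copies
    FROM demo
) AS counted
WHERE copies > 1
ORDER BY id;
```

On the sample data this returns ids 1 to 4, including the extra column d that is not part of the grouping.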