Deleting Duplicate Data in Oracle
阿新 • Published: 2018-07-17
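The DML statement in method 2) is a nightmare because of its "not in" clause. If your data volume is large, use it with caution.
3) This is the method I actually put into practice. Its efficiency was acceptable: the delete finished in about 5 minutes. The steps are as follows: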
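1. Query the duplicated rows in the table ((Id, seq) is the primary key that now contains duplicates; 表1 stands for "table 1"):
select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1)
2. Create a table to hold them (created this way, lsb has the same structure as 表1):
create table lsb as select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1); commit ;
3. Delete the duplicated rows from 表1 (this removes every copy, which is why they were saved to lsb first):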
delete from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1) ;
commit;
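4. Query the rows in lsb that have the smallest rowid for each (Id, seq):
select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1)
5. Insert those rows (one per key, the one with the smallest rowid) back into 表1: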
insert into 表1 select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1) ;
commit;
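6. Drop the staging table:
drop table lsb;
4) The full procedure
create table lsb as select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1); -- a temporary table would also work here and may be more efficient (less redo is written)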
commit ;
delete from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1) ;
commit;
insert into 表1 select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1) ;
commit;
drop table lsb;
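The procedure above can be rehearsed on toy data before it is run against a table with tens of millions of rows. The sketch below is not Oracle. It assumes SQLite (through Python's sqlite3 module) as a stand-in, because SQLite also exposes a rowid and accepts the same row-value IN syntax (SQLite 3.15 or later). The table name t1, the payload column and the sample rows are made up for illustration.

```python
import sqlite3

# Subquery that lists every (id, seq) key appearing more than once in t1.
DUP_KEYS = "SELECT id, seq FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1"


def dedupe_via_staging(conn):
    """Keep one row per (id, seq) by going through a staging table named lsb."""
    cur = conn.cursor()
    # Copy every duplicated row (all copies) into the staging table.
    cur.execute(
        f"CREATE TABLE lsb AS SELECT * FROM t1 WHERE (id, seq) IN ({DUP_KEYS})"
    )
    # Remove every copy of the duplicated keys from t1.
    cur.execute(f"DELETE FROM t1 WHERE (id, seq) IN ({DUP_KEYS})")
    # Put back one row per key: the one with the smallest rowid in lsb.
    cur.execute(
        "INSERT INTO t1 SELECT * FROM lsb "
        "WHERE rowid IN (SELECT MIN(rowid) FROM lsb GROUP BY id, seq)"
    )
    # The staging table is no longer needed.
    cur.execute("DROP TABLE lsb")
    conn.commit()


def demo():
    """Build a small table with duplicates, deduplicate it, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 1, "a"), (1, 2, "b"),
         (2, 1, "c"), (2, 1, "c"), (2, 1, "c")],
    )
    dedupe_via_staging(conn)
    rows = conn.execute(
        "SELECT id, seq, payload FROM t1 ORDER BY id, seq"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Calling demo() leaves exactly one row for each (id, seq) pair. Rows that were never duplicated, such as (1, 2), are not touched by any of the steps.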
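Background: there are two databases (a source and a target), and every day data is synchronized from the source to the target. For various reasons, and for fear of losing data, each run re-syncs the data from roughly the last 8 days. With a primary key in place this causes no duplicates. The table receives over a hundred thousand rows a day and already holds 60 million. At some point, though, a colleague dropped the primary key by mistake.
As a result, the figures in the BI reports were absurdly inflated. After some struggle the problem was solved. Here is a summary of the methods considered:
1) Flashback: Oracle has flashback technology, and it may be possible to look up the dropped primary key in the recyclebin (recycle bin). Either way, the duplicate rows have to be deleted before the key can be put back.
2) Use rowid to find the duplicated rows and delete all of them except the one with the smallest rowid in each group. The statement: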
delete from 表 a where (a.Id,a.seq) in(select Id,seq from 表 group by Id,seq having count(*)> 1) and rowid not in (select min(rowid) from 表 group by Id,seq having count(*)>1)
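To see what this single statement keeps and what it removes, it can be run on a handful of rows. The sketch below is an illustration only and assumes SQLite via Python's sqlite3 module in place of Oracle, since SQLite also has a rowid. The table name t1 and the sample rows are hypothetical, and nothing here reproduces the performance behaviour of "not in" on a large Oracle table.

```python
import sqlite3


def dedupe_single_statement(conn):
    """Delete duplicated rows in one statement, keeping the smallest rowid per key."""
    cur = conn.cursor()
    cur.execute(
        "DELETE FROM t1 "
        "WHERE (id, seq) IN ("
        "  SELECT id, seq FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1) "
        "AND rowid NOT IN ("
        "  SELECT MIN(rowid) FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1)"
    )
    deleted = cur.rowcount  # number of rows removed by the DELETE
    conn.commit()
    return deleted


def demo_single_statement():
    """Return (rows deleted, surviving rows as (rowid, id, seq))."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER)")
    # Inserted in this order, the rows receive rowids 1 through 6.
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?)",
        [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1)],
    )
    deleted = dedupe_single_statement(conn)
    remaining = conn.execute(
        "SELECT rowid, id, seq FROM t1 ORDER BY rowid"
    ).fetchall()
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo_single_statement())
```

In this toy run the survivors are the rows with the smallest rowid in each (id, seq) group, plus the one row that was never duplicated.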