An Ugly Way to Delete Duplicate Data in MySQL
阿新 • Published: 2019-02-01
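Requirements
The database is MySQL. Cleaning up the data requires deleting duplicate historical records, and the environment has the following constraints:
- The target table has no primary key
- Changes to the table structure are not allowed
- Duplicate rows are not identical in every column; only certain identifying columns repeat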
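Approach and Obstacles
The initial idea was to find the groups of duplicate records by the specified columns, keep only one record per group, and delete the rest.
A pseudo-SQL implementation: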
delete from r_data_1d a
where (a.c_res_id,a.c_task_time) in (select c_res_id,c_task_time from r_data_1d group by c_res_id,c_task_time having count(*) > 1)
and a.rowid not in (select min(rowid) from r_data_1d group by c_res_id,c_task_time having count(*)>1)
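The first half of that idea, finding the duplicated groups, can be tried out in isolation. The following is a minimal sketch, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, it reduces r_data_1d to three columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (c_res_id, c_task_time, c_in_avg).
# The real r_data_1d table has more columns; three are enough here.
SAMPLE_ROWS = [
    (1, "2019-01-01", 10.0),
    (1, "2019-01-01", 12.0),  # same key columns as the row above
    (2, "2019-01-01", 7.0),
    (3, "2019-01-02", 5.0),
    (3, "2019-01-02", 5.0),
    (3, "2019-01-02", 6.0),
]


def find_duplicate_groups(conn):
    """Return (c_res_id, c_task_time, row_count) for every duplicated key."""
    return conn.execute(
        """
        SELECT c_res_id, c_task_time, COUNT(*) AS row_count
        FROM r_data_1d
        GROUP BY c_res_id, c_task_time
        HAVING COUNT(*) > 1
        ORDER BY c_res_id, c_task_time
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
# No primary key, matching the constraint in the requirements.
conn.execute(
    "CREATE TABLE r_data_1d (c_res_id INTEGER, c_task_time TEXT, c_in_avg REAL)"
)
conn.executemany("INSERT INTO r_data_1d VALUES (?, ?, ?)", SAMPLE_ROWS)
duplicate_groups = find_duplicate_groups(conn)
print(duplicate_groups)
```

The GROUP BY ... HAVING COUNT(*) > 1 query is the same one the pseudo-SQL uses in its subqueries; only the surrounding Python and the sample data are assumptions.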
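Obstacles and solutions:
1. MySQL does not support rowid, and the table has no primary key.
Solution: handle it in three steps, collect-delete-reinsert, introducing the temporary table r_data_1d_temp.
2. While implementing the delete step, it turned out that MySQL does not allow a DELETE to read from its own target table in a subquery. A statement such as the following is rejected with error 1093 ("You can't specify target table 'a' for update in FROM clause"):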
delete from a where a.b in (select b from a group by b having count(*) > 1)
Solution: split this step in two, collect then delete, introducing the temporary table r_data_1d_del_temp.
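The collect-then-delete split can be sketched as below. This is an illustration under assumptions rather than the author's script: it runs against SQLite through Python's sqlite3 module (SQLite does not have MySQL's restriction, so the sketch only shows the shape of the workaround), the table is cut down to three columns, and the function name delete_duplicate_groups and the sample rows are hypothetical. It needs an SQLite build that supports row values in IN (3.15 or later).

```python
import sqlite3


def delete_duplicate_groups(conn):
    """Delete every row whose key is duplicated, via a helper key table."""
    # Collect: store the duplicated keys in a separate table, so the
    # DELETE below never reads from the table it is modifying.
    conn.execute(
        """
        CREATE TABLE r_data_1d_del_temp AS
        SELECT c_res_id, c_task_time
        FROM r_data_1d
        GROUP BY c_res_id, c_task_time
        HAVING COUNT(*) > 1
        """
    )
    # Delete: look the keys up in the helper table only.
    cursor = conn.execute(
        """
        DELETE FROM r_data_1d
        WHERE (c_res_id, c_task_time) IN
              (SELECT c_res_id, c_task_time FROM r_data_1d_del_temp)
        """
    )
    conn.execute("DROP TABLE r_data_1d_del_temp")
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE r_data_1d (c_res_id INTEGER, c_task_time TEXT, c_in_avg REAL)"
)
conn.executemany(
    "INSERT INTO r_data_1d VALUES (?, ?, ?)",
    [(1, "2019-01-01", 10.0), (1, "2019-01-01", 12.0), (2, "2019-01-01", 7.0)],
)
deleted = delete_duplicate_groups(conn)
remaining = conn.execute("SELECT * FROM r_data_1d").fetchall()
print(deleted, remaining)
```

Note that this removes every row of a duplicated group, including the one that should survive, which is why a re-insert step is still needed afterwards.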
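Final Solution
In the end the steps above were combined. In short: collect the replacement rows, collect the keys to delete, run the delete, run the re-insert.
The final SQL is as follows: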
-- Step 1: collect one merged row per duplicate group (the rows to re-insert later)
create table r_data_1d_temp
select c_business_id, c_res_id, c_sub_res_id, max(c_in_avg) as c_in_avg, max(c_in_min) as c_in_min, max(c_in_max) as c_in_max, max(c_out_avg) as c_out_avg, max(c_out_min) as c_out_min, max(c_out_max) as c_out_max, c_task_time, c_tag1, c_tag2 from (
select * from r_data_1d a where (a.c_res_id,a.c_task_time) in (select c_res_id,c_task_time from r_data_1d group by c_res_id,c_task_time having count(*) > 1) ) dup
group by c_business_id, c_res_id, c_sub_res_id, c_task_time, c_tag1, c_tag2;
-- Step 2: collect the keys of every duplicate group
create table r_data_1d_del_temp select c_res_id,c_task_time from r_data_1d group by c_res_id,c_task_time having count(*) > 1;
-- Step 3: delete all rows belonging to a duplicate group
delete from r_data_1d
where (c_res_id,c_task_time) in (select * from r_data_1d_del_temp);
drop table r_data_1d_del_temp;
-- Step 4: re-insert the merged rows (the select list above must match the column order of r_data_1d)
insert into r_data_1d select * from r_data_1d_temp;
drop table r_data_1d_temp;
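To check the four steps end to end on a small data set, the script can be mirrored in a runnable form. What follows is a sketch under stated assumptions, not the production script: it uses SQLite through Python's sqlite3 module instead of MySQL, keeps only three of the table's columns, and the function name dedupe and the sample rows are hypothetical.

```python
import sqlite3

COLUMNS = "c_res_id, c_task_time, c_in_avg"
DUP_KEYS_SQL = """
    SELECT c_res_id, c_task_time FROM r_data_1d
    GROUP BY c_res_id, c_task_time HAVING COUNT(*) > 1
"""


def dedupe(conn):
    """Collect replacements, collect keys, delete, re-insert (in that order)."""
    # 1. One merged row per duplicated key; MAX() picks the metric to keep.
    conn.execute(f"""
        CREATE TABLE r_data_1d_temp AS
        SELECT c_res_id, c_task_time, MAX(c_in_avg) AS c_in_avg
        FROM r_data_1d
        WHERE (c_res_id, c_task_time) IN ({DUP_KEYS_SQL})
        GROUP BY c_res_id, c_task_time
    """)
    # 2. The duplicated keys, stored apart from the table being deleted from.
    conn.execute(f"CREATE TABLE r_data_1d_del_temp AS {DUP_KEYS_SQL}")
    # 3. Remove every row of every duplicated group.
    conn.execute("""
        DELETE FROM r_data_1d
        WHERE (c_res_id, c_task_time) IN
              (SELECT c_res_id, c_task_time FROM r_data_1d_del_temp)
    """)
    conn.execute("DROP TABLE r_data_1d_del_temp")
    # 4. Put one row per group back, naming columns so order cannot drift.
    conn.execute(
        f"INSERT INTO r_data_1d ({COLUMNS}) SELECT {COLUMNS} FROM r_data_1d_temp"
    )
    conn.execute("DROP TABLE r_data_1d_temp")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE r_data_1d (c_res_id INTEGER, c_task_time TEXT, c_in_avg REAL)"
)
conn.executemany(
    f"INSERT INTO r_data_1d ({COLUMNS}) VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", 10.0),
        (1, "2019-01-01", 12.0),
        (2, "2019-01-01", 7.0),
        (3, "2019-01-02", 5.0),
        (3, "2019-01-02", 5.0),
        (3, "2019-01-02", 6.0),
    ],
)
dedupe(conn)
result = conn.execute(
    f"SELECT {COLUMNS} FROM r_data_1d ORDER BY c_res_id"
).fetchall()
print(result)
```

One deliberate difference from the script above: the re-insert names its columns instead of relying on select *, so a mismatch in column order between the two tables cannot silently shuffle values. On MySQL the same precaution may be worth considering.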
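Afterword
This implementation is still rather ugly. I am posting it only as a starting point, in the hope that someone more experienced can offer a better solution.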