
An Ungainly Way to Delete Duplicate Data in MySQL

Requirements

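The database is MySQL. The cleanup task is to delete duplicate rows from historical data, under the following constraints:
- The target table has no primary key.
- The table structure must not be changed.
- Duplicate rows are not identical in every column; only certain identifying columns match.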

Approach and Obstacles

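The initial idea was to find the sets of duplicate rows by the specified columns, keep exactly one row from each group, and delete the rest.
In pseudo-SQL: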

delete from r_data_1d a
where (a.c_res_id, a.c_task_time) in (
        select c_res_id, c_task_time from r_data_1d
        group by c_res_id, c_task_time having count(*) > 1)
  and a.rowid not in (
        select min(rowid) from r_data_1d
        group by c_res_id, c_task_time having count(*) > 1)
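Before deleting anything, it can help to look at the duplicate groups themselves. The query below is a minimal sketch that reuses the grouping columns from the pseudo-SQL above; the alias cnt is only illustrative.

```sql
-- List each (c_res_id, c_task_time) pair that occurs more than once,
-- together with the number of rows that share it.
select c_res_id, c_task_time, count(*) as cnt
from r_data_1d
group by c_res_id, c_task_time
having count(*) > 1
order by cnt desc;
```

An empty result means there is nothing to clean up for this pair of columns.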

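Obstacles and workarounds:
1. MySQL has no rowid, and the table has no primary key, so there is nothing that singles out one row per group to keep.
Workaround: handle it in three steps, collect-delete-reinsert, using a temporary table r_data_1d_temp.
2. In the delete step, it turned out that MySQL does not allow a DELETE to select from its own target table in a subquery. For example, this statement fails with error 1093 ("You can't specify target table 'a' for update in FROM clause"):
delete from a where a.b in (select b from a group by b having count(*) > 1)

Workaround: split this step in two, collect then delete, using a second temporary table r_data_1d_del_temp.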
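As a possible alternative to the second temporary table, a commonly used workaround is to wrap the subquery in a derived table, which MySQL materializes before the delete runs. This is a sketch against the placeholder table a from the example statement, not the approach taken in this post; the alias dup is hypothetical.

```sql
-- The inner select is wrapped in a derived table (alias dup), so the
-- DELETE no longer reads its own target table directly.
delete from a
where a.b in (
    select b from (
        select b from a group by b having count(*) > 1
    ) as dup
);
```

Because the inner query uses GROUP BY, the optimizer should not merge it back into the outer statement. Like the example statement, this removes every row in each duplicated group, including the one meant to be kept, so the reinsert step is still needed.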

Final Solution

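In the end the steps above were combined into four: collect the rows to reinsert, collect the keys to delete, run the delete, run the reinsert.
The final SQL: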

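-- 1. Collect the rows to reinsert: duplicated rows collapsed by group, keeping the max of each metric.
create table r_data_1d_temp
select c_business_id, c_res_id, c_sub_res_id,
       max(c_in_avg) as c_in_avg, max(c_in_min) as c_in_min, max(c_in_max) as c_in_max,
       max(c_out_avg) as c_out_avg, max(c_out_min) as c_out_min, max(c_out_max) as c_out_max,
       c_task_time, c_tag1, c_tag2
from (
    select * from r_data_1d a
    where (a.c_res_id, a.c_task_time) in (
        select c_res_id, c_task_time from r_data_1d
        group by c_res_id, c_task_time having count(*) > 1)
) dup
group by c_business_id, c_res_id, c_sub_res_id, c_task_time, c_tag1, c_tag2;

-- 2. Collect the keys of the rows to delete.
create table r_data_1d_del_temp
select c_res_id, c_task_time from r_data_1d
group by c_res_id, c_task_time having count(*) > 1;

-- 3. Delete every row that belongs to a duplicated key.
delete from r_data_1d
where (c_res_id, c_task_time) in (select * from r_data_1d_del_temp);

drop table r_data_1d_del_temp;

-- 4. Reinsert the collapsed rows.
-- Assumes the select list in step 1 matches the column order of r_data_1d.
insert into r_data_1d select * from r_data_1d_temp;

drop table r_data_1d_temp;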
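To see the four steps on something small, here is a sketch on a reduced, hypothetical table demo_1d with only three columns. The table name, column types, and sample values are assumptions for illustration and do not come from the real r_data_1d.

```sql
-- Hypothetical miniature of r_data_1d: no primary key, two key columns.
create table demo_1d (
    c_res_id    int,
    c_task_time datetime,
    c_in_max    decimal(10,2)
);

insert into demo_1d values
    (1, '2020-01-01 00:00:00', 10.00),
    (1, '2020-01-01 00:00:00', 12.00),  -- same key as the row above
    (2, '2020-01-01 00:00:00',  7.00);  -- unique key, must stay untouched

-- 1. Collect the rows to reinsert: one aggregated row per duplicated key.
--    (Here the group-by columns are exactly the key columns, so a plain
--    HAVING is enough; the full table needs the IN subquery instead.)
create table demo_1d_temp
select c_res_id, c_task_time, max(c_in_max) as c_in_max
from demo_1d
group by c_res_id, c_task_time
having count(*) > 1;

-- 2. Collect the keys to delete.
create table demo_1d_del_temp
select c_res_id, c_task_time
from demo_1d
group by c_res_id, c_task_time
having count(*) > 1;

-- 3. Delete every row whose key is duplicated.
delete from demo_1d
where (c_res_id, c_task_time) in
      (select c_res_id, c_task_time from demo_1d_del_temp);

-- 4. Reinsert the aggregated rows and clean up.
insert into demo_1d select * from demo_1d_temp;
drop table demo_1d_del_temp;
drop table demo_1d_temp;

-- Two rows should remain: key 1 with c_in_max 12.00, key 2 with 7.00.
select * from demo_1d order by c_res_id;
```

Since CREATE TABLE and DROP TABLE commit implicitly in MySQL, the sequence cannot be rolled back as a whole. It is probably worth trying it on a copy of the table first.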

Postscript

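This implementation is still rather ugly. I am posting it only to get the discussion started, in the hope that someone more expert can offer a better solution.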