1. 程式人生 > >再戰mysql 數據去重

再戰mysql 數據去重

去重 dna 數據去重 value rop 現在 order by drop rep

年初時,寫過一篇去重的,在小表中還能用用,在大表中真的是效率低下,現在給了一次優化
https://www.cnblogs.com/jarjune/p/8328013.html

繼上一篇文章

方法三:

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT('
        DELETE 
        FROM 
            ', TABLENAME ,' 
        WHERE 
            (', FIELDNAMES ,') IN ( 
                SELECT ', FIELDNAMES ,' 
                FROM (
                    SELECT 
                        ', FIELDNAMES ,' 
                    FROM 
                        ', TABLENAME ,' 
                    GROUP BY 
                        ', FIELDNAMES ,' 
                    HAVING 
                        COUNT(1) > 1 
                ) t1
            ) 
        AND ', AUTOFIELD ,' NOT IN ( 
            SELECT ', AUTOFIELD ,' 
            FROM (
                SELECT 
                    MAX(', AUTOFIELD ,') ', AUTOFIELD ,' 
                FROM 
                    ', TABLENAME ,'
                GROUP BY 
                    ', FIELDNAMES ,' 
                HAVING 
                    COUNT(1) > 1 
                ) t2
            )
');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_1('表名', '字段1,字段2,字段3...', '主鍵(唯一)字段');

之後發現刪除的效率還是挺低,又優化成

方法三(優化):

DELIMITER //

DROP PROCEDURE IF EXISTS delete_rows_2;

CREATE PROCEDURE delete_rows_2(IN TABLENAME VARCHAR(50), IN FIELDNAMES VARCHAR(100), IN AUTOFIELD VARCHAR(50))
BEGIN

DECLARE DELETE_TABLE_ROWS_SQL VARCHAR(1000);

SET DELETE_TABLE_ROWS_SQL = CONCAT('
        DELETE 
        FROM 
            ', TABLENAME ,' 
        WHERE 
            ', AUTOFIELD ,' IN ( 
                SELECT 
                    ', AUTOFIELD ,' 
                FROM
                    (
                    SELECT 
                        ', AUTOFIELD ,' 
                    FROM 
                        ', TABLENAME ,' 
                    WHERE 
                        (', FIELDNAMES ,') IN ( 
                            SELECT 
                                ', FIELDNAMES ,' 
                            FROM 
                                ', TABLENAME ,' 
                            GROUP BY 
                                ', FIELDNAMES ,' 
                            HAVING 
                                COUNT(1) > 1 
                        ) 
                    AND ', AUTOFIELD ,' NOT IN ( 
                        SELECT 
                            MAX(', AUTOFIELD ,') 
                        FROM 
                            ', TABLENAME ,'
                        GROUP BY 
                            ', FIELDNAMES ,' 
                        HAVING 
                            COUNT(1) > 1 
                    ) 
                ) t2 
            ) 
    ');

SET @DELETE_TABLE_ROWS_SQL = DELETE_TABLE_ROWS_SQL;

PREPARE DELETE_TABLE_ROWS_SQL_PRE FROM @DELETE_TABLE_ROWS_SQL;
EXECUTE DELETE_TABLE_ROWS_SQL_PRE;

END//

DELIMITER ;

CALL delete_rows_2('表名', '字段1,字段2,字段3...', '主鍵字段');

由於上述都要group by 兩次,又換了一種思路

方法四

delete t1
FROM
    l_weijij_47 t1,
    (
        SELECT
            f01,
            f02,
            f03,
            MAX(seq_value) seq_value
        FROM
            l_weijij_47
        GROUP BY
            f01,
            f02,
            f03
        HAVING
            COUNT(1) > 1
        ORDER BY NULL
    ) t2
where
    t1.f01 = t2.f01
AND t1.f02 = t2.f02
AND t1.f03 = t2.f03
and t1.seq_value < t2.seq_value

註:group by默認會進行排序,所以要加上order by NULL就避免了排序

綜上,方法四是目前在用的去重。

再戰mysql 數據去重