我有一个表与以下字段:

id (Unique)
url (Unique)
title
company
site_id

现在,我需要删除具有相同标题、company和site_id的行。一种方法是使用下面的SQL和脚本(PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

运行此查询后,可以使用服务器端脚本删除重复项。

但是,我想知道这是否只能使用SQL查询。


当前回答

一种简单易懂且不需要主键的解决方案:

add a new boolean column alter table mytable add tokeep boolean; add a constraint on the duplicated columns AND the new column alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep); set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint update ignore mytable set tokeep = true; delete rows that have not been marked as tokeep delete from mytable where tokeep is null; drop the added column alter table mytable drop tokeep;

我建议您保留您添加的约束,以便将来防止出现新的重复。

其他回答

如果IGNORE语句在我的例子中不起作用,你可以使用下面的语句:

CREATE TABLE your_table_deduped LIKE your_table;


INSERT your_table_deduped
SELECT *
FROM your_table
GROUP BY index1_id,
         index2_id;

RENAME TABLE your_table TO your_table_with_dupes;

RENAME TABLE your_table_deduped TO your_table;

#OPTIONAL
ALTER TABLE `your_table` ADD UNIQUE `unique_index` (`index1_id`, `index2_id`);

#OPTIONAL
DROP TABLE your_table_with_dupes;

MySQL对引用要删除的表有限制。你可以用一个临时表来解决这个问题,比如:

create temporary table tmpTable (id int);

insert  into tmpTable
        (id)
select  id
from    YourTable yt
where   exists
        (
        select  *
        from    YourTabe yt2
        where   yt2.title = yt.title
                and yt2.company = yt.company
                and yt2.site_id = yt.site_id
                and yt2.id > yt.id
        );

delete  
from    YourTable
where   ID in (select id from tmpTable);

以下是Kostanos在评论中的建议: 上面唯一缓慢的查询是DELETE,适用于数据库非常大的情况。这个查询可以更快:

DELETE FROM YourTable USING YourTable, tmpTable WHERE YourTable.id=tmpTable.id

为了复制具有唯一列的记录,例如COL1,COL2, COL3不应该被复制(假设我们在表结构中遗漏了3个唯一列,并且在表中添加了多个重复项)

DROP TABLE TABLE_NAME_copy;
CREATE TABLE TABLE_NAME_copy LIKE TABLE_NAME;
INSERT INTO TABLE_NAME_copy
SELECT * FROM TABLE_NAME
GROUP BY COLUMN1, COLUMN2, COLUMN3; 
DROP TABLE TABLE_NAME;
ALTER TABLE TABLE_NAME_copy RENAME TO TABLE_NAME;

希望将有助于开发。

还有另一种解决方案:

DELETE t1 FROM my_table t1, my_table t2 WHERE t1.id < t2.id AND t1.my_field = t2.my_field AND t1.my_field_2 = t2.my_field_2 AND ...

你可以很容易地从这个代码中删除重复的记录。

$qry = mysql_query("SELECT * from cities");
while($qry_row = mysql_fetch_array($qry))
{
$qry2 = mysql_query("SELECT * from cities2 where city = '".$qry_row['city']."'");

if(mysql_num_rows($qry2) > 1){
    while($row = mysql_fetch_array($qry2)){
        $city_arry[] = $row;

        }

    $total = sizeof($city_arry) - 1;
        for($i=1; $i<=$total; $i++){


            mysql_query( "delete from cities2 where town_id = '".$city_arry[$i][0]."'");

            }
    }
    //exit;
}