我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。

当然,由于RowID标识字段的存在,这些行不会完全重复。

MyTable

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

我该怎么做呢?


当前回答

这是删除重复记录最简单的方法

 DELETE FROM tblemp WHERE id IN 
 (
  SELECT MIN(id) FROM tblemp
   GROUP BY  title HAVING COUNT(id)>1
 )

其他回答

这将删除重复的行,除了第一行

DELETE
FROM
    Mytable
WHERE
    RowID NOT IN (
        SELECT
            MIN(RowID)
        FROM
            Mytable
        GROUP BY
            Col1,
            Col2,
            Col3
    )

引用(http://www.codeproject.com/Articles/157977/Remove-Duplicate-Rows-from-a-Table-in-SQL-Server)

alter table MyTable add sno int identity(1,1)
    delete from MyTable where sno in
    (
    select sno from (
    select *,
    RANK() OVER ( PARTITION BY RowID,Col3 ORDER BY sno DESC )rank
    From MyTable
    )T
    where rank>1
    )

    alter table MyTable 
    drop  column sno

创建另一个由原始值组成的表:

CREATE TABLE table2 AS SELECT *, COUNT(*) FROM table1 GROUP BY name HAVING COUNT (*) > 0

我想这会很有帮助。这里,ROW_NUMBER() OVER(分区由res1。Title ORDER BY res1.Id)作为num来区分重复的行。

delete FROM
(SELECT res1.*,ROW_NUMBER() OVER(PARTITION BY res1.Title ORDER BY res1.Id)as num
 FROM 
(select * from [dbo].[tbl_countries])as res1
)as res2
WHERE res2.num > 1

在微软支持网站上有一篇关于删除重复文件的好文章。这是相当保守的——他们让你在不同的步骤中做所有的事情——但它应该适用于大的表格。

在过去,我使用了自连接来实现这一点,尽管它可能会用一个HAVING子句来美化:

DELETE dupes
FROM MyTable dupes, MyTable fullTable
WHERE dupes.dupField = fullTable.dupField 
AND dupes.secondDupField = fullTable.secondDupField 
AND dupes.uniqueField > fullTable.uniqueField