How can I delete duplicate rows when no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicates are removed:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries, but I think they depend on having a row id, because I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

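Current answer

Deleting duplicates from a huge table (several million records) can take a long time. I suggest bulk-inserting the rows you want to keep into a temporary table rather than deleting.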

-- Rewriting your code (take note of the 3rd line)
WITH CTE AS (SELECT NAME, ROW_NUMBER()
OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
SELECT * INTO #unique_records FROM CTE
WHERE ID = 1;
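
Applied to a table like the one in the question, the same idea might look like the sketch below. It is only an illustration under assumptions: `dbo.people` is a hypothetical table name, the seven columns are taken from the question, and the table is assumed to have no foreign keys referencing it (otherwise `TRUNCATE TABLE` fails and a plain `DELETE` would be needed).

```sql
-- Hypothetical table dbo.people with the question's columns col1..col7.
BEGIN TRANSACTION;

-- Copy one row per distinct combination into a temporary table.
SELECT DISTINCT col1, col2, col3, col4, col5, col6, col7
INTO #unique_records
FROM dbo.people;

-- Empty the original table. In SQL Server, TRUNCATE TABLE can be
-- rolled back when it runs inside a transaction.
TRUNCATE TABLE dbo.people;

-- Put the de-duplicated rows back.
INSERT INTO dbo.people (col1, col2, col3, col4, col5, col6, col7)
SELECT col1, col2, col3, col4, col5, col6, col7
FROM #unique_records;

COMMIT TRANSACTION;

DROP TABLE #unique_records;
```

Because every column takes part in the comparison, `SELECT DISTINCT` is enough here and no row id is required.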

Other answers

Another way to remove duplicate rows in a single step, without losing information, is as follows:

-- T-SQL expects the alias after DELETE; NOLOCK is not allowed on the delete target
delete t1
from dublicated_table t1
join (
    select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
    from dublicated_table t2 (nolock)
    group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
    and len(t1.field_kept)=t3.min_field_kept

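To delete duplicate rows from a table in SQL Server, follow these steps:

Find the duplicate rows using the GROUP BY clause or the ROW_NUMBER() function. Then use the DELETE statement to remove them.

Setting up a sample table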

DROP TABLE IF EXISTS contacts;

CREATE TABLE contacts(
    contact_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name NVARCHAR(100) NOT NULL,
    last_name NVARCHAR(100) NOT NULL,
    email NVARCHAR(255) NOT NULL,
);

Inserting values

INSERT INTO contacts
    (first_name,last_name,email) 
VALUES
    ('Syed','Abbas','syed.abbas@example.com'),
    ('Catherine','Abel','catherine.abel@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Kim','Abercrombie','kim.abercrombie@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Hazem','Abolrous','hazem.abolrous@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Humberto','Acevedo','humberto.acevedo@example.com'),
    ('Pilar','Ackerman','pilar.ackerman@example.com');

Query

SELECT 
   contact_id, 
   first_name, 
   last_name, 
   email
FROM 
   contacts;
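
The GROUP BY step mentioned above is not shown in this answer. A minimal sketch of it against the sample `contacts` table could look like this; the `occurrences` alias is my own name, not part of the original.

```sql
-- List each (first_name, last_name, email) combination that appears more than once.
SELECT
    first_name,
    last_name,
    email,
    COUNT(*) AS occurrences  -- illustrative alias
FROM
    contacts
GROUP BY
    first_name,
    last_name,
    email
HAVING
    COUNT(*) > 1;
```

With the sample data inserted above, this should return three groups: Kim Abercrombie (3 rows), Hazem Abolrous (2 rows), and Humberto Acevedo (2 rows).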

Delete duplicate rows from the table

WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        contacts
)
DELETE FROM cte
WHERE row_num > 1;

The duplicate records should now be deleted.
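
The same CTE pattern does not actually need an id column, so it can be adapted to the table in the question. The following is a sketch under assumptions: `dbo.people` is a hypothetical table name, and `ORDER BY (SELECT NULL)` is used because the rows within each partition are identical, so it does not matter which one is kept.

```sql
-- Hypothetical table dbo.people with the question's columns col1..col7.
WITH cte AS (
    SELECT
        ROW_NUMBER() OVER (
            PARTITION BY col1, col2, col3, col4, col5, col6, col7
            ORDER BY (SELECT NULL)  -- arbitrary order; rows in a partition are identical
        ) AS row_num
    FROM
        dbo.people
)
-- Deleting through the CTE removes the underlying rows from dbo.people.
DELETE FROM cte
WHERE row_num > 1;
```

For the question's data this would leave one `john` row and one `sally` row.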

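To delete all duplicates except the first one in each group (the row with the minimum ID):

This should work equally well in other SQL database servers, such as Postgres: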

DELETE FROM table
WHERE id NOT IN (
   select min(id) from table
   group by col1, col2, col3, col4, col5, col6, col7
)

Without using a CTE and ROW_NUMBER(), you can delete the records by using GROUP BY with the MAX function. Here is an example:

DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)