我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。
当然,由于RowID标识字段的存在,这些行不会完全重复。
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
我该怎么做呢?
我需要从一个相当大的SQL Server表(即300,000+行)中删除重复的行。
当然,由于RowID标识字段的存在,这些行不会完全重复。
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
我该怎么做呢?
当前回答
我想我应该分享我的解决方案,因为它在特殊情况下是有效的。 在我的例子中,有重复值的表没有外键(因为这些值是从另一个db复制的)。
begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2
-- insert distinct values into temp
insert into #temp
select distinct *
from tableName
-- delete from source
delete from tableName
-- insert into source from temp
insert into tableName
select *
from #temp
rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!
PS:在处理这样的事情时,我总是使用事务,这不仅确保了所有事情都作为一个整体执行,而且还允许我在没有任何风险的情况下进行测试。但是当然你应该做个备份,以防万一……
其他回答
SELECT DISTINCT *
INTO tempdb.dbo.tmpTable
FROM myTable
TRUNCATE TABLE myTable
INSERT INTO myTable SELECT * FROM tempdb.dbo.tmpTable
DROP TABLE tempdb.dbo.tmpTable
我有一个表,需要保存不重复的行。 我不确定速度和效率。
DELETE FROM myTable WHERE RowID IN (
SELECT MIN(RowID) AS IDNo FROM myTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) = 2 )
alter table MyTable add sno int identity(1,1)
delete from MyTable where sno in
(
select sno from (
select *,
RANK() OVER ( PARTITION BY RowID,Col3 ORDER BY sno DESC )rank
From MyTable
)T
where rank>1
)
alter table MyTable
drop column sno
我知道这个问题已经回答了,但我已经创建了非常有用的sp,它将为表副本创建一个动态删除语句:
CREATE PROCEDURE sp_DeleteDuplicate @tableName varchar(100), @DebugMode int =1
AS
BEGIN
SET NOCOUNT ON;
IF(OBJECT_ID('tempdb..#tableMatrix') is not null) DROP TABLE #tableMatrix;
SELECT ROW_NUMBER() OVER(ORDER BY name) as rn,name into #tableMatrix FROM sys.columns where [object_id] = object_id(@tableName) ORDER BY name
DECLARE @MaxRow int = (SELECT MAX(rn) from #tableMatrix)
IF(@MaxRow is null)
RAISERROR ('I wasn''t able to find any columns for this table!',16,1)
ELSE
BEGIN
DECLARE @i int =1
DECLARE @Columns Varchar(max) ='';
WHILE (@i <= @MaxRow)
BEGIN
SET @Columns=@Columns+(SELECT '['+name+'],' from #tableMatrix where rn = @i)
SET @i = @i+1;
END
---DELETE LAST comma
SET @Columns = LEFT(@Columns,LEN(@Columns)-1)
DECLARE @Sql nvarchar(max) = '
WITH cteRowsToDelte
AS (
SELECT ROW_NUMBER() OVER (PARTITION BY '+@Columns+' ORDER BY ( SELECT 0)) as rowNumber,* FROM '+@tableName
+')
DELETE FROM cteRowsToDelte
WHERE rowNumber > 1;
'
SET NOCOUNT OFF;
IF(@DebugMode = 1)
SELECT @Sql
ELSE
EXEC sp_executesql @Sql
END
END
如果你创建这样的表格
IF(OBJECT_ID('MyLitleTable') is not null)
DROP TABLE MyLitleTable
CREATE TABLE MyLitleTable
(
A Varchar(10),
B money,
C int
)
---------------------------------------------------------
INSERT INTO MyLitleTable VALUES
('ABC',100,1),
('ABC',100,1), -- only this row should be deleted
('ABC',101,1),
('ABC',100,2),
('ABCD',100,1)
-----------------------------------------------------------
exec sp_DeleteDuplicate 'MyLitleTable',0
它将从表中删除所有重复项。如果运行它时不带第二个参数,它将返回一条SQL语句来运行。
如果您需要排除任何列,只需在调试模式下运行它,获取代码并按照您的喜好修改它。
获取重复的行:
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING COUNT(*) > 1
删除重复的行。
DELETE users
WHERE rowid NOT IN
(SELECT MIN(rowid)
FROM users
GROUP BY name, email);