In the data world, duplicated data is a very common issue. It may duplicated for few fields or even worse you have more then two identical rows in the table and there is no way to distinguish those rows. So how to clean those duplicated data?
DELETE Duplicate Records Using TOP
Since SQL Server 2005 we can use the TOP command when we issue the delete, such as the following.
CREATE TABLE dbo.duplicateTest ( [id] [INT] NULL, [name] [VARCHAR](100) ) INSERT INTO dbo.duplicateTest VALUES (1, 'a') INSERT INTO dbo.duplicateTest VALUES (1, 'a') INSERT INTO dbo.duplicateTest VALUES (1, 'a') INSERT INTO dbo.duplicateTest VALUES (2, 'b') INSERT INTO dbo.duplicateTest VALUES (3, 'c') DECLARE @id int WHILE EXISTS (SELECT COUNT(id) FROM dbo.duplicateTest GROUP BY id HAVING COUNT(id) > 1 ) BEGIN SELECT TOP 1 @id = id FROM dbo.duplicateTest GROUP BY id HAVING COUNT(id) > 1 DELETE TOP(1) FROM dbo.duplicateTest WHERE id =@id END
So as you can see using the top in while loop we can remove multiple duplicated records even you are not sure which id is duplicated.
DELETE Duplicate Records Using CTE
another way we can use to remove duplicated records is by using CTE.
TRUNCATE TABLE dbo.duplicateTest INSERT INTO dbo.duplicateTest VALUES(1, 'Bob') INSERT INTO dbo.duplicateTest VALUES(2, 'Dave') INSERT INTO dbo.duplicateTest VALUES(3, 'Karen') INSERT INTO dbo.duplicateTest VALUES(1, 'Bob') INSERT INTO dbo.duplicateTest VALUES(1, 'Bob') WITH cte1 AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY ID) rank1,* FROM dbo.duplicateTest ) DELETE FROM cte1 WHERE rank1 > 1