If your interested to learn about the MySQL Find Duplicate Records
MySQL is a database application that stores data in tables in the form of rows and columns. This database application can store duplicate records in the table, which can impact the database’s performance in MySQL. However, data duplication occurs due to various reasons and removing the duplicate values in the table is an important task while working with a database in MySQL. Generally, it is good to always use unique constraints on a table to store data that prevent having duplicate rows. In this article, we are going to learn how we can remove duplicate records from the MySQL database.
Let us understand it with the help of an example. Suppose we have a table named “student_contacts” that contains many duplicate records:
Now, we will see how to delete duplicate records from the table. MySQL can remove duplicates record mainly in three ways.
Is it possible to delete duplicate records?
It can be done by many ways in sql server the most simplest way to do so is: Insert the distinct rows from the duplicate rows table to new temporary table.Then delete all the data from duplicate rows table then insert all data from temporary table which has no duplicates as shown below
How can I delete duplicate records in MySQL?
MySQL can remove duplicates record mainly in three ways.
- Delete Duplicate Record Using Delete Join. We can use the DELETE JOIN statement in MySQL that allows us to remove duplicate records quickly. …
- Delete Duplicate Record Using the ROW_NUMBER() Function. …
- DELETE Duplicate Rows Using Intermediate Table.
1. Delete Duplicate Record Using Delete Join
We can use the DELETE JOIN statement in MySQLthat allows us to remove duplicate records quickly. The following statement removes duplicate rows from the table and keeps the greatest id:
DELETE S1 FROM student_contacts AS S1 INNER JOIN student_contacts AS S2 WHERE S1.id < S2.id AND S1.email = S2.email;
This query references the student_contacts table twice. Therefore, we will use table alias S1 and S2. After executing the statement, we will get the following output:
The above output indicates that five records have been deleted from the table. We can verify this by executing the below query that returns the duplicates records of the table.
SELECT name, email, COUNT(name) FROM student_contacts GROUP BY name HAVING COUNT(name) > 1;
It will return the output as follows that shows an empty set. It means duplicate records have been successfully removed from the table.
We can also verify it by using the SELECT statement. In the below image, we can see that there are no duplicate records available in the table.
Suppose we want to remove duplicate records and keep the lowest id in the table. In that case, we will use the statement as follows:
DELETE S1 FROM student_contacts AS S1 INNER JOIN student_contacts AS S2 WHERE S1.id > S2.id AND S1.email = S2.email;
It is to note that we need to create a table again containing duplicate records before executing the query. After executing the statement, we will get the following output:
We can also verify it by using the SELECT statement. In the below image, we can see that the duplicate records with higher id have been removed.
2. Delete Duplicate Record Using the ROW_NUMBER() Function
T ROW_NUMBER() function returns the sequential number for each row within its partition, which starts from 1 to the number of rows present in the partition. We can use the below statement that assigns a sequential number to each row using the ROW_NUMBER() function. If this query finds the name column of the table duplicate, it will assign the row number greater than one.
SELECT id, name, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM student_contacts;
After execution, we will get the output as below:
If you want to get only duplicate id rows, use the below statement:
SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM student_contacts) AS temp_table WHERE row_num>1;
This statement reruns the following output:
We can now remove the duplicate records from the student_contacts table with the help of the DELETE statementand a subqueryin the WHERE clause. See the below statement:
DELETE FROM student_contacts WHERE id IN( SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM student_contacts) AS temp_table WHERE row_num>1 );
After execution, we will get the output as below image where we can see that this statement has been removed five records from the table. You can verify whether duplicate rows are removed or not using the SELECT statement.
3. DELETE Duplicate Rows Using Intermediate Table
We can also remove duplicate records from the table using an intermediate table. The following are the points to delete duplicate records with the help of an intermediate table:
1. Create a new table with the same structure as the original table that we will use to remove duplicate records.
mysql> CREATE TABLE new_table_name LIKE source_table_name;
2. Insert unique (distinct) rows of the original table into the newly created table.
mysql> INSERT INTO new_table_name SELECT * FROM source_table_name GROUP BY column; //It is the name of a column that contains duplicate values.
3. Delete the original table and rename the newly created table same as the original table.
mysql> DROP TABLE source_table_name; mysql> ALTER TABLE new_table_name RENAME TO source_table_name;
Let us understand the above steps with the help of the below queries that deletes the duplicates records using an intermediate table:
mysql> CREATE TABLE student_contacts_temp LIKE student_contacts;
mysql> INSERT INTO student_contacts_temp SELECT * FROM student_contacts GROUP BY email; //It is the name of column that contains duplicate values.
mysql> DROP TABLE student_contacts; mysql> ALTER TABLE student_contacts_temp RENAME TO student_contacts;
See the below image to understand the above steps.