Sometimes because you wrote really bad code cause you suck at your job(I’m not perfect), you get duplicates entries in the database. And Now you wanna know how to remove them. Well its quite easy really.
The theory: Take all the rows group them by the email and dump them in a new table. WT*?
Say you have a tables where the duplicate field is the “email”, i.e firstname.lastname@example.org appears too many times. The query below will group all the duplicates into one row.
SELECT * FROM table WHERE 1 GROUP BY [duplicate column];
SELECT * FROM users GROUP BY email;
If one there was a way to magically put all those rows in a new table…well there is…using the syntax;
CREATE new_table as SELECT * FROM old_table GROUP BY email;
- Create the new table first like this
CREATE new_table LIKE old_table
This will create a new table with the exact same structure, without the data. NOTE: It also doesn’t preserve the Indexes.
- And then copy all the data(non-duplicate across) using a INSERT…SELECT statement
INSERT INTO new_table SELECT * FROM old_table GROUP BY email
Remember “email” is the field you want to remove duplicates by or on.
These are not the only two methods for doing this, there are so many, for example(My personal favourite, one liner)
CREATE TABLE new_table SELECT * FROM old_table group by email
Does all the work, copies all the data, again indexes not preserved
And there you go, you have a new table with no duplicates. Disclaimer: Double check.