MySQL - Handling Duplicates
  • Date: 2025-02-05

Tables and result sets sometimes contain duplicate records. Most of the time this is acceptable, but in some cases duplicates must be prevented, or identified and removed from the table. This chapter describes how to prevent duplicate records from occurring in a table and how to remove duplicates that already exist.

Preventing Duplicates from Occurring in a Table

You can use a PRIMARY KEY or a UNIQUE index on a table with the appropriate fields to stop duplicate records.

Let us take an example. The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.

CREATE TABLE person_tbl (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it is also necessary to declare the indexed columns NOT NULL, because a PRIMARY KEY does not allow NULL values:

CREATE TABLE person_tbl (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

The presence of a unique index in a table normally causes an error if you insert a record that duplicates an existing record in the column or columns that define the index.
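
As a minimal sketch of that error, assuming the person_tbl definition with the PRIMARY KEY shown above and no existing row with these (illustrative) name values, the second plain INSERT below is rejected with MySQL error 1062 (duplicate entry):

```sql
-- Assumes person_tbl has PRIMARY KEY (last_name, first_name).
-- 'Smith' / 'Anna' are illustrative values, not taken from the original text.
INSERT INTO person_tbl (last_name, first_name) VALUES ('Smith', 'Anna');

-- Same key values again: a plain INSERT fails here with error 1062
-- and the table still holds a single row for this name.
INSERT INTO person_tbl (last_name, first_name) VALUES ('Smith', 'Anna');
```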

To avoid that error, use the INSERT IGNORE command rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it without raising an error (the duplicate is reported as a warning instead).

The following example does not raise an error, and it does not insert the duplicate record either.

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
   -> VALUES('Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
   -> VALUES('Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)
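
The discarded row is not completely silent: MySQL records it as a warning. A short sketch, assuming the row for 'Jay', 'Thomas' is already present as in the session above, shows how to inspect that warning immediately after the ignored statement:

```sql
-- Duplicate of an existing key: the row is skipped, no error is raised.
INSERT IGNORE INTO person_tbl (last_name, first_name)
VALUES ('Jay', 'Thomas');

-- Lists the warnings produced by the previous statement,
-- including the duplicate-entry condition that IGNORE downgraded.
SHOW WARNINGS;
```

Checking warnings this way can help distinguish "nothing inserted because of a duplicate" from other conditions that IGNORE also downgrades.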

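Alternatively, use the REPLACE command rather than INSERT. If the record is new, it is inserted just as with INSERT. If it is a duplicate, the new record replaces the old one: MySQL deletes the old row and inserts the new one, which is why the second statement below reports 2 rows affected.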

mysql> REPLACE INTO person_tbl (last_name, first_name)
   -> VALUES('Ajay', 'Kumar');
Query OK, 1 row affected (0.00 sec)

mysql> REPLACE INTO person_tbl (last_name, first_name)
   -> VALUES('Ajay', 'Kumar');
Query OK, 2 rows affected (0.00 sec)

Choose between INSERT IGNORE and REPLACE according to the duplicate-handling behavior you want. INSERT IGNORE keeps the first of a set of duplicated records and discards the rest. REPLACE keeps the last of a set of duplicates and erases any earlier ones.
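
The difference is easiest to see when the duplicate rows differ in a non-key column. The sketch below assumes person_tbl has PRIMARY KEY (last_name, first_name); the names and sex values are illustrative only:

```sql
-- INSERT IGNORE: the first row wins.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Lee', 'Sam', 'M');
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Lee', 'Sam', 'F');
-- The second statement is discarded, so the stored sex is still 'M'.

-- REPLACE: the last row wins.
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Roy', 'Kim', 'M');
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Roy', 'Kim', 'F');
-- The second statement replaces the row, so the stored sex is now 'F'.

SELECT last_name, first_name, sex
FROM person_tbl
WHERE last_name IN ('Lee', 'Roy');
```

Because REPLACE deletes the old row and inserts a new one, any column not listed in the REPLACE statement reverts to its default value rather than keeping what was stored before.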

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table.

CREATE TABLE person_tbl (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
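
For a table that already exists, the same constraint can be added afterwards. This is a sketch only; uq_person_name is a hypothetical index name chosen for the example, and the statement fails if the table already contains duplicate name pairs:

```sql
-- uq_person_name is a hypothetical index name, not from the original text.
ALTER TABLE person_tbl
   ADD UNIQUE INDEX uq_person_name (last_name, first_name);

-- Lists the indexes on the table so the new one can be confirmed.
SHOW INDEX FROM person_tbl;
```

Unlike a PRIMARY KEY, a UNIQUE index permits NULL values in its columns, and rows containing NULL in an indexed column are not treated as duplicates of each other. Declaring the columns NOT NULL, as in the definition above, avoids that gap.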

Counting and Identifying Duplicates

The following query counts duplicate records by first_name and last_name in a table.

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
   -> FROM person_tbl
   -> GROUP BY last_name, first_name
   -> HAVING repetitions > 1;

This query returns a list of all the duplicated name combinations in the person_tbl table, with the number of times each occurs. In general, to identify sets of values that are duplicated, follow the steps given below.

    Determine which columns contain the values that may be duplicated.

    List those columns in the column selection list, along with the COUNT(*).

    List the columns in the GROUP BY clause as well.

    Add a HAVING clause that eliminates the unique values by requiring the group counts to be greater than one.
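
The steps above yield one row per duplicated combination. To see every full row that belongs to a duplicated set (for example, to compare the sex values before deleting anything), one possible approach is to join the table back to that grouped result. The aliases p and d are illustrative:

```sql
SELECT p.last_name, p.first_name, p.sex
FROM person_tbl AS p
JOIN (
   -- Name combinations that occur more than once.
   SELECT last_name, first_name
   FROM person_tbl
   GROUP BY last_name, first_name
   HAVING COUNT(*) > 1
) AS d
  ON p.last_name = d.last_name
 AND p.first_name = d.first_name
ORDER BY p.last_name, p.first_name;
```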

Eliminating Duplicates from a Query Result

You can use the DISTINCT keyword with the SELECT statement to retrieve only the unique records in a table.

mysql> SELECT DISTINCT last_name, first_name
   -> FROM person_tbl
   -> ORDER BY last_name;

An alternative to DISTINCT is to add a GROUP BY clause that names the columns you are selecting. This removes duplicates and selects only the unique combinations of values in the specified columns.

mysql> SELECT last_name, first_name
   -> FROM person_tbl
   -> GROUP BY last_name, first_name;
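
When only the number of unique combinations is needed rather than the rows themselves, MySQL's COUNT(DISTINCT ...) accepts several expressions. A brief sketch comparing it with the total row count (the column aliases are illustrative):

```sql
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT last_name, first_name) AS unique_names
FROM person_tbl;
```

If the two numbers differ, the table holds duplicate name pairs. Note that COUNT(DISTINCT ...) skips rows in which any of the listed columns is NULL.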

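Removing Duplicates Using Table Replacement

If a table contains duplicate records and you want to remove all of them, follow the procedure given below. Because sex is not part of the GROUP BY list, it is wrapped in an aggregate (MIN) so that the statement also runs when the ONLY_FULL_GROUP_BY SQL mode is enabled; this keeps one sex value per name.

mysql> CREATE TABLE tmp SELECT last_name, first_name, MIN(sex) AS sex
   -> FROM person_tbl
   -> GROUP BY last_name, first_name;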

mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
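
CREATE TABLE ... SELECT does not copy indexes, and dropping the original before renaming leaves a short window with no person_tbl at all. A possible variant, sketched here under the assumption that person_tbl is the unkeyed definition from the start of this chapter, copies the structure, adds a unique index, filters duplicates with INSERT IGNORE, and swaps the tables in one RENAME TABLE statement. The names person_tbl_new, person_tbl_old, and uq_person_name are hypothetical:

```sql
-- Empty copy of the original table, including column definitions.
CREATE TABLE person_tbl_new LIKE person_tbl;

-- Enforce uniqueness on the copy before loading any data.
ALTER TABLE person_tbl_new
   ADD UNIQUE INDEX uq_person_name (last_name, first_name);

-- Rows that collide with an already-copied name are skipped.
INSERT IGNORE INTO person_tbl_new
SELECT * FROM person_tbl;

-- Swap both names in a single statement, then discard the old data.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_new TO person_tbl;
DROP TABLE person_tbl_old;
```

Which of the duplicate rows survives depends on the order in which the SELECT returns them, so this suits cases where the duplicates are identical or any one of them is acceptable.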

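On MySQL 5.6 and earlier, an easy way of removing duplicate records from a table is to add a UNIQUE index or a PRIMARY KEY to that table with ALTER IGNORE TABLE. Even if the table already exists and contains data, this removes the duplicate records and also prevents new ones in the future. The IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so on later versions the statement below is rejected; use the table replacement procedure described above instead.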

mysql> ALTER IGNORE TABLE person_tbl
   -> ADD PRIMARY KEY (last_name, first_name);