
SQL - Handling Duplicates
  • Date: 2024-09-17


SQL is a language used to manage and manipulate data in relational databases. A common problem when working with databases is the presence of duplicate records. Duplicates occur when the same data is entered into a table more than once, whether accidentally or intentionally. Handling duplicates in SQL involves identifying, filtering, removing, or merging duplicate records in a table.

Why is Handling Duplicates in SQL Necessary?

There are several reasons to handle duplicates in a database. One of the main reasons is that duplicates in an organizational database can lead to logical errors. In addition, handling redundant data helps prevent the following consequences −

    Duplicate data occupies storage space, which reduces how efficiently the database is used.

    Because more resources are used, the overall cost of maintaining them rises.

    As logical errors caused by duplicates increase, the conclusions derived from analyzing the data also become erroneous.

Methods to Handle Duplicates

Several methods are available to handle duplicates in a database. They are listed below −

    Using Distinct Keyword

    Using Group By Clause

    Using Union Clause

Let us learn more about these methods in detail below.

Using Distinct Keyword

We can handle duplicates in SQL by using the DISTINCT keyword. It is used with the SELECT statement to eliminate duplicate records and retrieve only the unique ones.

Syntax

The basic syntax of the DISTINCT keyword to eliminate duplicate records is as follows.

SELECT DISTINCT column1, column2, ..., columnN
FROM table_name
WHERE [condition];

Example

Consider the CUSTOMERS table having the following records.

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

First, let us see how the following SELECT query returns duplicate salary records.

SQL> SELECT SALARY FROM CUSTOMERS
   ORDER BY SALARY;

This would produce the following result, where the salary 2000.00 appears twice because two customers in the original table have that salary.

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+

Now, let us use the DISTINCT keyword with the above SELECT query and see the result.

SQL> SELECT DISTINCT SALARY FROM CUSTOMERS
   ORDER BY SALARY;

Output

This would produce the following result, which contains no duplicate entries.

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
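The two queries above can be tried out with a small script. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database stand in for the database server; the column types are SQLite approximations of the CUSTOMERS table, not its original definition.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS ("
    "ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Without DISTINCT, every row contributes its salary, so 2000.00 appears twice.
all_salaries = [
    row[0]
    for row in conn.execute("SELECT SALARY FROM CUSTOMERS ORDER BY SALARY")
]

# With DISTINCT, each salary value is returned only once.
distinct_salaries = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY"
    )
]

print(all_salaries)
print(distinct_salaries)
conn.close()
```

Note that DISTINCT only filters the query result; the CUSTOMERS table itself still holds all seven rows.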

Using Group By Clause

We can also merge duplicate records into one using the GROUP BY clause. Following is the syntax to do so −

SELECT column_name(s) FROM table_name GROUP BY column_name(s);

Example

In this example, we create a new table named EMPLOYEE using the query below −

CREATE TABLE EMPLOYEE (
   EID INT NOT NULL,
   EMPLOYEE_NAME VARCHAR (30) NOT NULL,
   SALES_MADE DECIMAL (20)
);

Now, we can insert values into this empty table using the INSERT statement as follows −

INSERT INTO EMPLOYEE VALUES (102, 'SARIKA', 4500);
INSERT INTO EMPLOYEE VALUES (100, 'ALEKHYA', 3623);
INSERT INTO EMPLOYEE VALUES (101, 'REVATHI', 1291);
INSERT INTO EMPLOYEE VALUES (103, 'VIVEK', 3426);
INSERT INTO EMPLOYEE VALUES (100, 'ALEKHYA', 3623);

The Employee table consists of the details of employees in an organization and sales made by them.

+-----+---------------+------------+
| EID | EMPLOYEE_NAME | SALES_MADE |
+-----+---------------+------------+
| 102 | SARIKA        |       4500 |
| 100 | ALEKHYA       |       3623 |
| 101 | REVATHI       |       1291 |
| 103 | VIVEK         |       3426 |
| 100 | ALEKHYA       |       3623 |
+-----+---------------+------------+

Using the following GROUP BY query, we merge the duplicate records present in the table into one record. GROUP BY alone does not guarantee the order of the result, so an ORDER BY clause arranges the rows in ascending order of EID.

SELECT * FROM EMPLOYEE GROUP BY EID, EMPLOYEE_NAME, SALES_MADE ORDER BY EID;

Output

The table displayed is as follows −

+-----+---------------+------------+
| EID | EMPLOYEE_NAME | SALES_MADE |
+-----+---------------+------------+
| 100 | ALEKHYA       |       3623 |
| 101 | REVATHI       |       1291 |
| 102 | SARIKA        |       4500 |
| 103 | VIVEK         |       3426 |
+-----+---------------+------------+
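Grouping can also be used to identify which records are duplicated, not only to merge them. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory database; the HAVING COUNT(*) > 1 filter is an addition for illustration and is not part of the query used in this example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "EID INT NOT NULL, "
    "EMPLOYEE_NAME VARCHAR(30) NOT NULL, "
    "SALES_MADE DECIMAL(20))"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [
        (102, "SARIKA", 4500),
        (100, "ALEKHYA", 3623),
        (101, "REVATHI", 1291),
        (103, "VIVEK", 3426),
        (100, "ALEKHYA", 3623),  # exact duplicate of the second row
    ],
)

# Grouping by every column collapses identical rows into one.
merged = conn.execute(
    "SELECT EID, EMPLOYEE_NAME, SALES_MADE FROM EMPLOYEE "
    "GROUP BY EID, EMPLOYEE_NAME, SALES_MADE "
    "ORDER BY EID"
).fetchall()

# COUNT(*) per group shows how often each record occurs;
# HAVING keeps only the groups that occur more than once.
duplicates = conn.execute(
    "SELECT EID, EMPLOYEE_NAME, SALES_MADE, COUNT(*) FROM EMPLOYEE "
    "GROUP BY EID, EMPLOYEE_NAME, SALES_MADE "
    "HAVING COUNT(*) > 1"
).fetchall()

print(merged)
print(duplicates)
conn.close()
```

Here the second query reports only the ALEKHYA record, together with the number of times it occurs in the table.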

Using Union Clause

UNION is an SQL operator that works like the union operator in relational algebra. It combines the results of multiple SELECT queries on tables that are union compatible, that is, tables with the same number of columns and compatible data types.

Only distinct rows are added to the result, as UNION automatically eliminates all duplicate records.

Syntax

Following is the syntax of the UNION operator in SQL −

SELECT * FROM table1
UNION
SELECT * FROM table2;

Example

Let us first create two tables, COURSES_PICKED and EXTRA_COURSES_PICKED, with the same number of columns having the same data types.

Create table COURSES_PICKED using the following query −

CREATE TABLE COURSES_PICKED(
   STUDENT_ID INT NOT NULL, 
   STUDENT_NAME VARCHAR(30) NOT NULL, 
   COURSE_NAME VARCHAR(30) NOT NULL
);

Insert values into the COURSES_PICKED table with the help of the query given below −

INSERT INTO COURSES_PICKED VALUES(1, 'JOHN', 'ENGLISH');
INSERT INTO COURSES_PICKED VALUES(2, 'ROBERT', 'COMPUTER SCIENCE');
INSERT INTO COURSES_PICKED VALUES(3, 'SASHA', 'COMMUNICATIONS');
INSERT INTO COURSES_PICKED VALUES(4, 'JULIAN', 'MATHEMATICS');

The table will be displayed as −

+------------+--------------+------------------+
| STUDENT_ID | STUDENT_NAME | COURSE_NAME      |
+------------+--------------+------------------+
|          1 | JOHN         | ENGLISH          |
|          2 | ROBERT       | COMPUTER SCIENCE |
|          3 | SASHA        | COMMUNICATIONS   |
|          4 | JULIAN       | MATHEMATICS      |
+------------+--------------+------------------+

Create table EXTRA_COURSES_PICKED using the following query −

CREATE TABLE EXTRA_COURSES_PICKED(
   STUDENT_ID INT NOT NULL, 
   STUDENT_NAME VARCHAR(30) NOT NULL, 
   EXTRA_COURSE_NAME VARCHAR(30) NOT NULL
);

Following is the query to insert values into the EXTRA_COURSES_PICKED table −

INSERT INTO EXTRA_COURSES_PICKED VALUES(1, 'JOHN', 'PHYSICAL EDUCATION');
INSERT INTO EXTRA_COURSES_PICKED VALUES(2, 'ROBERT', 'GYM');
INSERT INTO EXTRA_COURSES_PICKED VALUES(3, 'SASHA', 'FILM');
INSERT INTO EXTRA_COURSES_PICKED VALUES(4, 'JULIAN', 'MATHEMATICS');

The table will be displayed as −

+------------+--------------+--------------------+
| STUDENT_ID | STUDENT_NAME | EXTRA_COURSE_NAME  |
+------------+--------------+--------------------+
|          1 | JOHN         | PHYSICAL EDUCATION |
|          2 | ROBERT       | GYM                |
|          3 | SASHA        | FILM               |
|          4 | JULIAN       | MATHEMATICS        |
+------------+--------------+--------------------+

Now, let us try to combine both these tables using the UNION query as follows −

SELECT * FROM COURSES_PICKED
UNION
SELECT * FROM EXTRA_COURSES_PICKED;

Output

The resultant table obtained after performing the UNION operation is −

+------------+--------------+--------------------+
| STUDENT_ID | STUDENT_NAME | COURSE_NAME        |
+------------+--------------+--------------------+
|          1 | JOHN         | ENGLISH            |
|          1 | JOHN         | PHYSICAL EDUCATION |
|          2 | ROBERT       | COMPUTER SCIENCE   |
|          2 | ROBERT       | GYM                |
|          3 | SASHA        | COMMUNICATIONS     |
|          3 | SASHA        | FILM               |
|          4 | JULIAN       | MATHEMATICS        |
+------------+--------------+--------------------+

Since the record of JULIAN with the MATHEMATICS course appears in both tables, the UNION operator eliminates the duplicate record and returns distinct values only.
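To see the effect of this duplicate elimination, UNION can be compared with UNION ALL, which keeps every row from both queries. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory database; the UNION ALL query and the ORDER BY on column positions are additions for illustration and are not part of the example above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE COURSES_PICKED ("
    "STUDENT_ID INT NOT NULL, "
    "STUDENT_NAME VARCHAR(30) NOT NULL, "
    "COURSE_NAME VARCHAR(30) NOT NULL)"
)
conn.execute(
    "CREATE TABLE EXTRA_COURSES_PICKED ("
    "STUDENT_ID INT NOT NULL, "
    "STUDENT_NAME VARCHAR(30) NOT NULL, "
    "EXTRA_COURSE_NAME VARCHAR(30) NOT NULL)"
)
conn.executemany(
    "INSERT INTO COURSES_PICKED VALUES (?, ?, ?)",
    [
        (1, "JOHN", "ENGLISH"),
        (2, "ROBERT", "COMPUTER SCIENCE"),
        (3, "SASHA", "COMMUNICATIONS"),
        (4, "JULIAN", "MATHEMATICS"),
    ],
)
conn.executemany(
    "INSERT INTO EXTRA_COURSES_PICKED VALUES (?, ?, ?)",
    [
        (1, "JOHN", "PHYSICAL EDUCATION"),
        (2, "ROBERT", "GYM"),
        (3, "SASHA", "FILM"),
        (4, "JULIAN", "MATHEMATICS"),
    ],
)

# UNION removes rows that are identical across the two result sets.
# ORDER BY 1, 3 sorts by the first and third output columns.
union_rows = conn.execute(
    "SELECT * FROM COURSES_PICKED "
    "UNION "
    "SELECT * FROM EXTRA_COURSES_PICKED "
    "ORDER BY 1, 3"
).fetchall()

# UNION ALL keeps every row, including the repeated JULIAN record.
union_all_rows = conn.execute(
    "SELECT * FROM COURSES_PICKED "
    "UNION ALL "
    "SELECT * FROM EXTRA_COURSES_PICKED "
    "ORDER BY 1, 3"
).fetchall()

print(len(union_rows), len(union_all_rows))
conn.close()
```

The one-row difference between the two results is the repeated JULIAN record, which UNION returns once and UNION ALL returns twice.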
