MySQL check duplicate with group by using wildcard?

    +----+--------------+-----+-----------+----------+     | ID | NAME         | AGE | ADDRESS   | SALARY   |     +----+--------------+-----+-----------+----------+     |  1 | Ramesh Olive |  32 | Ahmedabad |  2000.00 |     |  2 | Tan Kau      |  25 | Delhi     |  1500.00 |     |  3 | Jason Tan Kau|  25 | Delhi     |  2000.00 |     |  4 | Chaitali     |  25 | Mumbai    |  6500.00 |     |  5 | Hardik       |  27 | Bhopal    |  8500.00 |     |  6 | Hardik Jass  |  27 | Bhopal    |  4500.00 |     |  7 | Muffy John   |  24 | Indore    | 10000.00 |     |  8 | Muffy Lee    |  24 | Indore    | 10000.00 |     +----+--------------+-----+-----------+----------+ 

In example above, let said the table name is "table_a" and 1) "Tan Kau" is duplicate with "Jason Tan Kau" and 2) "Hardik" is duplicate with "Hardik Jass"

How to write SQL that will produce output like below?

I think this will work but it should be very slow. Any ideas to improve this?

Select A.*, IF(B.ID IS NULL, "", "DUP") as DUP FROM table_a A  LEFT JOIN table_a B  ON A.NAME LIKE CONCATE("%", B.NAME, "%") AND A.ID != B.ID       +----+--------------+-----+-----------+----------+-----+     | ID | NAME         | AGE | ADDRESS   | SALARY   | DUP |     +----+--------------+-----+-----------+----------+-----+     |  1 | Ramesh Olive |  32 | Ahmedabad |  2000.00 |     |     |  2 | Tan Kau      |  25 | Delhi     |  1500.00 | Dup |     |  3 | Jason Tan Kau|  25 | Delhi     |  2000.00 | Dup |     |  4 | Chaitali     |  25 | Mumbai    |  6500.00 |     |     |  5 | Hardik       |  27 | Bhopal    |  8500.00 | Dup |     |  6 | Hardik Jass  |  27 | Bhopal    |  4500.00 | Dup |      |  7 | Muffy John   |  24 | Indore    | 10000.00 |     |     |  8 | Muffy Lee    |  24 | Indore    | 10000.00 |     |     +----+--------------+-----+-----------+----------+-----+ 

Replay

Your query can return the expected results by adding the reverse condition:

SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
FROM persons A
LEFT JOIN persons B
ON a.ID <> b.ID
AND (a.Name LIKE CONCAT ("%", b.Name, "%") OR b.Name LIKE CONCAT ("%", a.Name, "%"))
ORDER BY ID;

I don't know if it will be faster, but another way to do it would be to use INSTR:

SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
FROM persons A
LEFT JOIN persons B
ON a.ID <> b.ID
AND (Instr(a.Name, b.Name) > 0 OR Instr(b.Name, a.Name) > 0)
ORDER BY ID;

SQL Fiddle

I did something a little different

SELECT DISTINCT AA.* FROM
(
    SELECT A.*,IF(IFNULL(B.ID,'')='','','Dup') DUP
    FROM table_a A LEFT JOIN table_a B ON a.ID <> b.ID
    AND IF(LENGTH(A.name)>LENGTH(B.name),
    INSTR(A.name,B.name)>0,
    INSTR(B.name,A.name)>0)
) AA;

NOTE : I basically plagiarized Leigh's answer and augmented it slightly, so please do not mark my answer as accepted !!!

The reason I gave this is in case there are more dups present

Here is your sample data plus two extra rows:

mysql> DROP DATABASE IF EXISTS cww;
Query OK, 1 row affected (0.03 sec)

mysql> CREATE DATABASE cww;
Query OK, 1 row affected (0.00 sec)

mysql> USE cww
Database changed
mysql> CREATE TABLE table_a
    -> (
    ->     ID INT NOT NULL AUTO_INCREMENT,
    ->     NAME VARCHAR(25) NOT NULL,
    ->     AGE INT NOT NULL,
    ->     ADDRESS VARCHAR(25) NOT NULL,
    ->     SALARY DECIMAL(10,2) NOT NULL,
    ->     PRIMARY KEY (ID)
    -> );
Query OK, 0 rows affected (0.10 sec)

mysql> INSERT INTO table_a (NAME,AGE,ADDRESS,SALARY) VALUES
    -> ('Ramesh Olive'   ,32,'Ahmedabad', 2000.00),
    -> ('Tan Kau'        ,25,'Delhi'    , 1500.00),
    -> ('Jason Tan Kau'  ,25,'Delhi'    , 2000.00),
    -> ('Jackson Tan Kau',25,'Delhi'    , 2000.00),
    -> ('Chaitali'       ,25,'Mumbai'   , 6500.00),
    -> ('Hardik'         ,27,'Bhopal'   , 8500.00),
    -> ('Hardik Jass'    ,27,'Bhopal'   , 4500.00),
    -> ('Hardik Jess'    ,27,'Bhopal'   , 4500.00),
    -> ('Muffy John'     ,24,'Indore'   , 10000.00),
    -> ('Muffy Lee'      ,24,'Indore'   , 10000.00);
Query OK, 10 rows affected (0.05 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM table_a;
+----+-----------------+-----+-----------+----------+
| ID | NAME            | AGE | ADDRESS   | SALARY   |
+----+-----------------+-----+-----------+----------+
|  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |
|  2 | Tan Kau         |  25 | Delhi     |  1500.00 |
|  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 |
|  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 |
|  5 | Chaitali        |  25 | Mumbai    |  6500.00 |
|  6 | Hardik          |  27 | Bhopal    |  8500.00 |
|  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 |
|  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 |
|  9 | Muffy John      |  24 | Indore    | 10000.00 |
| 10 | Muffy Lee       |  24 | Indore    | 10000.00 |
+----+-----------------+-----+-----------+----------+
10 rows in set (0.00 sec)

mysql>

Notice how my augmented query handles the dups properly

mysql> SELECT DISTINCT AA.* FROM
    -> (
    ->     SELECT A.*,IF(IFNULL(B.ID,'')='','','Dup') DUP
    ->     FROM table_a A LEFT JOIN table_a B ON a.ID <> b.ID
    ->     AND IF(LENGTH(A.name)>LENGTH(B.name),
    ->     INSTR(A.name,B.name)>0,
    ->     INSTR(B.name,A.name)>0)
    -> ) AA;
+----+-----------------+-----+-----------+----------+-----+
| ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
+----+-----------------+-----+-----------+----------+-----+
|  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
|  2 | Tan Kau         |  25 | Delhi     |  1500.00 | Dup |
|  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | Dup |
|  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | Dup |
|  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
|  6 | Hardik          |  27 | Bhopal    |  8500.00 | Dup |
|  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | Dup |
|  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | Dup |
|  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
| 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
+----+-----------------+-----+-----------+----------+-----+
10 rows in set (0.00 sec)

mysql>

In the face of more dups, Leigh's query does this:

mysql> SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
    -> FROM table_a A
    -> LEFT JOIN table_a B
    -> ON a.ID <> b.ID
    -> AND (Instr(a.Name, b.Name) > 0 OR Instr(b.Name, a.Name) > 0)
    -> ORDER BY ID;
+----+-----------------+-----+-----------+----------+-----+
| ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
+----+-----------------+-----+-----------+----------+-----+
|  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
|  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
|  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
|  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | DUP |
|  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | DUP |
|  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
|  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
|  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
|  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | DUP |
|  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | DUP |
|  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
| 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
+----+-----------------+-----+-----------+----------+-----+
12 rows in set (0.00 sec)

mysql>

@LeighRiffel's answer just needs to be imbedded in a subquery and made DISTINCT:

mysql> SELECT DISTINCT * FROM (
    -> SELECT A.*, IF(B.ID IS NULL, "", "DUP") as DUP
    -> FROM table_a A
    -> LEFT JOIN table_a B
    -> ON a.ID <> b.ID
    -> AND (Instr(a.Name, b.Name) > 0 OR Instr(b.Name, a.Name) > 0)
    -> ORDER BY ID) AA;
+----+-----------------+-----+-----------+----------+-----+
| ID | NAME            | AGE | ADDRESS   | SALARY   | DUP |
+----+-----------------+-----+-----------+----------+-----+
|  1 | Ramesh Olive    |  32 | Ahmedabad |  2000.00 |     |
|  2 | Tan Kau         |  25 | Delhi     |  1500.00 | DUP |
|  3 | Jason Tan Kau   |  25 | Delhi     |  2000.00 | DUP |
|  4 | Jackson Tan Kau |  25 | Delhi     |  2000.00 | DUP |
|  5 | Chaitali        |  25 | Mumbai    |  6500.00 |     |
|  6 | Hardik          |  27 | Bhopal    |  8500.00 | DUP |
|  7 | Hardik Jass     |  27 | Bhopal    |  4500.00 | DUP |
|  8 | Hardik Jess     |  27 | Bhopal    |  4500.00 | DUP |
|  9 | Muffy John      |  24 | Indore    | 10000.00 |     |
| 10 | Muffy Lee       |  24 | Indore    | 10000.00 |     |
+----+-----------------+-----+-----------+----------+-----+
10 rows in set (0.00 sec)

mysql>

Nevertheless, Leigh's answer did provide the needed SQL principles upfront.

Therefore, I give him a +1 !!!

Category: mysql Time: 2012-11-14 Views: 2

Related post

iOS development

Android development

Python development

JAVA development

Development language

PHP development

Ruby development

search

Front-end development

Database

development tools

Open Platform

Javascript development

.NET development

cloud computing

server

Copyright (C) avrocks.com, All Rights Reserved.

processed in 0.204 (s). 12 q(s)