Wednesday, March 21, 2012

Need assistance with Duplicate Select Statment

Any help would be appreciated here. My question is two part:

1.) Select duplicates that match any of a number of columns for example...

The email is the same OR

The homephone is the same OR

The mobilephone is the same OR

The address1 is the same

The uniqueID is ConsIntID

My Select Statement: (Which does not work)

SELECT CONSINTID, FIRSTNAME, LASTNAME, EMAIL1, ADDRESS1, HOMEPHONE, MOBILEPHONE, CREATEDATE, USERIDS

FROM CONSULTANTS

WHERE CONSINTID IN (SELECT CONSINTID

FROM CONSULTANTS

HAVING COUNT(HOMEPHONE) > 1 OR COUNT(MOBILEPHONE) >1 OR etc.... )

-

Once the Select Duplicates Statement is corrected I will insert those into a ConsultantsDupe Table

The Second Question is: If I have more then one duplicate my stored_procedure fails

Here is my cursor and logic. The problem is getting multiple results in the subquery and I don't know the best way to correct this.

DECLARE DUPES CURSOR FOR

SELECT CONSINTID, FIRSTNAME, LASTNAME, EMAIL1, ADDRESS1, HOMEPHONE, MOBILEPHONE, CREATEDATE, USERIDS

FROM CONSULTANTS_DUPS

ORDER BY CREATEDATE DESC

OPEN DUPES

FETCH NEXT FROM DUPES

INTO @.CONSINTID, @.FIRSTNAME, @.LASTNAME, @.EMAIL1, @.ADDRESS1, @.HOMEPHONE, @.MOBILEPHONE, @.CREATEdATE, @.USERIDS

WHILE @.@.FETCH_STATUS = 0

BEGIN

SET @.FOUNDCONSINTID = (SELECT DISTINCT CONSINTID FROM CONSULTANTS_DUPS

WHERE CONSINTID <> @.CONSINTID AND HOMEPHONE = @.HOMEPHONE "NEED TO ADD COMPARISON FOR MOBILEPHONE, EMAIL ADDRESS HERE") <== This can provide more then one result which messes me up.

I'll take a couple shots at this. Hopefully somebody can post something even better. You'll have to forgive me. Without any sample tables I had to "shoot from the hip" and write it without any syntax checking.

SELECT * FROM Consultants C

WHERE EXISTS (SELECT * FROM Consultants CPhone

WHERE (C.HomePhone = CPhone.HomePhone

OR C.HomePhone = CPhone.CellPhone

OR C.CellPhone = CPhone.HomePhone

OR C.CellPhone = CPhone.CellPhone)

AND C.CONSINTID <> CPhone.CONSINTID

)

OR EXISTS (SELECT * FROM Consultants CEmail

WHERE C.EMAIL1 = CEmail.EMAIL1

AND C.CONSINTID <> EMAIL1.CONSINTID)

OR EXISTS ...

Here's another shot:

SELECT * FROM Consultants

WHERE HomePhone IN (SELECT HomePhone

FROM Consultants

WHERE HomePhone IS NOT NULL

GROUP BY HomePhone

HAVING COUNT(*) > 1)

OR CellPhone IN (SELECT CellPhone

FROM Consultants

WHERE CellPhone IS NOT NULL

GROUP BY CellPhone

HAVING COUNT(*) > 1)

OR Email1 IN (SELECT Email1

FROM Consultants

WHERE Email1 IS NOT NULL

GROUP BY Email1

HAVING COUNT(*) > 1)

This method suffers from the fact that you can't check HomePhone and CellPhone at the same time (with UNION ALL) because of a contact has the same phone number listed as home and cell then it could show up as a duplicate.

Here's where you might take the previous query as a natural transition to cross-check home phone against cell phone:

SELECT * FROM Consultants C

WHERE HomePhone IN (SELECT HomePhone

FROM (SELECT CONSINTID, HomePhone

FROM Consultants UNION

SELECT CONSINTID, CellPhone

FROM Consultants) SubQ

WHERE HomePhone IS NOT NULL

GROUP BY HomePhone

HAVING COUNT(*) > 1)

OR CellPhone IN (SELECT HomePhone

FROM (SELECT CONSINTID, HomePhone

FROM Consultants UNION

SELECT CONSINTID, CellPhone

FROM Consultants) SubQ

WHERE HomePhone IS NOT NULL

GROUP BY HomePhone

HAVING COUNT(*) > 1)

OR Email1 IN (SELECT Email1

FROM Consultants

WHERE Email1 IS NOT NULL

GROUP BY Email1

HAVING COUNT(*) > 1)

You may also be better off with multiple SELECT statements as it's hard to tell the cause of your duplicate in a query like this. Keep in mind that all of these queries are very read-intensive as you're dealing with multiple scans across your table. You may benefit, instead, by placing your duplicates in a temporary table (or table variable) and then querying against the temporary table.

|||something like this:

select *
from consultants
where consintid in
(select consintid
from consultants
group by email
having count(*)>1
union all
select consintid

from consultants

group by homephone

having count(*)>1

union all
select consintid

from consultants

group by cellphone

having count(*)>1

union all
select consintid

from consultants

group by address1

having count(*)>1
)|||

Jared,

Yours worked! This gives me what I need for the duplicate temp table and the assistance is greatly appreciated.

Any feedback on the second portion?

No comments:

Post a Comment