Friday, March 30, 2012

Resolving duplicates entries in table among 10 databases

Hi

Our product uses MS-SQL Server 2000. One of our customer has 10
installations with each installation stroring data in its own database.
Now the customer wants to consolidate these databases into one and we
already have plan for that by consolidating one DB at a time. But first
they want to find how many unique or duplicate entries they have across
all the 10 databases

Assumptions:
1. All the databases reside on the same server. (This is just an
assumption, not the real environment at customer site)
2. Databases can not be merged before it is found how many unique or
duplicate rows exist.

Table under consideration:
Message
(
HashID PK,
...
)

# of rows in Message table in each of databases: 1 Million

Here is my question: How can I find how many unique or duplicate
entries they have across all the 10 databases. I easily find unique
rows for two databases with a query like this:

SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL

How can I do this for 10 databases. This will require factorial of 10
queries to solve this problem.

I will appreciate if someone can provide hint on this.

Regards
AK> Here is my question: How can I find how many unique or duplicate
> entries they have across all the 10 databases.

The following will list the count of unique values (Duplicates = 0) as well
has the non-unique values grouped by the number of duplicates (1-9).

SELECT
Duplicates,
(Duplicates + 1) * COUNT(*) AS TotalHashIDCount
FROM (
SELECT HashID, COUNT(*) - 1 AS Duplicates
FROM (
SELECT HashID FROM db1.dbo.Message
UNION ALL SELECT HashID FROM db2.dbo.Message
UNION ALL SELECT HashID FROM db3.dbo.Message
UNION ALL SELECT HashID FROM db4.dbo.Message
UNION ALL SELECT HashID FROM db5.dbo.Message
UNION ALL SELECT HashID FROM db6.dbo.Message
UNION ALL SELECT HashID FROM db7.dbo.Message
UNION ALL SELECT HashID FROM db8.dbo.Message
UNION ALL SELECT HashID FROM db9.dbo.Message
UNION ALL SELECT HashID FROM db10.dbo.Message
) AS Messages
GROUP BY HashID) AS HashIDCounts
GROUP BY Duplicates
ORDER BY Duplicates

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <ambkh@.yahoo.com> wrote in message
news:1139679057.443925.49210@.g44g2000cwa.googlegro ups.com...
> Hi
> Our product uses MS-SQL Server 2000. One of our customer has 10
> installations with each installation stroring data in its own database.
> Now the customer wants to consolidate these databases into one and we
> already have plan for that by consolidating one DB at a time. But first
> they want to find how many unique or duplicate entries they have across
> all the 10 databases
> Assumptions:
> 1. All the databases reside on the same server. (This is just an
> assumption, not the real environment at customer site)
> 2. Databases can not be merged before it is found how many unique or
> duplicate rows exist.
> Table under consideration:
> Message
> (
> HashID PK,
> ...
> )
> # of rows in Message table in each of databases: 1 Million
> Here is my question: How can I find how many unique or duplicate
> entries they have across all the 10 databases. I easily find unique
> rows for two databases with a query like this:
> SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
> db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL
> How can I do this for 10 databases. This will require factorial of 10
> queries to solve this problem.
> I will appreciate if someone can provide hint on this.
> Regards
> AK|||Thank you Dan. This is exactly what I needed (in fact more than what I
needed :)

Regards
AK

Dan Guzman wrote:
> > Here is my question: How can I find how many unique or duplicate
> > entries they have across all the 10 databases.
> The following will list the count of unique values (Duplicates = 0) as well
> has the non-unique values grouped by the number of duplicates (1-9).
> SELECT
> Duplicates,
> (Duplicates + 1) * COUNT(*) AS TotalHashIDCount
> FROM (
> SELECT HashID, COUNT(*) - 1 AS Duplicates
> FROM (
> SELECT HashID FROM db1.dbo.Message
> UNION ALL SELECT HashID FROM db2.dbo.Message
> UNION ALL SELECT HashID FROM db3.dbo.Message
> UNION ALL SELECT HashID FROM db4.dbo.Message
> UNION ALL SELECT HashID FROM db5.dbo.Message
> UNION ALL SELECT HashID FROM db6.dbo.Message
> UNION ALL SELECT HashID FROM db7.dbo.Message
> UNION ALL SELECT HashID FROM db8.dbo.Message
> UNION ALL SELECT HashID FROM db9.dbo.Message
> UNION ALL SELECT HashID FROM db10.dbo.Message
> ) AS Messages
> GROUP BY HashID) AS HashIDCounts
> GROUP BY Duplicates
> ORDER BY Duplicates
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "AK" <ambkh@.yahoo.com> wrote in message
> news:1139679057.443925.49210@.g44g2000cwa.googlegro ups.com...
> > Hi
> > Our product uses MS-SQL Server 2000. One of our customer has 10
> > installations with each installation stroring data in its own database.
> > Now the customer wants to consolidate these databases into one and we
> > already have plan for that by consolidating one DB at a time. But first
> > they want to find how many unique or duplicate entries they have across
> > all the 10 databases
> > Assumptions:
> > 1. All the databases reside on the same server. (This is just an
> > assumption, not the real environment at customer site)
> > 2. Databases can not be merged before it is found how many unique or
> > duplicate rows exist.
> > Table under consideration:
> > Message
> > (
> > HashID PK,
> > ...
> > )
> > # of rows in Message table in each of databases: 1 Million
> > Here is my question: How can I find how many unique or duplicate
> > entries they have across all the 10 databases. I easily find unique
> > rows for two databases with a query like this:
> > SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
> > db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL
> > How can I do this for 10 databases. This will require factorial of 10
> > queries to solve this problem.
> > I will appreciate if someone can provide hint on this.
> > Regards
> > AK|||Better too much than too little :-)

--
Hope this helps.

Dan Guzman
SQL Server MVP

"AK" <ambkh@.yahoo.com> wrote in message
news:1139936166.210532.262820@.g43g2000cwa.googlegr oups.com...
> Thank you Dan. This is exactly what I needed (in fact more than what I
> needed :)
> Regards
> AK
> Dan Guzman wrote:
>> > Here is my question: How can I find how many unique or duplicate
>> > entries they have across all the 10 databases.
>>
>> The following will list the count of unique values (Duplicates = 0) as
>> well
>> has the non-unique values grouped by the number of duplicates (1-9).
>>
>> SELECT
>> Duplicates,
>> (Duplicates + 1) * COUNT(*) AS TotalHashIDCount
>> FROM (
>> SELECT HashID, COUNT(*) - 1 AS Duplicates
>> FROM (
>> SELECT HashID FROM db1.dbo.Message
>> UNION ALL SELECT HashID FROM db2.dbo.Message
>> UNION ALL SELECT HashID FROM db3.dbo.Message
>> UNION ALL SELECT HashID FROM db4.dbo.Message
>> UNION ALL SELECT HashID FROM db5.dbo.Message
>> UNION ALL SELECT HashID FROM db6.dbo.Message
>> UNION ALL SELECT HashID FROM db7.dbo.Message
>> UNION ALL SELECT HashID FROM db8.dbo.Message
>> UNION ALL SELECT HashID FROM db9.dbo.Message
>> UNION ALL SELECT HashID FROM db10.dbo.Message
>> ) AS Messages
>> GROUP BY HashID) AS HashIDCounts
>> GROUP BY Duplicates
>> ORDER BY Duplicates
>>
>> --
>> Hope this helps.
>>
>> Dan Guzman
>> SQL Server MVP
>>
>> "AK" <ambkh@.yahoo.com> wrote in message
>> news:1139679057.443925.49210@.g44g2000cwa.googlegro ups.com...
>> > Hi
>>> > Our product uses MS-SQL Server 2000. One of our customer has 10
>> > installations with each installation stroring data in its own database.
>> > Now the customer wants to consolidate these databases into one and we
>> > already have plan for that by consolidating one DB at a time. But first
>> > they want to find how many unique or duplicate entries they have across
>> > all the 10 databases
>>> > Assumptions:
>> > 1. All the databases reside on the same server. (This is just an
>> > assumption, not the real environment at customer site)
>> > 2. Databases can not be merged before it is found how many unique or
>> > duplicate rows exist.
>>> > Table under consideration:
>> > Message
>> > (
>> > HashID PK,
>> > ...
>> > )
>>> > # of rows in Message table in each of databases: 1 Million
>>> > Here is my question: How can I find how many unique or duplicate
>> > entries they have across all the 10 databases. I easily find unique
>> > rows for two databases with a query like this:
>>> > SELECT COUNT(A.HasID) FROM db1.dbo.Message A LEFT OUTER JOIN ON
>> > db2.dbo.Message B ON A.HashID = B.HashID WHERE B.HashID IS NULL
>>> > How can I do this for 10 databases. This will require factorial of 10
>> > queries to solve this problem.
>>> > I will appreciate if someone can provide hint on this.
>>> > Regards
>> > AK
>sql

No comments:

Post a Comment