Thursday, June 10, 2010

Removing Duplicate Records from CRM

After a recent migration, we had a number of duplicate account records in CRM. We were loading hundreds of thousands of records for performance testing, and then we wanted to continue to use the same database for integration development, so we wanted to clean up the data and get rid of duplicates to minimize errors with the integration. You can use the built-in Duplicate Detection to identify duplicate records, but this doesn't help you get rid of them - you would still have to manually merge or delete the records that the Duplicate Detection identifies.

One of the challenges is that, while it's relatively easy to locate duplicates, you don't want to delete all of them - you want to keep one of the potential duplicates. After doing some searching and testing, here's what I came up with. It's a very simple (and entirely unsupported) SQL query to identify the duplicate accounts.


UPDATE Account
SET deletionstatecode = 2
WHERE accountid IN
( SELECT a.accountid
FROM Account a, Account b
WHERE a.accountid != b.accountid
AND a.accountnumber = b.accountnumber
AND a.statecode = b.statecode
AND a.accountid < b.accountid )

Here's how this works: The inner SELECT identifies accounts where the accountnumber and statecodes match, and, by using two aliases for the account table, it compares the ID of the accounts to each other. The UPDATE statement sets the deletionstatecode to a value of '2' which is a hard delete. The next time the deletion procedure runs on the database, it will clean out all the account records that have deletionstatecode set to 2.

No comments:

Post a Comment