METHOD AND SYSTEM FOR CREATING AND MAINTAINING UNIQUE DATA REPOSITORY
Abstract
In accordance with the disclosure, there is provided a system and method for creating and maintaining a unique data repository, comprising a matching process based on a set of predefined matching conditions, followed by an action type corresponding to the outcome of the matching process. The present disclosure provides for real-time data de-duplication and updating of the unique data repository to obtain a unified view of unique and matching records.
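As a rough illustration of "predefined matching conditions" paired with an "action type" per outcome, the following minimal Python sketch evaluates an ordered list of conditions and looks up an action for the first one that holds. The field names (`name`, `dob`, `national_id`), the two conditions, and the action labels are all hypothetical; the abstract does not specify any of them.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def same_id(a: Record, b: Record) -> bool:
    # Hypothetical condition: both records carry the same non-empty identifier.
    return bool(a.get("national_id")) and a.get("national_id") == b.get("national_id")


def same_name_and_dob(a: Record, b: Record) -> bool:
    # Hypothetical condition: normalized names and dates of birth agree.
    name_a, name_b = normalize(a.get("name", "")), normalize(b.get("name", ""))
    return bool(name_a) and name_a == name_b and bool(a.get("dob")) and a.get("dob") == b.get("dob")


# Ordered list of (result name, predicate); the first predicate that holds wins.
CONDITIONS: List[Tuple[str, Callable[[Record, Record], bool]]] = [
    ("same_id", same_id),
    ("same_name_and_dob", same_name_and_dob),
]

# One assumed action type for each possible matching result.
ACTIONS: Dict[str, str] = {
    "same_id": "merge",
    "same_name_and_dob": "flag_for_review",
    "no_match": "insert_new",
}


def match_result(incoming: Record, existing: Record) -> str:
    """Return the name of the first condition that holds, or 'no_match'."""
    for name, predicate in CONDITIONS:
        if predicate(incoming, existing):
            return name
    return "no_match"


def action_for(incoming: Record, existing: Record) -> str:
    """Look up the action type predefined for the matching result."""
    return ACTIONS[match_result(incoming, existing)]
```

Keeping conditions and actions in plain tables means either can be changed without touching the matching loop, which is one plausible reading of "predefined".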
11 Claims
1. A computer implemented method of managing and updating a data repository in real time, comprising:
retrieving at least one record from a stream of incoming records maintained in a load repository for matching the retrieved record with a previous record history maintained in a detail repository based on a first uniquely identifying matching key;
performing upon the retrieved record a sequence of matching runs executable on a parallel processing engine, said matching runs further comprising:
a) matching the incoming record against a set of records maintained along with their second associated unique identifier in a master repository, said matching based upon a set of predefined matching conditions and performing at least one action type predefined for each possible matching result;
b) for records found non matching in step a, iteratively performing a matching process between the non matching records and image records, said matching based upon a set of predefined matching conditions and performing at least one action type predefined for each possible matching result;
c) for records found non matching in steps a and b, iteratively performing a self join matching process between the records of the load repository, said matching based upon a set of predefined matching conditions and performing at least one action type predefined for each possible matching result; and
d) for the records found non matching in steps a, b and c, identifying remaining similar matched records based on a criterion.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
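The sequence of matching runs in claim 1 can be pictured as a cascade in which each stage only sees the records left unmatched by the stages before it. The sketch below is a simplified single-pass illustration, not the claimed method: the claim describes steps b and c as iterative, and the record layout (`key`, `name`), the `matches` predicate, the looser `similar` criterion, and the treatment of image records as a plain candidate list are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record, Record], bool]
Log = List[Tuple[str, str, str]]  # (stage, key of record, key of matched record)


def run_stage(pending: List[Record], candidates: List[Record],
              matches: Predicate, stage: str, log: Log) -> List[Record]:
    """Match each pending record against candidates; return the still-unmatched ones."""
    unmatched = []
    for rec in pending:
        hit = next((c for c in candidates if matches(rec, c)), None)
        if hit is None:
            unmatched.append(rec)
        else:
            log.append((stage, rec["key"], hit["key"]))
    return unmatched


def self_join(pending: List[Record], matches: Predicate, stage: str, log: Log) -> List[Record]:
    """Compare pending records pairwise; return those that found no partner."""
    matched = set()
    for i, a in enumerate(pending):
        for j in range(i + 1, len(pending)):
            if matches(a, pending[j]):
                matched.update((i, j))
                log.append((stage, a["key"], pending[j]["key"]))
    return [rec for i, rec in enumerate(pending) if i not in matched]


def cascade(load: List[Record], master: List[Record], image: List[Record],
            matches: Predicate, similar: Predicate) -> Tuple[Log, List[Record]]:
    """Run steps a to d in order, passing only the leftovers to the next step."""
    log: Log = []
    rest = run_stage(load, master, matches, "load-master", log)   # step a
    rest = run_stage(rest, image, matches, "load-image", log)     # step b
    rest = self_join(rest, matches, "load-load", log)             # step c
    rest = self_join(rest, similar, "similar", log)               # step d (assumed looser criterion)
    return log, rest
```

In a fuller version each log entry would trigger the action type predefined for that matching result; here the log only records which stage resolved which record.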
9. A system for managing and updating a unique data repository in real time, the system comprising:
a load repository configured to store a stream of records, each of the incoming record being associated with a first identifying matching key;
a parallel processing system for retrieving at least one record from the load repository and performing thereupon a sequence of matching runs and coordinating with a master repository that is configured to consolidate and store a set of matching records obtained from the matching run along with a corresponding second unique identifier, wherein said matching runs further include:
in a load-master run, matching the incoming record against a set of records maintained in the master repository, said matching based upon a set of predefined matching conditions and performing at least one action type predefined for each possible matching result;
in a load-image run, for records found non matching in the load master run, iteratively performing a matching process between the non matching records and image records, said matching based upon a set of predefined matching conditions and performing at least one action type predefined for each possible matching result;
in a load-load run, for records found non matching in the load-master and load-image run, iteratively performing a self join matching process between the records of the load repository, said matching based upon a set of predefined matching conditions and performing at least one suited action type predefined for each possible matching result; and
in an image-image run, for the records found non matching in the load-master run, load-image run, and load-load run, identifying for remaining similar matched records based on a criterion.
Dependent claims: 10, 11
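To make the system arrangement in claim 9 more concrete, the following sketch models a master repository that consolidates matching records under a generated identifier (standing in for the "second unique identifier") and a load-master run whose lookups are executed concurrently. It is an assumption-laden illustration: a thread pool stands in for the parallel processing system, matching is reduced to equality on a hypothetical `name` field, and the class and function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


@dataclass
class MasterRepository:
    """Groups of matching records, each stored under a generated master identifier."""
    groups: Dict[int, List[Record]] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def find(self, rec: Record) -> Optional[int]:
        # Assumed matching condition: equality on the hypothetical 'name' field.
        for master_id, members in self.groups.items():
            if any(m["name"] == rec["name"] for m in members):
                return master_id
        return None

    def consolidate(self, rec: Record, master_id: Optional[int] = None) -> int:
        """Store rec under an existing identifier, or create a new one."""
        if master_id is None:
            master_id = next(self._ids)
        self.groups.setdefault(master_id, []).append(rec)
        return master_id


def load_master_run(load: List[Record], master: MasterRepository,
                    workers: int = 4) -> Tuple[Dict[str, int], List[Record]]:
    """Look up every load record in parallel, then apply updates sequentially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = list(pool.map(master.find, load))  # read-only phase; order is preserved
    assigned: Dict[str, int] = {}
    unmatched: List[Record] = []
    for rec, master_id in zip(load, hits):
        if master_id is None:
            unmatched.append(rec)  # would go on to the load-image run
        else:
            assigned[rec["key"]] = master.consolidate(rec, master_id)
    return assigned, unmatched
```

Separating the concurrent read-only lookups from the sequential writes keeps the repository free of locking in this sketch; a production engine would need its own concurrency control.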
Specification