Data Enrichment Using Business Compendium
First Claim
1. A computer-implemented method comprising:
causing an enrichment engine to receive an input data set;
causing the enrichment engine to perform standardization and cleansing of the input data set;
causing the enrichment engine to identify duplicate entries in the input data set;
causing a matching component of the enrichment engine to perform matching of non-duplicate entries of the input data set to a business compendium having data compiled from a third party source;
causing the enrichment engine to create an enriched data set including additional information based upon the matching; and
providing the enriched data set to a user for manual review.
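The steps of claim 1 can be sketched as a small pipeline. This is an illustrative sketch only, not the claimed implementation: the record fields (`name`, `city`), the in-memory `COMPENDIUM` dictionary, and all function names are hypothetical, and a plain exact-key lookup stands in for whatever matching the enrichment engine actually performs.

```python
import re

# Hypothetical stand-in for a business compendium compiled from a third-party source.
COMPENDIUM = {
    ("ACME CORP", "BERLIN"): {"risk": "low", "bankrupt": False},
}

def standardize(record):
    """Uppercase each field, drop characters treated as invalid, collapse whitespace.

    Assumption: anything other than ASCII letters, digits, and spaces is invalid.
    """
    clean = {}
    for field, value in record.items():
        value = re.sub(r"[^A-Za-z0-9 ]", "", value).upper()
        clean[field] = " ".join(value.split())
    return clean

def enrich(input_data):
    """Standardize, mark duplicates, match non-duplicates, and attach extra info."""
    seen = set()
    enriched = []
    for record in map(standardize, input_data):
        key = (record["name"], record["city"])
        out = dict(record, duplicate=key in seen, matched=False)
        if not out["duplicate"]:
            seen.add(key)
            extra = COMPENDIUM.get(key)
            if extra is not None:
                out.update(extra, matched=True)
        enriched.append(out)
    return enriched  # would then be handed to a user for manual review

rows = enrich([
    {"name": "Acme Corp.", "city": "Berlin"},
    {"name": "ACME  corp", "city": "berlin"},
    {"name": "Globex", "city": "Paris"},
])
```

In this sketch duplicates are marked rather than dropped, so a reviewer can still see and change them, which mirrors the manual-review step at the end of the claim.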
Abstract
Embodiments relate to enrichment of a data warehouse utilizing a business compendium. Embodiments may employ a process comprising data standardization and cleansing, de-duplication of entries, and matching and enrichment, followed by manual review of an enriched record by a user. During standardization, data may be transformed into consistent content by placing correct data elements into appropriate fields, removing invalid characters, and/or standardizing names and addresses. Duplicate records are then detected and marked. During matching and enrichment, the existence of an entity (such as a supplier) may be verified by progressive matching against the business compendium. Enrichment may provide additional information regarding the entity (e.g., related to risk, diversity, and bankruptcy). The enriched record is available for manual review, allowing the user to change duplicates, matches, and parent/child linkages. Feedback from the user review may enhance the accuracy of subsequent enrichment through self-learning aspects, reducing the need for manual review over time.
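The abstract does not spell out what "progressive matching" involves. One possible reading, assumed here purely for illustration, is a series of match passes that run from strict to loose and stop at the first unambiguous hit. The compendium rows, field names, and pass definitions below are all hypothetical.

```python
# Hypothetical compendium rows; the field names are assumptions for illustration.
COMPENDIUM = [
    {"name": "ACME CORP", "street": "1 MAIN ST", "postal": "10115", "diversity": "none"},
    {"name": "GLOBEX", "street": "9 RUE DE LYON", "postal": "75012", "diversity": "certified"},
]

# Match passes ordered from strictest to loosest (an assumed reading of "progressive").
PASSES = [
    ("name+street+postal", ("name", "street", "postal")),
    ("name+postal", ("name", "postal")),
    ("name", ("name",)),
]

def progressive_match(record):
    """Return (pass_label, compendium_row) for the first pass with exactly one hit."""
    for label, fields in PASSES:
        hits = [row for row in COMPENDIUM
                if all(row[f] == record.get(f) for f in fields)]
        if len(hits) == 1:
            return label, hits[0]
    return None, None

# The street differs, so the strictest pass fails and the second pass is tried.
label, row = progressive_match(
    {"name": "GLOBEX", "street": "UNKNOWN", "postal": "75012"}
)
unmatched = progressive_match({"name": "INITECH"})
```

Returning the label of the pass that succeeded gives a reviewer a rough indication of how strong a match is, which could help decide which enriched records most need manual attention.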
20 Claims
1. A computer-implemented method comprising:
causing an enrichment engine to receive an input data set;
causing the enrichment engine to perform standardization and cleansing of the input data set;
causing the enrichment engine to identify duplicate entries in the input data set;
causing a matching component of the enrichment engine to perform matching of non-duplicate entries of the input data set to a business compendium having data compiled from a third party source;
causing the enrichment engine to create an enriched data set including additional information based upon the matching; and
providing the enriched data set to a user for manual review.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising:
causing an enrichment engine to receive an input data set;
causing the enrichment engine to perform standardization and cleansing of the input data set;
causing the enrichment engine to identify duplicate entries in the input data set;
causing a matching component of the enrichment engine to perform matching of non-duplicate entries of the input data set to a business compendium having data compiled from a third party source;
causing the enrichment engine to create an enriched data set including additional information based upon the matching; and
providing the enriched data set to a user for manual review.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer system comprising:
one or more processors;
a software program, executable on said computer system, the software program configured to:
cause an enrichment engine to receive an input data set;
cause the enrichment engine to perform standardization and cleansing of the input data set;
cause the enrichment engine to identify duplicate entries in the input data set;
cause a matching component of the enrichment engine to perform matching of non-duplicate entries of the input data set to a business compendium having data compiled from a third party source;
cause the enrichment engine to create an enriched data set including additional information based upon the matching; and
providing the enriched data set to a user for manual review.
Dependent claims: 16, 17, 18, 19, 20
Specification