System and Method for Matching Data Using Probabilistic Modeling Techniques
Abstract
A system and method for matching data using probabilistic modeling techniques are provided. The system includes a computer system and a data matching model/engine. The present invention precisely and automatically matches and identifies entities from approximately matching short string text (e.g., company names, product names, addresses, etc.) by pre-processing datasets using a near-exact matching model and a fingerprint matching model, and then applying a fuzzy text matching model. More specifically, the fuzzy text matching model applies an Inverse Document Frequency function to a simple data entry model and combines this with one or more unintentional error metrics/measures and/or intentional spelling variation metrics/measures through a probabilistic model. The system can be autonomous and robust, and allow for variations and errors in text, while appropriately penalizing the similarity score, thus allowing dataset linking through text columns.
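As a rough illustration of the fuzzy stage described in the abstract, the sketch below weights each token by a smoothed inverse document frequency and combines that weight with a character-level similarity that stands in for an unintentional-error metric. It is a minimal sketch under stated assumptions, not the patented model: the function names (`tokenize`, `idf_weights`, `token_similarity`, `fuzzy_score`), the smoothing formula, the use of `difflib.SequenceMatcher`, and the weighted-average combination are all illustrative choices that do not come from the patent text.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(name: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return name.lower().split()


def idf_weights(corpus: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency of each token over the corpus.

    Assumption: log((1 + N) / (1 + df)) + 1, so common tokens such as
    "inc" get a low weight and rare tokens get a high one.
    """
    n = len(corpus)
    doc_freq = Counter(tok for name in corpus for tok in set(tokenize(name)))
    return {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a typo metric."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_score(query: str, candidate: str, idf: dict[str, float]) -> float:
    """IDF-weighted average of each query token's best match in the candidate."""
    q_tokens, c_tokens = tokenize(query), tokenize(candidate)
    if not q_tokens or not c_tokens:
        return 0.0
    # Tokens never seen in the corpus are treated as rare (highest weight).
    default = max(idf.values(), default=1.0)
    total = 0.0
    weighted = 0.0
    for q in q_tokens:
        w = idf.get(q, default)
        best = max(token_similarity(q, c) for c in c_tokens)
        total += w
        weighted += w * best
    return weighted / total


corpus = ["Acme Widgets Inc", "Acme Holdings Inc", "Globex Corporation Inc"]
idf = idf_weights(corpus)
typo_match = fuzzy_score("Acme Widgts Inc", "Acme Widgets Inc", idf)
other_match = fuzzy_score("Acme Widgts Inc", "Acme Holdings Inc", idf)
```

With this toy corpus, the misspelled query scores higher against "Acme Widgets Inc" than against "Acme Holdings Inc", because the shared tokens "acme" and "inc" carry less weight than the rarer, nearly matching token.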
39 Claims
1. A system for matching data comprising:
a computer system for electronically receiving a dataset;
a near-exact matching model, executed by the computer system, which pre-processes the dataset to generate a plurality of text strings and compares the text strings to identify matching data in the dataset;
a fingerprint matching model, executed by the computer system, which converts each entry of the dataset into a corresponding text fingerprint and compares resultant text fingerprints to identify matching data in the dataset; and
a fuzzy text matching model, executed by the computer system, which applies probabilistic modeling techniques to the dataset to identify matching data in the dataset, wherein the system transmits the matching data to a user.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for matching data comprising the steps of:
electronically receiving a dataset at a computer system;
executing on the computer system a near-exact matching model which pre-processes the dataset to generate a plurality of text strings and compares the text strings to identify matching data in the dataset;
executing on the computer system a fingerprint matching model, executed by the computer system, which converts each entry of the dataset into a corresponding text fingerprint and compares resultant text fingerprints to identify matching data in the dataset;
executing on the computer system a fuzzy text matching model which applies probabilistic modeling techniques to the dataset to identify matching data in the dataset; and
transmitting any matching data identified by the system to a user.
Dependent Claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer-readable medium having computer-readable instructions stored thereon which, when executed by a computer system, cause the computer system to perform the steps of:
electronically receiving a dataset at the computer system;
executing on the computer system a near-exact matching model which pre-processes the dataset to generate a plurality of text strings and compares the text strings to identify matching data in the dataset;
executing on the computer system a fingerprint matching model which converts each entry of the dataset into a corresponding text fingerprint and compares resultant text fingerprints to identify matching data in the dataset;
executing on the computer system a fuzzy text matching model which applies probabilistic modeling techniques to the dataset to identify matching data in the dataset; and
transmitting any matching data identified by the system to a user.
Dependent Claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A method for matching data comprising the steps of:
electronically receiving a dataset at a computer system;
executing on the computer system a fuzzy text matching model which applies probabilistic modeling techniques to the dataset to identify matching data in the dataset; and
transmitting any matching data identified by the system to a user.
Dependent Claims: 32, 33, 34, 35, 36, 37, 38, 39
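The two pre-processing stages recited in claims 1, 11 and 21 can be pictured with a short sketch. The claims do not say how the text strings or fingerprints are built, so everything below is an assumption made for illustration: `normalize` (lowercasing, punctuation removal, whitespace collapsing) stands in for the near-exact stage, `fingerprint` (sorted unique tokens) stands in for the fingerprint stage, and `group_by` is a hypothetical helper that reports entries sharing a key.

```python
import re
from collections import defaultdict


def normalize(entry: str) -> str:
    """Assumed near-exact key: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", entry.lower())
    return " ".join(text.split())


def fingerprint(entry: str) -> str:
    """Assumed fingerprint: sorted unique tokens, so word order is ignored."""
    return " ".join(sorted(set(normalize(entry).split())))


def group_by(entries: list[str], key) -> list[list[str]]:
    """Return groups of two or more entries that share the same key."""
    buckets = defaultdict(list)
    for entry in entries:
        buckets[key(entry)].append(entry)
    return [group for group in buckets.values() if len(group) > 1]


entries = ["Acme, Inc.", "ACME Inc", "Inc Acme", "Globex Corp"]
near_exact_groups = group_by(entries, normalize)
fingerprint_groups = group_by(entries, fingerprint)
```

Under these assumptions, the near-exact key links "Acme, Inc." with "ACME Inc", while the fingerprint key additionally links the reordered "Inc Acme". Entries left unmatched by both stages would then be candidates for the fuzzy text matching model.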
Specification