Methods and systems for matching records and normalizing names
Abstract
Methods and systems are provided for normalizing strings and for matching records. In one implementation, a string is tokenized into components. Sequences of tags are generated by assigning tags to the components. A sequence of states is determined based on the sequences of tags. A normalized string is generated by normalizing the sequence of the states. A key record including key fields is extracted from a first data source. A candidate record including candidate fields is extracted from a second data source. A numerical record including numerical fields is computed by comparing the key fields and the candidate fields using comparison functions. Matching functions determined by an additive logistic regression method are applied to the numerical fields. Whether the key record and the candidate record are a match is determined based on a sum of results of the matching functions.
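The normalization steps summarized above (tokenize, tag from lookup tables, determine states, normalize) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the lookup-table contents, the tag names, the ordering-based scoring rule used to pick a state sequence, and the "Surname, Given" output format are hypothetical and are not taken from the patent, which does not specify them in this section.

```python
import itertools
import re

# Hypothetical lookup tables; the source does not list their contents.
LOOKUP_TABLES = {
    "TITLE": {"dr", "mr", "mrs", "ms"},
    "GIVEN": {"john", "mary", "james"},
    "SURNAME": {"smith", "james", "doe"},
}

# Assumed preferred ordering of states within a name (illustrative only).
ORDER = {"TITLE": 0, "GIVEN": 1, "UNKNOWN": 1, "SURNAME": 2}


def tokenize(s):
    """Split a string into lowercase alphabetic components."""
    return [c.lower() for c in re.findall(r"[A-Za-z]+", s)]


def tag_sequences(components):
    """Return every tag sequence consistent with the lookup tables."""
    per_component = []
    for c in components:
        tags = [t for t, words in LOOKUP_TABLES.items() if c in words]
        per_component.append(tags or ["UNKNOWN"])
    return [list(seq) for seq in itertools.product(*per_component)]


def determine_states(sequences):
    """Pick one state sequence using an assumed ordering score."""
    def score(seq):
        ranks = [ORDER[t] for t in seq]
        # Reward strictly increasing adjacent ranks and a final surname.
        increasing = sum(a < b for a, b in zip(ranks, ranks[1:]))
        return increasing + (seq[-1] == "SURNAME")
    return max(sequences, key=score)


def normalize(components, states):
    """Emit 'Surname, Given' and drop titles (assumed output format)."""
    pairs = list(zip(components, states))
    given = [c.capitalize() for c, s in pairs if s in ("GIVEN", "UNKNOWN")]
    surname = [c.capitalize() for c, s in pairs if s == "SURNAME"]
    return f"{' '.join(surname)}, {' '.join(given)}".strip(", ")


def normalize_string(s):
    """Run the four steps end to end on one input string."""
    components = tokenize(s)
    if not components:
        return ""
    states = determine_states(tag_sequences(components))
    return normalize(components, states)


print(normalize_string("Dr. James Smith"))
```

In this sketch the component "james" appears in two lookup tables, so two tag sequences are generated and the scoring rule resolves the ambiguity. A real implementation would likely replace the hand-written score with a statistically trained model.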
42 Claims
1. A computer-implemented method of normalizing strings, comprising:
tokenizing a string into a sequence of components;
generating one or more sequences of tags by assigning tags to the components based on lookup tables;
determining, using a processor, a sequence of states of the components based on the one or more sequences of tags; and
generating a normalized string by normalizing the sequence of the states.
Dependent claims: 2, 3, 4, 5, 6, 40.

7. A system for normalizing strings, comprising:
a processor;
means for tokenizing a string into a sequence of components;
means for generating one or more sequences of tags by assigning tags to the components based on lookup tables;
means for determining a sequence of states of the components based on the one or more sequences of tags; and
means for generating a normalized string by normalizing the sequence of the states.
Dependent claims: 8, 9, 10, 11, 12, 41.

13. A computer-readable storage medium including instructions which, when executed by a processor, perform a method of normalizing strings, the method comprising:
tokenizing a string into a sequence of components;
generating one or more sequences of tags by assigning tags to the components based on lookup tables;
determining a sequence of states of the components based on the one or more sequences of tags; and
generating a normalized string by normalizing the sequence of the states.
Dependent claims: 14, 15, 16, 17, 18, 42.

19. A computer-implemented method of matching records, comprising:
extracting a key record including key fields from a first data source;
retrieving a candidate record including candidate fields from a second data source, the candidate fields corresponding to the key fields;
computing, using a processor, a numerical record including numerical fields by comparing the key fields and the candidate fields using comparison functions, the numerical fields being result values of the comparison functions;
applying matching functions to the numerical fields, the matching functions being determined by an additive logistic regression method; and
determining whether the key record and the candidate record are a match based on a sum of results of the matching functions.
Dependent claims: 20, 21, 22, 23, 24, 25.

26. A system for matching records, comprising:
a processor;
means for extracting a key record including key fields from a first data source;
means for retrieving a candidate record including candidate fields from a second data source, the candidate fields corresponding to the key fields;
means for computing a numerical record including numerical fields by comparing the key fields and the candidate fields using comparison functions, the numerical fields being result values of the comparison functions;
means for applying matching functions to the numerical fields, the matching functions being determined by an additive logistic regression method; and
means for determining whether the key record and the candidate record are a match based on a sum of results of the matching functions.
Dependent claims: 27, 28, 29, 30, 31, 32.

33. A computer-readable storage medium including instructions which, when executed by a processor, perform a method of matching records, the method comprising:
extracting a key record including key fields from a first data source;
retrieving a candidate record including candidate fields from a second data source, the candidate fields corresponding to the key fields;
computing a numerical record including numerical fields by comparing the key fields and the candidate fields using comparison functions, the numerical fields being result values of the comparison functions;
applying matching functions to the numerical fields, the matching functions being determined by an additive logistic regression method; and
determining whether the key record and the candidate record are a match based on a sum of results of the matching functions.
Dependent claims: 34, 35, 36, 37, 38, 39.
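The record-matching steps recited in claims 19, 26, and 33 can be illustrated with a minimal Python sketch. The field names, the two comparison functions, the threshold of zero, and especially the two matching functions are assumptions made for illustration: the matching functions here are hand-written step functions with made-up values that stand in for functions an additive logistic regression procedure would fit from labeled record pairs. No training is performed in this sketch.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Comparison function: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact(a, b):
    """Comparison function: 1.0 if the fields are identical, else 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical mapping from field name to comparison function.
COMPARISONS = {"name": similarity, "zip": exact}

# Stand-ins for fitted matching functions; the values are invented.
MATCHING_FUNCTIONS = [
    lambda x: 1.5 if x["name"] >= 0.8 else -1.5,
    lambda x: 1.0 if x["zip"] == 1.0 else -0.5,
]


def numerical_record(key, candidate):
    """Compare corresponding fields and collect the result values."""
    return {f: cmp(key[f], candidate[f]) for f, cmp in COMPARISONS.items()}


def is_match(key, candidate, threshold=0.0):
    """Sum the matching-function results and compare to a threshold."""
    x = numerical_record(key, candidate)
    total = sum(f(x) for f in MATCHING_FUNCTIONS)
    return total > threshold


key = {"name": "Smith, James", "zip": "10001"}
same = {"name": "Smith, James", "zip": "10001"}
other = {"name": "Doe, John", "zip": "94105"}
print(is_match(key, same), is_match(key, other))
```

The decision depends only on the sum of the matching-function outputs, so each function can be inspected on its own to see how much a given field comparison pushes the pair toward or away from a match.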
Specification