Linguistic key normalization
Abstract
Systems, methods, and apparatuses including computer program products are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases, normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules, and generating a normalized phrase table including a plurality of key-value pairs, each key-value pair including a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.
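The table construction described in the abstract can be illustrated with a minimal sketch. The specific normalization rules used here (Unicode decomposition, removal of combining marks, lowercasing, whitespace collapsing) are assumptions chosen for illustration; the abstract only says the rules are lexicographic. The function names and the parameter dictionary layout are likewise hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(phrase: str) -> str:
    """Apply illustrative (assumed) lexicographic normalization rules."""
    # Decompose accented characters so that diacritics become combining marks.
    decomposed = unicodedata.normalize("NFKD", phrase)
    # Drop the combining marks, keeping the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and collapse runs of whitespace.
    return " ".join(stripped.lower().split())


def build_normalized_phrase_table(phrases):
    """Group un-normalized phrases (with their parameters) under a normalized key.

    `phrases` is an iterable of (un_normalized_phrase, params) pairs, where
    `params` is a hypothetical dict such as
    {"translation": ..., "probability": ...}.
    """
    table = defaultdict(list)
    for phrase, params in phrases:
        table[normalize(phrase)].append({"phrase": phrase, "params": params})
    return dict(table)


# Made-up example data, for illustration only.
example_phrases = [
    ("Caf\u00e9", {"translation": "coffee shop", "probability": 0.7}),
    ("cafe", {"translation": "coffee shop", "probability": 0.6}),
    ("  CAFE ", {"translation": "cafeteria", "probability": 0.2}),
    ("Tea", {"translation": "tea", "probability": 0.9}),
]
example_table = build_normalized_phrase_table(example_phrases)
```

In this sketch the three spellings of "cafe" share one key, while each entry keeps its original surface form and its own parameters.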
122 Citations
26 Claims
1. A computer-implemented method executed by one or more processors, the method comprising:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules;
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase;
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase, and
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase.
Dependent claims: 2, 3, 4.
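The lookup and weighting steps recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the similarity measure is taken to be `difflib.SequenceMatcher.ratio()`, and the threshold (0.8) and the two weights (1.0 and 0.1) are hypothetical values. The table is simplified to map each normalized key to a list of un-normalized phrase strings, and `normalize` uses assumed rules.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(phrase: str) -> str:
    """Illustrative (assumed) normalization: strip diacritics, lowercase, trim."""
    decomposed = unicodedata.normalize("NFKD", phrase)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def degree_of_match(a: str, b: str) -> float:
    """Assumed similarity measure: SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def weight_entries(training_phrase, table, threshold=0.8,
                   high_weight=1.0, low_weight=0.1):
    """Return (un_normalized_phrase, weight) pairs for the matching key.

    `table` maps a normalized phrase to a list of un-normalized phrases.
    The threshold and weights are hypothetical defaults.
    """
    key = normalize(training_phrase)
    weighted = []
    for candidate in table.get(key, []):
        score = degree_of_match(training_phrase, candidate)
        # First weight for a high degree of match, second weight otherwise.
        weight = high_weight if score >= threshold else low_weight
        weighted.append((candidate, weight))
    return weighted


# Made-up table for illustration.
example_table = {"cafe": ["Caf\u00e9", "cafe", "CAFE"]}
example_weights = weight_entries("Caf\u00e9", example_table)
```

Here the un-normalized phrase identical to the training phrase receives the first (higher) weight, while the other spellings under the same key receive the second weight.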
5. A computer-implemented method executed by one or more processors, the method comprising:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase;
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase, and
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Dependent claims: 6, 7, 8, 9, 10, 11, 12.
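The final step of claim 5, training with the weighted un-normalized phrases, can be illustrated with a deliberately simple stand-in for a machine learning model: a weighted relative-frequency estimate of translation probabilities per normalized key. The claim does not specify the model type, so the estimator, the function name, and the input tuple layout below are all assumptions.

```python
from collections import defaultdict


def train_weighted_translation_model(weighted_examples):
    """Estimate p(translation | normalized key) from weighted examples.

    `weighted_examples` is an iterable of (normalized_key, translation, weight)
    tuples; this layout is hypothetical. Each example contributes its weight,
    rather than a count of 1, to the estimate.
    """
    totals = defaultdict(float)
    counts = defaultdict(lambda: defaultdict(float))
    for key, translation, weight in weighted_examples:
        counts[key][translation] += weight
        totals[key] += weight
    # Normalize the weighted counts into per-key probability distributions.
    return {
        key: {t: c / totals[key] for t, c in translations.items()}
        for key, translations in counts.items()
        if totals[key] > 0
    }


# Made-up weighted examples: one close match (weight 1.0), two weak ones (0.1).
example_model = train_weighted_translation_model([
    ("cafe", "coffee shop", 1.0),
    ("cafe", "cafeteria", 0.1),
    ("cafe", "coffee shop", 0.1),
])
```

In this sketch, translations attached to closely matching un-normalized phrases dominate the estimate, while weakly matching variants still contribute a small amount.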
13. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause a data processing apparatus to perform operations comprising:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules;
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase;
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase, and
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase.
Dependent claims: 14, 15, 16.
17. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause a data processing apparatus to perform operations comprising:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase;
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase,
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Dependent claims: 18, 19, 20, 21, 22, 23, 24.
25. A system comprising:
one or more computers configured to perform operations including:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules;
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase;
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase, and
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase.
26. A system comprising:
one or more computers configured to perform operations including:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
locating the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase;
determining a degree of match between the received training phrase and a specific un-normalized phrase associated with the located normalized training phrase, the degree of match being determined according to a similarity measure, wherein associating one or more weights comprises:
associating a first weight to the specific un-normalized phrase when the training phrase has a high degree of match with the specific un-normalized phrase,
associating a second weight to the specific un-normalized phrase when the training phrase has a low degree of match with the specific un-normalized phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Specification