LINGUISTIC KEY NORMALIZATION
First Claim
1. A method comprising:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules; and
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase.
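
The claim requires a probability for each translation given the normalized phrase, but it does not say how that probability is estimated. One plausible choice, assumed here and not taken from the claim, is a relative-frequency estimate pooled over every un-normalized phrase that maps to the same key. Let n be a normalized phrase, U(n) the set of un-normalized phrases that normalize to n, and c(u, t) the (assumed) number of times phrase u is aligned with translation t in a training corpus:

```latex
p(t \mid n) = \frac{\sum_{u \in U(n)} c(u, t)}{\sum_{u \in U(n)} \sum_{t'} c(u, t')}
```

Pooling over U(n) lets rare surface variants, such as differently capitalized or punctuated forms, share counts with their more frequent forms.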
2 Assignments
0 Petitions
Abstract
Systems, methods, and apparatuses including computer program products are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases, normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules, and generating a normalized phrase table including a plurality of key-value pairs, each key-value pair including a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.
131 Citations
32 Claims

1. A method comprising:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules; and
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase.
Dependent claims: 2, 5, 6.
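
A minimal Python sketch of claim 1, under stated assumptions: the input is a list of (phrase, translation) pairs; the lexicographic normalizing rules are taken to be Unicode NFKC folding, lowercasing, punctuation removal, and whitespace collapsing; and the probability is a pooled relative frequency of the translation given the normalized phrase. The function names normalize and build_normalized_phrase_table are hypothetical. Each key is a normalized phrase, and its value maps every un-normalized phrase to a list of parameter records (translation, probability).

```python
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(phrase: str) -> str:
    """Apply illustrative lexicographic rules (assumed, not specified by the claim)."""
    text = unicodedata.normalize("NFKC", phrase).lower()  # fold compatibility forms and case
    text = re.sub(r"[^\w\s]", "", text)                   # drop punctuation
    return " ".join(text.split())                         # collapse runs of whitespace


def build_normalized_phrase_table(phrase_pairs):
    """Build {normalized: {un_normalized: [params, ...]}} from (phrase, translation) pairs."""
    pair_counts = Counter()   # (normalized phrase, translation) -> occurrences
    key_counts = Counter()    # normalized phrase -> occurrences
    seen = defaultdict(set)   # normalized phrase -> {(un-normalized phrase, translation)}

    for phrase, translation in phrase_pairs:
        key = normalize(phrase)
        pair_counts[(key, translation)] += 1
        key_counts[key] += 1
        seen[key].add((phrase, translation))

    table = {}
    for key, entries in seen.items():
        value = defaultdict(list)
        for phrase, translation in sorted(entries):
            # Relative-frequency estimate of p(translation | normalized phrase),
            # pooled over all un-normalized phrases sharing this key.
            prob = pair_counts[(key, translation)] / key_counts[key]
            value[phrase].append({"translation": translation, "probability": prob})
        table[key] = dict(value)
    return table


if __name__ == "__main__":
    pairs = [("The House", "das Haus"), ("the house.", "das Haus"), ("the house", "das Gebäude")]
    for key, value in build_normalized_phrase_table(pairs).items():
        print(key, value)
```

With the example pairs, all three surface forms fall under the single key "the house"; "das Haus" receives probability 2/3 and "das Gebäude" 1/3. A production system would likely derive counts from word-aligned parallel text rather than from a flat list of pairs.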

3-4. (canceled)

7. A method comprising:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
identifying the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15.
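
Claim 7 leaves open both the "relation" used for weighting and the kind of model being trained. The sketch below is a toy illustration under explicit assumptions. The relation is taken to be character-level similarity from Python's difflib.SequenceMatcher.ratio(), which gives 1.0 for an exact surface match. The "machine learning model" is replaced by a simple weighted-score translation distribution per training phrase. The table contents and the names weight_variants and train are hypothetical. The table maps each normalized key to its un-normalized phrases, and each phrase maps to a list of translation/probability records.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(phrase: str) -> str:
    """Illustrative lexicographic rules (assumed): fold case, drop punctuation, squeeze spaces."""
    text = unicodedata.normalize("NFKC", phrase).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


# Hypothetical normalized phrase table: {normalized key: {un-normalized phrase: [parameters]}}.
TABLE = {
    "the house": {
        "The House": [{"translation": "das Haus", "probability": 2 / 3}],
        "the house.": [{"translation": "das Haus", "probability": 2 / 3}],
        "the house": [{"translation": "das Gebäude", "probability": 1 / 3}],
    }
}


def weight_variants(training_phrase, table):
    """Weight every un-normalized variant stored under the training phrase's normalized key."""
    entry = table.get(normalize(training_phrase), {})
    # Assumed relation: character-level similarity in [0, 1]; an exact match scores 1.0.
    return {variant: SequenceMatcher(None, variant, training_phrase).ratio() for variant in entry}


def train(training_phrases, table):
    """Toy stand-in for model training: weighted translation scores per training phrase."""
    model = {}
    for phrase in training_phrases:
        entry = table.get(normalize(phrase), {})
        scores = defaultdict(float)
        for variant, weight in weight_variants(phrase, table).items():
            for params in entry[variant]:
                # A variant's translations count in proportion to its weight and probability.
                scores[params["translation"]] += weight * params["probability"]
        total = sum(scores.values())
        if total > 0:  # phrases whose key is absent from the table contribute nothing
            model[phrase] = {t: s / total for t, s in scores.items()}
    return model


if __name__ == "__main__":
    print(train(["The House", "an unseen phrase"], TABLE))
```

For "The House", the exact-match variant receives weight 1.0, so "das Haus" outweighs "das Gebäude" in the resulting distribution. "an unseen phrase" has no table entry and is skipped. A real system would instead pass the weighted variants to its own training procedure.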

16. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause a data processing apparatus to perform operations comprising:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules; and
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase.
Dependent claims: 17, 20, 21.

18-19. (canceled)

22. A computer program product, encoded on a non-transitory computer-readable medium, operable to cause a data processing apparatus to perform operations comprising:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
identifying the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30.

31. A system comprising:
one or more computers configured to perform operations including:
receiving a collection of phrases;
normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules; and
generating a normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase, the one or more parameters including a translation corresponding to the normalized phrase and a probability for the translation given the normalized phrase.

32. A system comprising:
one or more computers configured to perform operations including:
receiving a training phrase;
normalizing the training phrase according to one or more lexicographic normalization rules;
identifying the normalized training phrase in a normalized phrase table, the normalized phrase table including a plurality of key-value pairs, each key-value pair having a key that includes a normalized phrase and a value that includes one or more un-normalized phrases associated with the normalized phrase of the key and one or more parameters associated with each un-normalized phrase;
associating one or more weights to one or more un-normalized phrases associated with the key-value pair for the identified normalized training phrase in the normalized phrase table based on a relation of each associated un-normalized phrase to the received training phrase; and
training a machine learning model using the one or more un-normalized phrases and the associated one or more weights.
Specification