LANGUAGE MODELING OF COMPLETE LANGUAGE SEQUENCES
First Claim
1. A method performed by data processing apparatus, the method comprising:
accessing training data indicating queries submitted by one or more users;
determining, for each of the queries, a count of a number of times the training data indicates the query was submitted;
selecting a proper subset of the queries based on the counts;
training a first component of a language model based on the counts, the first component including first probability data indicating relative frequencies of the selected queries among the training data;
training a second component of the language model based on the training data, the second component including second probability data for assigning scores to queries that are not included in the selected queries;
determining adjustment data that normalizes the second probability data with respect to the first probability data; and
storing the first component, the second component, and the adjustment data.
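One way to read the normalization step is sketched below. The symbols are assumptions introduced for illustration and do not appear in the claim: c(q) is the count for query q, N is the total number of query submissions in the training data, S is the selected proper subset, P_1 and P_2 are the first and second probability data, and \alpha stands in for the adjustment data. The sketch further assumes that P_2 is itself a distribution that sums to 1 over all queries.

```latex
\begin{align*}
P_1(q) &= \frac{c(q)}{N}, \qquad q \in S \\
\alpha &= \frac{1 - \sum_{s \in S} P_1(s)}{1 - \sum_{s \in S} P_2(s)} \\
P(q) &=
\begin{cases}
P_1(q) & \text{if } q \in S \\
\alpha \, P_2(q) & \text{if } q \notin S
\end{cases} \\
\sum_{q} P(q) &= \sum_{s \in S} P_1(s) + \alpha \Bigl( 1 - \sum_{s \in S} P_2(s) \Bigr) = 1
\end{align*}
```

Under these assumptions, \alpha rescales the probability mass that the second component assigns outside S so that it equals the mass left over by the first component.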
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
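The pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the method disclosed in the specification: the subset is chosen as the `top_k` most frequent queries, the second component is an add-one smoothed unigram word model with hypothetical `</s>` and `<unk>` tokens, and the adjustment data is a single scaling factor `alpha`. The names `train_model`, `backoff_prob`, and `score` are hypothetical.

```python
from collections import Counter

END, UNK = "</s>", "<unk>"  # hypothetical end-of-query and unknown-word tokens


def backoff_prob(model, query):
    """Second component: unigram probability of the whole query."""
    unigram = model["unigram"]
    p = unigram[END]
    for word in query.split():
        p *= unigram.get(word, unigram[UNK])
    return p


def train_model(queries, top_k):
    """queries: list of query strings, one entry per submission."""
    counts = Counter(queries)  # number of submissions per distinct query
    if not 0 < top_k < len(counts):
        raise ValueError("top_k must select a non-empty proper subset")
    total = sum(counts.values())
    selected = [q for q, _ in counts.most_common(top_k)]

    # First component: relative frequencies of the selected queries.
    first = {q: counts[q] / total for q in selected}

    # Second component (assumed form): add-one smoothed unigram model
    # trained on every submission, including the selected queries.
    token_counts = Counter()
    for q, c in counts.items():
        for word in q.split():
            token_counts[word] += c
        token_counts[END] += c
    token_counts.setdefault(UNK, 0)
    denom = sum(token_counts.values()) + len(token_counts)
    unigram = {w: (c + 1) / denom for w, c in token_counts.items()}

    model = {"first": first, "unigram": unigram}

    # Adjustment data (assumed form): scale the second component so the
    # mass it gives to non-selected queries equals the mass left over
    # by the first component.
    leftover = 1.0 - sum(first.values())
    overlap = sum(backoff_prob(model, q) for q in selected)
    model["alpha"] = leftover / (1.0 - overlap)
    return model


def score(model, query):
    """Use the first component for selected queries, else the scaled second."""
    if query in model["first"]:
        return model["first"][query]
    return model["alpha"] * backoff_prob(model, query)
```

For example, `train_model(log, top_k=2)` on a list of logged queries returns a plain dictionary holding the first component, the second component, and `alpha`, which could then be serialized for storage. A selected query is scored by its relative frequency, while any other query, including one with unseen words, receives a small nonzero score from the scaled second component.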
20 Claims
1. A method performed by data processing apparatus, the method comprising:
accessing training data indicating queries submitted by one or more users;
determining, for each of the queries, a count of a number of times the training data indicates the query was submitted;
selecting a proper subset of the queries based on the counts;
training a first component of a language model based on the counts, the first component including first probability data indicating relative frequencies of the selected queries among the training data;
training a second component of the language model based on the training data, the second component including second probability data for assigning scores to queries that are not included in the selected queries;
determining adjustment data that normalizes the second probability data with respect to the first probability data; and
storing the first component, the second component, and the adjustment data.
Dependent claims: 2-18.
19. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
accessing training data indicating queries submitted by one or more users;
determining, for each of the queries, a count of a number of times the training data indicates the query was submitted;
selecting a proper subset of the queries based on the counts;
training a first component of a language model based on the counts, the first component including first probability data indicating relative frequencies of the selected queries among the training data;
training a second component of the language model based on the training data, the second component including second probability data for assigning scores to queries that are not included in the selected queries;
determining adjustment data that normalizes the second probability data with respect to the first probability data; and
storing the first component, the second component, and the adjustment data.
20. A computer storage medium storing a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
accessing training data indicating queries submitted by one or more users;
determining, for each of the queries, a count of a number of times the training data indicates the query was submitted;
selecting a proper subset of the queries based on the counts;
training a first component of a language model based on the counts, the first component including first probability data indicating relative frequencies of the selected queries among the training data;
training a second component of the language model based on the training data, the second component including second probability data for assigning scores to queries that are not included in the selected queries;
determining adjustment data that normalizes the second probability data with respect to the first probability data; and
storing the first component, the second component, and the adjustment data.
Specification