Determining word boundary likelihoods in potentially incomplete text
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.
20 Claims
1. A computer implemented method, comprising:
  accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
  for each query:
    selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
      determining a query sequence key for the selected query sequence;
      determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and
      associating, in a data storage device, the word boundary likelihood with the query sequence key.

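The counting step recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the query sequence key is simply the text of the last 1..n words typed so far, that a key is counted at most once per stored query, and that the likelihood is the ratio of word boundary counts to all counted occurrences. The names `tail_keys` and `build_boundary_likelihoods` are hypothetical. Positions at the very end of a query are ignored here, because claim 1 bases the count on a following space character.

```python
from collections import Counter


def tail_keys(prefix, n):
    """Return the last 1..n words of `prefix`, each joined by single spaces.

    Assumption: the query sequence key is the plain text of the sequence.
    """
    words = prefix.split()
    return [" ".join(words[-k:]) for k in range(1, min(n, len(words)) + 1)]


def build_boundary_likelihoods(queries, n=2):
    """Map each query sequence key to an estimated word boundary likelihood."""
    boundary = Counter()      # key -> number of queries where the next char is a space
    non_boundary = Counter()  # key -> number of queries where the next char is not a space
    for q in queries:
        seen_b, seen_nb = set(), set()
        # `end` stops before len(q), so a next character always exists.
        for end in range(1, len(q)):
            if q[end - 1] == " ":
                continue  # skip sequences that themselves end in a space
            target = seen_b if q[end] == " " else seen_nb
            target.update(tail_keys(q[:end], n))
        # Count each key at most once per query.
        boundary.update(seen_b)
        non_boundary.update(seen_nb)
    keys = set(boundary) | set(non_boundary)
    return {k: boundary[k] / (boundary[k] + non_boundary[k]) for k in keys}


table = build_boundary_likelihoods(["new york", "new yorker", "newark"])
```

With these three example queries, "new" is followed by a space in two queries and by another letter in one, so its estimated likelihood is 2/3, while "ne" is never followed by a space and gets 0.
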
4. A computer implemented method, comprising:
  accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
  for each query:
    selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
      determining a query sequence key for the selected query sequence;
      determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence is an end portion of the query; and
      associating, in a data storage device, the likelihood with the query sequence key.

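Claim 4 differs from claim 1 only in how the word boundary count is obtained: it counts the stored queries in which the query sequence is an end portion of the query. A minimal sketch of that count follows, again assuming that keys are the text of the last 1..n words. The function name `end_of_query_counts` is hypothetical, and how this count is then turned into a likelihood is left open here.

```python
from collections import Counter


def end_of_query_counts(queries, n=2):
    """Count, per key, the stored queries that end with that key.

    Assumption: a key is the last 1..n words of the query joined by spaces.
    """
    counts = Counter()
    for q in queries:
        words = q.split()
        # A set ensures each key is counted once per query.
        keys = {" ".join(words[-k:]) for k in range(1, min(n, len(words)) + 1)}
        counts.update(keys)
    return counts


ends = end_of_query_counts(["new york", "york", "new yorker"])
```

In this example "york" closes two of the three stored queries and "new york" closes one, while "new" never appears at the end of a query.
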
7. A system, comprising:
  a data processing apparatus; and
  a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
    accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
    for each query:
      selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
        determining a query sequence key for the selected query sequence;
        determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and
        associating, in a data storage device, the likelihood with the query sequence key.

10. A system comprising:
  a data processing apparatus; and
  a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
    providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
    receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
    selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
    determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
    providing, to the client device, search results responsive to the query input based on the determined likelihood.

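The serving-time step in claim 10 (determining a likelihood from a word boundary count and a non-word boundary count associated with a matching key) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the table layout `{key: (word_boundary_count, non_word_boundary_count)}`, the backoff from longer to shorter keys, the `default` value for unseen input, and the function name `boundary_likelihood` are all hypothetical and are not taken from the claims.

```python
def boundary_likelihood(query_input, counts, n=2, default=0.5):
    """Estimate the likelihood that `query_input` ends at a word boundary.

    `counts` maps a key to (word_boundary_count, non_word_boundary_count).
    Assumptions: keys are the last 1..n words of the input, the longest
    matching key wins, and `default` is returned when nothing matches.
    """
    words = query_input.split()
    for k in range(min(n, len(words)), 0, -1):  # try the longest key first
        key = " ".join(words[-k:])
        if key in counts:
            wb, nwb = counts[key]
            if wb + nwb > 0:
                return wb / (wb + nwb)
    return default


# Hypothetical counts, for illustration only.
example_counts = {"new york": (90, 10), "york": (50, 50), "yor": (0, 40)}
complete = boundary_likelihood("hotels in new york", example_counts)
partial = boundary_likelihood("cheap yor", example_counts)
```

A search system could use such a value to decide whether to treat the last token as a finished word or as a prefix still being typed. The claims only state that search results are provided based on the determined likelihood, so that policy is an assumption here.
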
16. A method comprising:
  providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
  receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
  selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
  determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
  providing, to the client device, search results responsive to the query input based on the determined likelihood.

20. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more data processing apparatus cause the data processing apparatus to perform operations comprising:
  providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
  receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
  selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
  determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
  providing, to the client device, search results responsive to the query input based on the determined likelihood.

Specification