Determining word boundary likelihoods in potentially incomplete text
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.
225 Citations
26 Claims
1. A system comprising:
a data processing apparatus; and
a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
accessing queries stored in query logs, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
for each query:
selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
determining one or more query sequence keys for the query sequence;
determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and
associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
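The counting operations recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say how query sequence keys are formed or how context is used, so everything here is an assumption made for illustration: a key is taken to be up to n-1 preceding words plus a character prefix of the current word, a key counts as ending at a word boundary when the prefix is the whole word, and the function name `boundary_counts` is hypothetical.

```python
from collections import defaultdict


def boundary_counts(queries, n=2):
    """Return {key: [word_boundary_count, non_word_boundary_count]}.

    Assumption (not from the claims): a query sequence key is the
    preceding context of up to n-1 words followed by a character
    prefix of the current word.
    """
    counts = defaultdict(lambda: [0, 0])
    for query in queries:
        words = query.split()
        for end in range(len(words)):
            # Context: up to n-1 complete words before the current word.
            start = max(0, end - (n - 1))
            context = " ".join(words[start:end])
            last = words[end]
            for cut in range(1, len(last) + 1):
                key = f"{context} {last[:cut]}".strip()
                if cut == len(last):
                    counts[key][0] += 1  # prefix is a complete word
                else:
                    counts[key][1] += 1  # prefix stops inside a word
    return dict(counts)


counts = boundary_counts(["new york", "newark"], n=2)
```

With these two logged queries, the key "new" is seen once as a complete word (in "new york") and once as a prefix of a longer word (in "newark"), so both of its counts are 1.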
19. A system comprising:
a data processing apparatus; and
a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
accessing queries stored in query logs, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
for each query:
selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence:
determining one or more query sequence keys for the query sequence;
determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence;
for each query sequence key:
determining a likelihood that the query sequence for which the query sequence key is determined occurs at a word boundary from the word boundary count and non-word boundary count associated with a query sequence key; and
associating, in a data storage device, the likelihood with the query sequence key.
Dependent claims: 20
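Claim 19 derives a likelihood from the two counts but does not give a formula. One plausible reading, shown here only as an assumption, is the fraction of observations in which the key ended at a word boundary. The function name `boundary_likelihood` and the choice to return `None` for an unseen key are illustrative, not taken from the claims.

```python
def boundary_likelihood(word_boundary_count, non_word_boundary_count):
    """Estimate the likelihood that a key ends at a word boundary.

    Assumed formula (not stated in the claims):
        word_boundary_count / (word_boundary_count + non_word_boundary_count)
    Returns None when the key has never been observed.
    """
    total = word_boundary_count + non_word_boundary_count
    if total == 0:
        return None
    return word_boundary_count / total


# A key seen 3 times as a complete word and once as a partial word.
likelihood = boundary_likelihood(3, 1)  # 0.75
```

A production system would likely smooth this estimate for keys with very few observations; that refinement is omitted here.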
21. A method performed by a data processing apparatus, comprising:
accessing queries stored in query logs, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
for each query:
selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
determining one or more query sequence keys for the query sequence;
determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and
associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.
22. A system comprising:
a data processing apparatus; and
a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
in response to receiving data indicating a determination that the query input received from a client device does not meet a query suggestion threshold:
selecting a query input sequence from the query input, the query input sequence being up to a word n-gram of the most subsequent words of the one or more words in the second input sequence;
determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
providing search results responsive to the client device at the expiration of a providing time delay that is based on the determined likelihood.
Dependent claims: 23, 24, 25
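The serving-side steps of claim 22 can be sketched in Python as follows. The claim says only that the providing time delay is "based on" the likelihood, so the linear mapping below, the default delay bounds, and the function names `select_input_sequence` and `providing_delay_ms` are all assumptions chosen for illustration. The sketch assumes that a higher likelihood of a completed word should shorten the wait before results are provided.

```python
def select_input_sequence(query_input, n=2):
    """Take up to the last n whitespace-separated words of the input."""
    words = query_input.split()
    return " ".join(words[-n:])


def providing_delay_ms(likelihood, min_delay_ms=100.0, max_delay_ms=1000.0):
    """Map a word-boundary likelihood in [0, 1] to a delay in milliseconds.

    Assumed policy (not from the claims): interpolate linearly, so a
    likelihood of 1.0 gives min_delay_ms and 0.0 gives max_delay_ms.
    """
    p = min(1.0, max(0.0, likelihood))  # clamp out-of-range inputs
    return max_delay_ms - p * (max_delay_ms - min_delay_ms)


sequence = select_input_sequence("cheap flights to new yor", n=2)  # "new yor"
delay = providing_delay_ms(0.5)  # 550.0
```

Here "new yor" would be looked up as a query sequence key, its counts converted to a likelihood, and that likelihood passed to `providing_delay_ms`; an input that probably stops mid-word waits longer, giving the user time to finish typing.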
26. A method performed by a data processing apparatus, comprising:
providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
in response to receiving data indicating a determination that the query input received from a client device does not meet a query suggestion threshold:
selecting a query input sequence from the query input, the query input sequence being up to a word n-gram of the most subsequent words of the one or more words in the second input sequence;
determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
providing search results responsive to the client device at the expiration of a providing time delay that is based on the determined likelihood.
Specification