Determining word boundary likelihoods in potentially incomplete text

US 8,930,399 B1
Filed: 01/11/2013
Issued: 01/06/2015
Est. Priority Date: 11/22/2010
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining word boundary likelihoods in potentially incomplete text. In one aspect, a method includes selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each query sequence: determining one or more query sequence keys for the query sequence; determining at least one of a word boundary count and a non-word boundary count for each query sequence key, each word-boundary count and non-word boundary count being dependent on the context of the query sequence; and associating, in a data storage device, the at least one word boundary count and non-word boundary counts with each query sequence key.

20 Claims

1. A computer implemented method, comprising:
- accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
  
  for each query:
  
  selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
  
  determining a query sequence key for the selected query sequence;
  
  determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and
  
  associating, in a data storage device, the word boundary likelihood with the query sequence key.
- Dependent claims (2, 3, 5, 6):
  - 2. The method of claim 1, wherein the word boundary likelihood is further based on a likelihood that query sequences that are the same as the selected query sequence include a letter character as a next character.
  - 3. The method of claim 2, further comprising determining the likelihood that query sequences that are the same as the selected query sequence includes a letter character as a next character by determining a non-word boundary count for the query sequence, the non-word boundary count for the query sequence being based on a number of the queries for which the query sequence includes a letter character as a next character.
  - 5. The method of claim 1, wherein selecting a query sequence from the query comprises:
    - selecting a next character in the first sequence of characters, the next character being either a first character of the first sequence of characters or a character that is next in sequence to a most recently selected next character in the first sequence of characters;
      
      determining whether the query sequence constitutes more than a subsequence of up to n words from the second sequence of words of the query; and
      
      in response to determining that the query sequence constitutes more than the subsequence of up to n words, deselecting a word that is first in the subsequence of words in the query sequence.
  - 6. The method of claim 5, wherein determining one or more query sequence keys for the query sequence comprises:
    - determining a first query sequence key that is a key for the entire query sequence;
      
      determining a second query sequence key that is a key for a last subsequence of characters in the query sequence that constitute only a unigram or only a portion of a unigram in the query.
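
The counting steps recited in claims 1 to 3, 5 and 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default `n = 2`, the whitespace normalization, and the ratio `wb / (wb + nwb)` used as the likelihood are all assumptions, since the claims only say the likelihood is "based on" the counts.

```python
from collections import defaultdict


def query_sequences(query, n=2):
    """Yield (sequence, next_char) pairs; next_char is None at the end of the query."""
    query = " ".join(query.split())  # assumption: collapse runs of whitespace
    window_start = 0  # index where the current window of at most n words begins
    for i, ch in enumerate(query):
        if ch == " ":
            continue  # assumption: only sequences ending in a non-space character are keyed
        sequence = query[window_start:i + 1]
        # Claim 5: once the sequence spans more than n words, deselect its first word.
        while len(sequence.split()) > n:
            window_start = query.index(" ", window_start) + 1
            sequence = query[window_start:i + 1]
        next_char = query[i + 1] if i + 1 < len(query) else None
        yield sequence, next_char


def build_counts(queries, n=2):
    """Map each key to [word_boundary_count, non_word_boundary_count]."""
    counts = defaultdict(lambda: [0, 0])
    for query in queries:
        for sequence, next_char in query_sequences(query, n):
            # Claim 6: one key for the whole sequence, one for its trailing unigram portion.
            keys = {sequence, sequence.rsplit(" ", 1)[-1]}
            for key in keys:
                if next_char == " ":
                    counts[key][0] += 1  # claim 1: a space is the next character
                elif next_char is not None and next_char.isalpha():
                    counts[key][1] += 1  # claim 3: a letter is the next character
    return counts


def word_boundary_likelihood(counts, key):
    """Assumed estimator: fraction of observations that were word boundaries."""
    wb, nwb = counts.get(key, (0, 0))
    total = wb + nwb
    return wb / total if total else None


queries = ["new york", "new york hotels", "newark airport", "news"]
counts = build_counts(queries, n=2)
print(word_boundary_likelihood(counts, "new"))
```

In this toy log, "new" is followed by a space in two queries and by a letter in two others, so the assumed estimator is evenly split between boundary and non-boundary.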

4. A computer implemented method, comprising:
- accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
  
  for each query:
  
  selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
  
  determining a query sequence key for the selected query sequence;
  
  determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence is an end portion of the query; and
  
  associating, in a data storage device, the likelihood with the query sequence key.
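
Claim 4 differs from claim 1 in the basis for the word boundary count: it counts queries for which the sequence is an end portion of the query, rather than queries in which a space follows. A rough sketch of that count, assuming whitespace-delimited words and a hypothetical helper name `end_of_query_counts`:

```python
from collections import Counter


def end_of_query_counts(queries, n=2):
    """Count, per key, how many stored queries end with that key."""
    ends = Counter()
    for query in queries:
        words = query.split()
        # Trailing word n-grams of 1..n words, e.g. "hotels" and "york hotels".
        for k in range(1, min(n, len(words)) + 1):
            ends[" ".join(words[-k:])] += 1
    return ends


ends = end_of_query_counts(
    ["new york", "cheap flights new york", "new york hotels"], n=2
)
print(ends["new york"], ends["hotels"])
```

Only complete trailing words are keyed here, on the assumption that a stored query always ends at a word boundary.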

7. A system, comprising:
- a data processing apparatus; and
  
  a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
  
  accessing stored queries, each query being one or more characters in a first sequence constituting one or more words in a second sequence;
  
  for each query:
  
  selecting query sequences from the query, each query sequence being at least a portion of a word n-gram, the word n-gram being a subsequence of up to n words selected from the second sequence of words of the query, and for each selected query sequence:
  
  determining a query sequence key for the selected query sequence;
  
  determining a word boundary likelihood that represents a likelihood that the selected query sequence terminates at a word boundary, the word boundary likelihood being based on a second likelihood that query sequences that are the same as the selected query sequence are one of an end portion of a completed query or a portion of a query sequence that includes a space character as a next character, wherein the second likelihood is based on a word boundary count for the query sequence, the word boundary count being based on a number of the queries for which the query sequence includes a space character as a next character; and
  
  associating, in a data storage device, the likelihood with the query sequence key.
- Dependent claims (8, 9):
  - 8. The system of claim 7, wherein the word boundary likelihood is further based on a likelihood that query sequences that are the same as the selected query sequence include a letter character as a next character.
  - 9. The system of claim 8, wherein the data processing apparatus instructions cause the data processing apparatus to perform further operations comprising determining the likelihood that query sequences that are the same as the selected query sequence includes a letter character as a next character by determining a non-word boundary count for the query sequence, the non-word boundary count for the query sequence being based on a number of the queries for which the query sequence includes a letter character as a next character.

10. A system comprising:
- a data processing apparatus; and
  
  a computer storage medium encoded with a computer program, the program comprising data processing apparatus instructions that when executed by the data processing apparatus cause the data processing apparatus to perform operations comprising:
  
  providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
  
  receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
  
  selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
  
  determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
  
  providing, to the client device, search results responsive to the query input based on the determined likelihood.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The system of claim 10, wherein providing search results responsive to the query input based on the determined likelihood comprises:
    - identifying a time delay based on the determined likelihood;
      
      determining that the time delay has expired before another query input is received; and
      
      providing the search results to the client device in response to determining that the time delay has expired.
  - 12. The system of claim 11, wherein the time delay is inversely proportional to the determined likelihood.
  - 13. The system of claim 11, wherein identifying a time delay based on the determined likelihood comprises:
    - determining that the query input ends with a word indicative of additional query input; and
      
      lengthening the time delay in response to determining that the query input ends with a word indicative of additional query input.
  - 14. The system of claim 11, wherein identifying a time delay based on the determined likelihood comprises:
    - determining that the query input ends with a word indicative of an end of a query; and
      
      shortening the time delay in response to determining that the query input ends with a word indicative of an end of a query.
  - 15. The system of claim 10, wherein providing search results responsive to the query input based on the determined likelihood comprises:
    - determining that the determined likelihood exceeds a threshold; and
      
      providing the search results to the client device without a time delay in response to determining that the determined likelihood exceeds the threshold.
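
The delay behavior in claims 11 to 15 can be illustrated with a small policy function. Everything concrete here is an assumption made for the sketch: the word lists, the constants `base_ms`, `max_ms` and `threshold`, and the factor of two used to lengthen or shorten the delay do not come from the claims.

```python
# Hypothetical word lists; the claims do not enumerate such words.
CONTINUATION_WORDS = frozenset({"and", "of", "in", "the", "for"})
TERMINAL_WORDS = frozenset({"lyrics", "wiki", "download"})


def result_delay_ms(likelihood, query_input, base_ms=100.0, max_ms=2000.0, threshold=0.9):
    """Return how long to wait for more input before providing search results."""
    # Claim 15: above the threshold, provide results without a time delay.
    if likelihood > threshold:
        return 0.0
    # Claim 12: delay inversely proportional to the likelihood (capped at max_ms).
    delay = max_ms if likelihood <= 0 else min(max_ms, base_ms / likelihood)
    words = query_input.split()
    last_word = words[-1].lower() if words else ""
    if last_word in CONTINUATION_WORDS:
        delay = min(max_ms, delay * 2)  # claim 13: lengthen the delay
    elif last_word in TERMINAL_WORDS:
        delay = delay / 2  # claim 14: shorten the delay
    return delay


print(result_delay_ms(0.5, "king of"))
```

A server or client would then wait for the returned delay and provide results only if no further query input arrives in that time, as recited in claim 11.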

16. A method comprising:
- providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
  
  receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
  
  selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
  
  determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
  
  providing, to the client device, search results responsive to the query input based on the determined likelihood.
- Dependent claims (17, 18, 19):
  - 17. The method of claim 16, wherein providing search results responsive to the query input based on the determined likelihood comprises:
    - identifying a time delay based on the determined likelihood;
      
      determining that the time delay has expired before another query input is received; and
      
      providing the search results to the client device in response to determining that the time delay has expired.
  - 18. The method of claim 16, wherein providing search results responsive to the query input based on the determined likelihood comprises:
    - determining that the determined likelihood exceeds a threshold; and
      
      providing the search results to the client device without a time delay in response to determining that the determined likelihood exceeds the threshold.
  - 19. The method of claim 17, wherein the time delay is inversely proportional to the determined likelihood.

20. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more data processing apparatus cause the data processing apparatus to perform operations comprising:
- providing to a client device a search resource including interface instructions that cause the client device to generate a search interface that includes a query input field;
  
  receiving a query input from a client device, the query input having been input into the query input field and being one or more characters in a first input sequence constituting one or more words in a second input sequence;
  
  selecting a query input sequence from the query input, the query input sequence being up to a word n-gram, the word n-gram being a subsequence of up to n words selected from the most subsequent words of the one or more words in the second input sequence;
  
  determining a likelihood that the query input sequence terminates at a word boundary from a word boundary count and a non-word boundary count associated with a query sequence key matching the query input sequence; and
  
  providing, to the client device, search results responsive to the query input based on the determined likelihood.
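
The selection and lookup steps shared by claims 10, 16 and 20 might look like the sketch below. It assumes the stored counts are a mapping from key to a `(word_boundary_count, non_word_boundary_count)` pair, uses the ratio of the two counts as the likelihood, and backs off to the trailing unigram portion when the full sequence has no matching key. The back-off and the function names are illustrative choices, not requirements of the claims.

```python
def select_input_sequence(query_input, n=2):
    """Keep the last n words typed so far (the most subsequent words)."""
    words = query_input.split()
    return " ".join(words[-n:])


def boundary_likelihood(query_input, counts, n=2):
    """Look up the likelihood that the input currently ends at a word boundary."""
    sequence = select_input_sequence(query_input, n)
    # Assumption: try the full sequence first, then its trailing unigram portion.
    for key in (sequence, sequence.rsplit(" ", 1)[-1]):
        if key in counts:
            wb, nwb = counts[key]
            if wb + nwb:
                return wb / (wb + nwb)
    return None  # no matching key with any observations


# Hypothetical counts: key -> (word_boundary_count, non_word_boundary_count)
counts = {"new york": (8, 2), "york": (30, 10), "yor": (0, 40)}
print(boundary_likelihood("hotels in new york", counts))
```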

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Das, Abhinandan S., Fung, Harry S.
Primary Examiner(s): Hu, Jensen
Application Number: US13/739,591
Time in Patent Office: 725 Days
Field of Search: 707/780, 707/759
US Class Current: 707/780
CPC Class Codes:
- G06F 16/2468: Fuzzy queries
- G06F 16/9032: Query formulation
- G06F 16/90324: using system suggestions
- G06F 16/90335: Query processing
