Suggesting and refining user input based on original user input

US 8,438,142 B2
Filed: 05/04/2005
Issued: 05/07/2013
Est. Priority Date: 05/04/2005
Status: Active Grant

Abstract

Systems and methods to generate modified/refined user inputs based on the original user input, such as a search query, are disclosed. The method may be implemented for Roman-based and/or non-Roman based language such as Chinese. The method may generally include receiving an original user input and identifying core terms therein, determining potential alternative inputs by replacing core term(s) in the original input with another term according to a similarity matrix and/or substituting a word sequence in the original input with another word sequence according to an expansion/contraction table where one word sequence is a substring of the other, computing likelihood of each potential alternative input, and selecting most likely alternative inputs according to a predetermined criteria, e.g., likelihood of the alternative input being at least that of the original input. A cache containing pre-computed original user inputs and corresponding alternative inputs may be provided.
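
The selection criterion named at the end of the abstract (keep an alternative input only if its likelihood is at least that of the original input) can be illustrated with a minimal Python sketch. The function name `select_alternatives` and the `likelihood` callable are hypothetical stand-ins; the abstract does not say how likelihood is computed, so the sketch takes it as a parameter.

```python
from typing import Callable, Iterable, List


def select_alternatives(
    original: str,
    candidates: Iterable[str],
    likelihood: Callable[[str], float],
) -> List[str]:
    """Keep candidates whose likelihood is at least that of the original input.

    `likelihood` is an assumed scoring function (for example, a query
    language model); it is not specified by the abstract.
    """
    baseline = likelihood(original)
    kept = [
        c for c in candidates
        if c != original and likelihood(c) >= baseline
    ]
    # Most likely alternatives first; ties are broken alphabetically
    # so the ordering is deterministic.
    return sorted(kept, key=lambda c: (-likelihood(c), c))
```

Passing a toy lookup table as `likelihood` is enough to exercise the filter; any other predetermined criterion could replace the `>= baseline` comparison.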

68 Citations

28 Claims

1. A computer-implemented method, comprising:
- receiving an original query;
  
  generating a first feature vector for a first term in the original query;
  
  generating a respective feature vector for each of one or more different terms in a collection of terms;
  
  associating a respective similarity value with each of the one or more different terms, wherein the similarity value is based at least in part on a similarity measure between the first feature vector for the first term and a respective feature vector for each of the one or more different terms;
  
  identifying one or more similar terms from the one or more different terms based on the respective similarity values associated with each of the one or more different terms;
  
  generating an alternative query for each of the one or more identified similar terms by substituting the first term in the original query with a respective identified similar term;
  
  computing a score for each alternative query based on the similarity value associated with an identified similar term in the respective alternative query; and
  
  identifying one or more of the alternative queries as a query suggestion for the original query based at least in part on the computed score for each alternative query.
  - 2. The computer-implemented method of claim 1, wherein the original query is a search query.
  - 3. The computer-implemented method of claim 1, wherein the original query is in a non-Roman based language.
  - 4. The computer-implemented method of claim 1, further comprising:
    - storing the original query and the one or more alternative queries in a cache.
  - 5. The computer-implemented method of claim 1, wherein the similarity measures are stored in a similarity matrix, and the similarity matrix is generated by generating feature vectors for respective terms in a collection of terms in at least one of a corpus, a user input log, and user session data, and determining respective similarity measures between pairs of the terms in the collection of terms using corresponding feature vectors.
  - 6. The computer-implemented method of claim 1, wherein the score is calculated by determining at least one of:
    - (a) a relevance between the original query and a first alternative query, (b) a probability that the first alternative query will be selected, or (c) a score of the position of a selected search result for the first alternative query.
  - 7. The computer-implemented method of claim 6, wherein the determining includes determining the relevance between the original query and the first alternative query and determining the relevance includes:
    - aligning terms of the original query with terms of the first alternative query; and
      
      determining correlation values between the aligned terms.
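
The steps of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the feature vector is assumed to be a count of neighboring words in a small tokenized corpus, the similarity measure is assumed to be cosine similarity, and the score of each alternative query is simply the similarity value of the substituted term. The names `feature_vector`, `cosine`, `suggest`, `threshold`, and `top_k` are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def feature_vector(term: str, corpus: Sequence[Sequence[str]], window: int = 1) -> Counter:
    """Count the words that appear within `window` positions of `term`."""
    vec: Counter = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            if word != term:
                continue
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[sentence[j]] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest(
    query: List[str],
    position: int,
    vocabulary: Sequence[str],
    corpus: Sequence[Sequence[str]],
    threshold: float = 0.5,   # assumed cut-off for "similar" terms
    top_k: int = 3,           # assumed number of suggestions to return
) -> List[Tuple[str, float]]:
    """Return up to `top_k` alternative queries with their scores."""
    first_term = query[position]
    first_vec = feature_vector(first_term, corpus)
    scored = []
    for term in vocabulary:
        if term == first_term:
            continue
        similarity = cosine(first_vec, feature_vector(term, corpus))
        if similarity >= threshold:
            # Substitute the first term with the similar term.
            alternative = query[:position] + [term] + query[position + 1:]
            # In this sketch the score is the similarity value itself.
            scored.append((" ".join(alternative), similarity))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]
```

With a toy corpus in which "car" and "auto" share the same neighbors, `suggest(["car", "rental"], 0, ...)` proposes "auto rental" and rejects terms whose contexts do not overlap.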

8. A system, comprising:
- a server device configured to receive an original query and to perform operations including:
  
  generating a first feature vector for a first term in the original query;
  
  generating a respective feature vector for each of one or more different terms in a collection of terms;
  
  associating a respective similarity value with each of the one or more different terms, wherein the similarity value is based at least in part on a similarity measure between the first feature vector for the first term and a respective feature vector for each of the one or more different terms;
  
  identifying one or more similar terms from the one or more different terms based on the respective similarity values associated with each of the one or more different terms;
  
  generating an alternative query for each of the one or more identified similar terms by substituting the first term in the original query with a respective identified similar term;
  
  computing a score for each alternative query based on the similarity value associated with an identified similar term in the respective alternative query; and
  
  identifying one or more of the alternative queries as a query suggestion for the original query based at least in part on the computed score for each alternative query.
  - 9. The system of claim 8, wherein the original query is a search query.
  - 10. The system of claim 8, wherein the original query is in a non-Roman based language.
  - 11. The system of claim 8, further comprising a pre-computed cache of the one or more alternative queries, wherein the server device is further configured to determine whether the original query is in the pre-computed cache and, upon determining that the original query is in the pre-computed cache, to output at least one pre-computed alternative query.
  - 12. The system of claim 8, wherein the similarity measures are stored in a similarity matrix, and the server device is further configured to generate the similarity matrix by generating feature vectors for respective terms in a collection of terms in at least one of a corpus, a user input log, and user session data, and determining respective similarity measures between pairs of the terms in the collection of terms using corresponding feature vectors.
  - 13. The system of claim 8, wherein the server device is further configured to compute the score by determining at least one of:
    - (a) a relevance between the original query and a first alternative query, (b) a probability that the first alternative query will be selected, or (c) a score of the position of a selected search result for the first alternative query.
  - 14. The system of claim 13, wherein the server device is further configured to determine the relevance between the original query and the first alternative query, and wherein determining the relevance includes:
    - aligning terms of the original query with terms of the first alternative query; and
      
      determining correlation values between the aligned terms.
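
The pre-computed cache of claim 11 (and the storing step of claims 4 and 16) might look like the following Python sketch. The class name `SuggestionCache`, the lowercase and whitespace normalization of the key, and the fallback `compute` callable are assumptions made for illustration; the claims only require checking whether the original query is in the cache and outputting a pre-computed alternative when it is.

```python
from typing import Callable, Dict, List


class SuggestionCache:
    """Maps an original query to its pre-computed alternative queries."""

    def __init__(self, precomputed: Dict[str, List[str]]):
        # Normalize keys once so lookups are insensitive to case and spacing
        # (an assumption of this sketch, not a requirement of the claims).
        self._store = {self._key(q): alts for q, alts in precomputed.items()}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        return " ".join(query.lower().split())

    def lookup(self, query: str, compute: Callable[[str], List[str]]) -> List[str]:
        """Return cached alternatives, computing and storing them on a miss."""
        key = self._key(query)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        alternatives = compute(query)
        # Store the original query together with its alternatives.
        self._store[key] = alternatives
        return alternatives
```

On a hit the potentially expensive similarity and scoring steps are skipped entirely; on a miss the result is stored so that the next identical query is served from the cache.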

15. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage device on which are stored instructions executable on a computer processor, the instructions comprising:
- receiving an original query;
  
  generating a first feature vector for a first term in the original query;
  
  generating a respective feature vector for each of one or more different terms in a collection of terms;
  
  associating a respective similarity value with each of the one or more different terms, wherein the similarity value is based at least in part on a similarity measure between the first feature vector for the first term and a respective feature vector for each of the one or more different terms;
  
  identifying one or more similar terms from the one or more different terms based on the respective similarity values associated with each of the one or more different terms;
  
  generating an alternative query for each of the one or more identified similar terms by substituting the first term in the original query with a respective identified similar term;
  
  computing a score for each alternative query based on the similarity value associated with an identified similar term in the respective alternative query; and
  
  identifying one or more of the alternative queries as a query suggestion for the original query based at least in part on the computed score for each alternative query.
  - 16. The computer program product of claim 15, the instructions further comprising:
    - storing the original query and the one or more alternative queries in a cache.
  - 17. The computer program product of claim 15, wherein the similarity measures are stored in a similarity matrix, and the similarity matrix is generated by generating feature vectors for respective terms in a collection of terms in at least one of a corpus, a user input log, and user session data, and determining respective similarity measures between pairs of the terms in the collection of terms using corresponding feature vectors.
  - 18. The computer program product of claim 15, wherein the score is calculated by determining at least one of:
    - (a) a relevance between the original query and a first alternative query, (b) a probability that the first alternative query will be selected, or (c) a score of the position of a selected search result for the first alternative query.
  - 19. The computer program product of claim 18, wherein the instructions include determining the relevance between the original query and the first alternative query, and wherein determining the relevance includes:
    - aligning terms of the original query with terms of the first alternative query; and
      
      determining correlation values between the aligned terms.
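
Claims 7, 14, and 19 determine relevance by aligning the terms of the two queries and determining correlation values between the aligned terms. A minimal Python sketch follows, with several assumptions that the claims do not state: the queries have equal length (as they do after a single-term substitution), alignment is positional, identical terms have correlation 1.0, unknown pairs have correlation 0.0, and the per-pair values are combined by averaging. The `correlation` table and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Correlation = Dict[Tuple[str, str], float]


def align(original: List[str], alternative: List[str]) -> List[Tuple[str, str]]:
    """Pair up terms position by position (equal-length queries only)."""
    if len(original) != len(alternative):
        raise ValueError("this sketch only aligns queries of equal length")
    return list(zip(original, alternative))


def relevance(original: List[str], alternative: List[str], correlation: Correlation) -> float:
    """Average the correlation values of the aligned term pairs."""
    pairs = align(original, alternative)
    if not pairs:
        return 0.0
    values = []
    for a, b in pairs:
        if a == b:
            values.append(1.0)  # identical terms are fully correlated
        else:
            # Look the pair up in either order; unknown pairs count as 0.0.
            values.append(correlation.get((a, b), correlation.get((b, a), 0.0)))
    return sum(values) / len(values)
```

A product of the correlation values, or an alignment that tolerates insertions and deletions, would be equally consistent with the claim language; averaging is used here only because it is easy to verify by hand.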

20. A computer-implemented method, comprising:
- receiving an original query;
  
  identifying a first compound comprising a first sequence of one or more terms in the original query;
  
  identifying a second compound comprising a second different sequence of one or more terms, wherein the second compound is an expansion or a contraction of the first compound;
  
  generating an alternative query by substituting the first compound in the original query with the second compound identified as an expansion or a contraction of the first compound;
  
  computing a score for the alternative query based at least in part on a relevance between the alternative query and a history of one or more previously received queries; and
  
  identifying the alternative query as a query suggestion for the original query based at least in part on the computed score for the alternative query.
  - 21. The computer-implemented method of claim 20, wherein the first compound and the second compound are stored in an expansion/contraction table generated from at least one of a user input log and a user input database, and wherein the expansion/contraction table includes frequency values representing occurrences of sequences of words.
  - 22. The computer-implemented method of claim 21, wherein the expansion/contraction table is generated by determining frequent word sequences, filtering out non-phrasal word sequences, and associating counts with sequences of terms as the frequency values.
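
The substitution and scoring steps of claim 20 can be sketched in Python as below. The sketch assumes the expansion/contraction table is a dictionary from a compound (a tuple of terms) to candidate compounds, checks that one compound contains the other as a contiguous run of terms, and uses the relative frequency of the alternative query in a query history as a simple stand-in for "relevance". All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Compound = Tuple[str, ...]


def contains(longer: Compound, shorter: Compound) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def alternatives(query: List[str], table: Dict[Compound, List[Compound]]) -> List[List[str]]:
    """Substitute each compound found in the query with its expansions or contractions."""
    results = []
    for start in range(len(query)):
        for end in range(start + 1, len(query) + 1):
            first = tuple(query[start:end])
            for second in table.get(first, []):
                # Keep only true expansions or contractions of the first compound.
                if second != first and (contains(second, first) or contains(first, second)):
                    results.append(query[:start] + list(second) + query[end:])
    return results


def score(alternative: List[str], history: Counter) -> float:
    """Relative frequency of the alternative query among previously received queries."""
    total = sum(history.values())
    return history[" ".join(alternative)] / total if total else 0.0
```

An alternative query would then be offered as a suggestion when its score clears whatever threshold the system applies; the threshold itself is left out of the sketch because the claim does not specify one.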

23. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium on which are stored instructions executable on a computer processor, the instructions including:
- receiving an original query;
  
  identifying a first compound comprising a first sequence of one or more terms in the original query;
  
  identifying a second compound comprising a second different sequence of one or more terms, wherein the second compound is an expansion or a contraction of the first compound;
  
  generating an alternative query by substituting the first compound in the original query with the second compound identified as an expansion or a contraction of the first compound;
  
  computing a score for the alternative query based at least in part on a relevance between the alternative query and a history of one or more previously received queries; and
  
  identifying the alternative query as a query suggestion for the original query based at least in part on the computed score for the alternative query.
  - 27. The computer program product of claim 23, wherein the first compound and the second compound are stored in an expansion/contraction table generated from at least one of a user input log and a user input database, and wherein the expansion/contraction table includes frequency values representing occurrences of sequences of words.
  - 28. The computer program product of claim 27, wherein the expansion/contraction table is generated by determining frequent word sequences, filtering out non-phrasal word sequences, and associating counts with sequences of terms as the frequency values.

24. A system, comprising:
- a server configured to receive an original query and to perform operations including:
  
  identifying a first compound comprising a first sequence of one or more terms in the original query;
  
  identifying a second compound comprising a second different sequence of one or more terms, wherein the second compound is an expansion or a contraction of the first compound;
  
  generating an alternative query by substituting the first compound in the original query with the second compound identified as an expansion or a contraction of the first compound;
  
  computing a score for the alternative query based at least in part on a relevance between the alternative query and a history of one or more previously received queries; and
  
  identifying the alternative query as a query suggestion for the original query based at least in part on the computed score for the alternative query.
  - 25. The system of claim 24, wherein the first compound and the second compound are stored in an expansion/contraction table generated from at least one of a user input log and a user input database, and wherein the expansion/contraction table includes frequency values representing occurrences of sequences of words.
  - 26. The system of claim 25, wherein the expansion/contraction table is generated by determining frequent word sequences, filtering out non-phrasal word sequences, and associating counts with sequences of terms as the frequency values.
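
Claims 22, 26, and 28 describe generating the expansion/contraction table by determining frequent word sequences, filtering out non-phrasal sequences, and associating counts with the remaining sequences. The Python sketch below is one possible reading. The `min_count` threshold, the `max_len` limit, and especially the stopword-boundary test in `is_phrasal` are assumptions; the claims do not define what makes a sequence phrasal.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Compound = Tuple[str, ...]

# Hypothetical filter list used only by this sketch.
STOPWORDS = {"the", "of", "in", "a", "for"}


def count_sequences(log: Iterable[str], max_len: int = 3) -> Counter:
    """Count every word sequence of length 1..max_len in the query log."""
    counts: Counter = Counter()
    for query in log:
        words = query.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def is_phrasal(seq: Compound) -> bool:
    """Crude stand-in: reject sequences that begin or end with a stopword."""
    return seq[0] not in STOPWORDS and seq[-1] not in STOPWORDS


def is_part_of(shorter: Compound, longer: Compound) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def build_table(
    log: Iterable[str], min_count: int = 2, max_len: int = 3
) -> Dict[Compound, Dict[Compound, int]]:
    """Map each frequent compound to its expansions and contractions with counts."""
    counts = count_sequences(log, max_len)
    frequent = {s: c for s, c in counts.items() if c >= min_count and is_phrasal(s)}
    table: Dict[Compound, Dict[Compound, int]] = {}
    for a in frequent:
        for b in frequent:
            if len(a) < len(b) and is_part_of(a, b):
                table.setdefault(a, {})[b] = frequent[b]  # b expands a
                table.setdefault(b, {})[a] = frequent[a]  # a contracts b
    return table
```

The pairwise loop is quadratic in the number of frequent sequences, which is acceptable for a small illustration; a production system would more likely index sequences by their sub-sequences.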

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Wu, Jun; Lin, Dekang; Qian, Zhe; Zhou, Jie
Primary Examiner(s): Ruiz, Angelica
Application Number: US11/122,873
Publication Number: US 20060253427A1
Time in Patent Office: 2,925 Days
Field of Search: 707/3, 707/4, 707/5, 707/6, 707/7, 707/8, 707/9, 707/10
US Class Current: 707/705
CPC Class Codes:

G06F 16/242   Query formulation

G06F 16/24578   using ranking

G06F 16/3322   using system suggestions G0...

G06F 16/90324   using system suggestions
