Usage based query response

US 8,428,948 B1
Filed: 07/20/2010
Issued: 04/23/2013
Est. Priority Date: 12/17/2009
Status: Active Grant

Abstract

It is possible to provide meaningful responses to queries using systems which consider usage of words in the queries when analyzing those queries and determining what information is possibly relevant. This approach can be applied in online shopping systems by identification of nouns or noun phrases reflecting products available through the system.

15 Claims

1. A system comprising:
- (a) a database storing a plurality of records;
  
  (b) a computer readable medium storing data comprising a dictionary list comprising a plurality of word groups identified as corresponding to an invented part of speech;
  
  (c) a computer configured via a set of data to perform a set of tasks comprising;
  
  (i) receiving an input string, the input string comprising a plurality of words comprising an input word group corresponding to the invented part of speech;
  
  (ii) calculating a set of part of speech scores for the input string wherein calculating the set of part of speech scores comprises, for each word group from a set of word groups from the dictionary list, calculating a measure of similarity between;
  
  (1) the input word group corresponding to the invented part of speech; and
  
  (2) the word group from the set of word groups from the dictionary list for which the part of speech score is being calculated;
  
  (iii) determining a result set comprising a set of records retrieved from the database, wherein each record from the result set comprises an identifying word group corresponding to the invented part of speech; and
  
  (iv) for each record in a subset of records from the result set, determining a match score based on a relevant part of speech score from the previously calculated set of part of speech scores, wherein the relevant part of speech score corresponds to the identifying word group from the record in the subset of records for which the match score is being determined; and
  
  wherein the cardinality of the subset of records is less than or equal to the cardinality of the set of records from the result set;
  
  wherein;
  
  (A) each record from the plurality of records stored in the database corresponds to a class from a plurality of classes;
  
  (B) the data stored on the computer readable medium further comprises a class probability index, wherein the class probability index comprises, for each class in a subset of the plurality of classes;
  
  (i) general probability data that words appear in records corresponding to the class; and
  
  (ii) specialized probability data that word groups in the dictionary list are used as the invented part of speech in records corresponding to the class;
  
  (C) calculating a set of class scores for the input string, wherein the set of class scores comprises, for each class from the subset of the plurality of classes, a probability that the input string corresponds to that class;
  
  (D) each record from the result set corresponds to a class from the subset of the plurality of classes; and
  
  (E) the match score for each record from the subset of records from the result set is further based on a relevant class score from the previously calculated set of class scores, wherein the relevant class score corresponds to the class corresponding to the record from the subset of records for which the match score is being determined;
  
  wherein the cardinality of the subset of the plurality of classes is less than or equal to the cardinality of the plurality of classes.
  - 2. The system of claim 1, wherein calculating the set of class scores for the input string comprises, for each class in the subset of the plurality of classes, calculating a corresponding class score using an equation, wherein using the equation comprises calculating:
    - $$P_{\text{combined}}(\text{class} \mid \text{input}) = W_{\text{class}} \cdot P_{\text{invented}}(\text{class} \mid \text{input}) + (1 - W_{\text{class}}) \cdot P_{\text{non-invented}}(\text{class} \mid \text{input})$$

      wherein;

      (a) $P_{\text{invented}}(\text{class} \mid \text{input})$ is a specialized probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to the class;

      (b) $P_{\text{non-invented}}(\text{class} \mid \text{input})$ is a general probability that the words from the input string appear in records corresponding to the class;

      (c) $P_{\text{combined}}(\text{class} \mid \text{input})$ is the corresponding class score, representing a probability that the input string corresponds to the class; and

      (d) $W_{\text{class}}$ is a weight reflecting confidence in the values of $P_{\text{invented}}(\text{class} \mid \text{input})$ and $P_{\text{non-invented}}(\text{class} \mid \text{input})$ for the class.
  - 3. The system of claim 2, wherein $P_{\text{combined}}(\text{class} \mid \text{input})$ for a first class from the subset of the plurality of classes is calculated using a different $W_{\text{class}}$ value than is used to calculate $P_{\text{combined}}(\text{class} \mid \text{input})$ for a second class from the subset of the plurality of classes.
  - 4. The system of claim 1, wherein calculating the set of class scores for the input string comprises:
    - (a) if there is at least one class in the plurality of classes for which a probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to that class is greater than zero then, for class in the subset of the plurality of classes, calculating a corresponding class score using an equation wherein using the equation comprises calculating;
      
      $$P_{\text{combined}}(\text{class} \mid \text{input}) = W_{\text{class}} \cdot P_{\text{invented}}(\text{class} \mid \text{input}) + (1 - W_{\text{class}}) \cdot P_{\text{non-invented}}(\text{class} \mid \text{input})$$

      wherein;

      (i) $P_{\text{invented}}(\text{class} \mid \text{input})$ is a specialized probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to the class;

      (ii) $P_{\text{non-invented}}(\text{class} \mid \text{input})$ is a general probability that the words from the input string appear in records corresponding to the class;

      (iii) $P_{\text{combined}}(\text{class} \mid \text{input})$ is the corresponding class score, representing a probability that the input string corresponds to the class; and

      (iv) $W_{\text{class}}$ is a weight reflecting confidence in the values of $P_{\text{invented}}(\text{class} \mid \text{input})$ and $P_{\text{non-invented}}(\text{class} \mid \text{input})$ for the class;

      (b) otherwise, if there is no class in the plurality of classes for which the probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to that class is greater than zero then, for class in the subset of the plurality of classes, calculating the corresponding class score comprises calculating the class score based on $P_{\text{non-invented}}(\text{class} \mid \text{input})$, without considering whether the input word group is used as the invented part of speech in records corresponding to the class.
  - 5. The system of claim 1 wherein determining the match score for each record in the subset of records from the result set comprises combining a set of scoring data comprising:
    - (a) the relevant class score corresponding to the record;
      
      (b) the relevant part of speech score corresponding to the record; and
      
      (c) a baseline score corresponding to the record;
      
      wherein combining the set of scoring data comprises weighting the relevant part of speech score corresponding to the record by multiplying the relevant part of speech score corresponding to the record by a weighting factor.
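
The class score equation of claims 2 to 4 and the match score combination of claim 5 can be illustrated with a minimal Python sketch. It assumes the per-class probabilities are already available as dictionaries keyed by class name. The function names, the dictionary layout, the default `pos_weight` value, and the choice to add the three score components together are assumptions made for illustration only; the claims do not specify them.

```python
from typing import Dict, Mapping


def class_scores(
    p_invented: Mapping[str, float],
    p_non_invented: Mapping[str, float],
    w_class: Mapping[str, float],
) -> Dict[str, float]:
    """Blend specialized and general class probabilities (claims 2 to 4)."""
    # Claim 4(b): when the input word group has zero specialized probability
    # in every class, fall back to the general probability alone.
    # Assumption: p_invented covers every class that needs to be checked.
    use_blend = any(p > 0.0 for p in p_invented.values())
    scores: Dict[str, float] = {}
    for cls, p_general in p_non_invented.items():
        if use_blend:
            w = w_class[cls]  # the weight may differ per class (claim 3)
            scores[cls] = w * p_invented.get(cls, 0.0) + (1.0 - w) * p_general
        else:
            scores[cls] = p_general
    return scores


def match_score(
    class_score: float,
    pos_score: float,
    baseline: float,
    pos_weight: float = 2.0,  # hypothetical weighting factor
) -> float:
    """Combine the scoring data listed in claim 5.

    The claim only requires multiplying the part of speech score by a
    weighting factor; summing the three terms is an assumption of this sketch.
    """
    return baseline + class_score + pos_weight * pos_score
```

With `p_invented = {"shoes": 0.8, "toys": 0.0}`, `p_non_invented = {"shoes": 0.5, "toys": 0.5}` and weights `{"shoes": 0.5, "toys": 0.25}`, the blended branch is taken because at least one specialized probability is positive.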

6. A system comprising:
- (a) a database storing a plurality of records;
  
  (b) a computer readable medium storing data comprising a dictionary list comprising a plurality of word groups identified as corresponding to an invented part of speech;
  
  (c) a computer configured via a set of data to perform a set of tasks comprising;
  
  (i) receiving an input string, the input string comprising a plurality of words comprising an input word group corresponding to the invented part of speech;
  
  (ii) calculating a set of part of speech scores for the input string, wherein calculating the set of part of speech scores comprises, for each word group from a set of word groups from the dictionary list, calculating a measure of similarity between;
  
  (1) the input word group corresponding to the invented part of speech; and
  
  (2) the word group from the set of word groups from the dictionary list for which the part of speech score is being calculated; and
  
  (iii) determining a result set comprising a set of records retrieved from the database, wherein each record from the result set comprises an identifying word group corresponding to the invented part of speech; and
  
  (iv) for each record in a subset of records from the result set, determining a match score based on a relevant part of speech score from the previously calculated set of part of speech scores, wherein the relevant part of speech score corresponds to the identifying word group from the record in the subset of records for which the match score is being determined; and
  
  wherein the cardinality of the subset of records is less than or equal to the cardinality of the set of records from the result set;
  
  wherein calculating the set of part of speech scores for the input string comprises, for each word group from the set of word groups from the dictionary list, calculating a cosine similarity measure between;
  
  (A) a first word group corresponding to the input word group corresponding to the invented part of speech; and
  
  (B) a second word group corresponding to the word group from the set of word groups from the dictionary list;
  
  wherein the cosine similarity measure is weighted by exponentially increasing the weight given to the similarity of words as those words approach the end of the first word group and the second word group.
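
One way to read the end-weighted cosine similarity of claim 6 is sketched below in Python. The claim does not give a formula, so the sketch assumes each word's vector component is `base ** (i - last)`, where `i` is the word's position and `last` is the index of the final word. Under that assumption the final word has weight 1 and each earlier word counts for a factor of `base` less. The function names and the default `base` of 2 are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def end_weighted_vector(words: Sequence[str], base: float = 2.0) -> Dict[str, float]:
    """Map a word group to a sparse vector whose weights grow toward the end."""
    vec: Dict[str, float] = defaultdict(float)
    last = len(words) - 1
    for i, word in enumerate(words):
        # Exponent is 0 for the final word and negative for earlier words.
        vec[word.lower()] += base ** (i - last)
    return dict(vec)


def weighted_cosine(first: Sequence[str], second: Sequence[str], base: float = 2.0) -> float:
    """Cosine similarity between two end-weighted word group vectors."""
    a = end_weighted_vector(first, base)
    b = end_weighted_vector(second, base)
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty word group matches nothing
    return dot / (norm_a * norm_b)
```

Under these assumptions, `["red", "shoes"]` and `["blue", "shoes"]` share their final word and score about 0.8, while `["shoes", "red"]` and `["shoes", "blue"]` share only their first word and score about 0.2.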
  - 7. The system of claim 6, wherein calculating the set of part of speech scores for the input string comprises:
    - (a) defining the first word group by reordering the input word group by moving any grouping words from the input word group to the beginning of the input word group;
      
      (b) defining the second word group by reordering the word group from the set of word groups from the dictionary list by moving any grouping words from the word group from the set of word groups from the dictionary list to the beginning of the word group from the set of word groups from the dictionary list.
  - 8. The system of claim 6, wherein calculating the set of part of speech scores for the input string comprises:
    - (a) defining the first word group by deleting any grouping words from the input word group; and
      
      (b) defining the second word group by deleting any grouping words from the word group from the set of word groups from the dictionary list.
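
The two preprocessing options of claims 7 and 8 (moving grouping words to the front, or deleting them) could look roughly like the following Python sketch. The claims do not list the grouping words, so the `GROUPING_WORDS` set below is a hypothetical placeholder, as are the function names.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical examples only; the claims do not enumerate grouping words.
GROUPING_WORDS: FrozenSet[str] = frozenset({"set", "pair", "pack"})


def move_grouping_words_first(
    words: Sequence[str], grouping: FrozenSet[str] = GROUPING_WORDS
) -> List[str]:
    """Claim 7 style: reorder so grouping words lead the word group."""
    grouped = [w for w in words if w.lower() in grouping]
    rest = [w for w in words if w.lower() not in grouping]
    return grouped + rest


def drop_grouping_words(
    words: Sequence[str], grouping: FrozenSet[str] = GROUPING_WORDS
) -> List[str]:
    """Claim 8 style: delete grouping words from the word group."""
    return [w for w in words if w.lower() not in grouping]
```

Either function would be applied to both the input word group and the dictionary word group before the similarity is computed, so that a grouping word at the end of a phrase does not receive the heaviest weight.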

9. A method comprising:
- (a) receiving an input string, the input string comprising a plurality of words comprising an input word group corresponding to an invented part of speech;
  
  (b) calculating via a computer a set of part of speech scores for the input string, wherein calculating the set of part of speech scores comprises, for each word group from a set of word groups from a dictionary list stored in advance on a computer readable medium, calculating a measure of similarity between;
  
  (i) the input word group corresponding to the invented part of speech; and
  
  (ii) the word group from the set of word groups from the dictionary list for which the part of speech score is being calculated;
  
  wherein the dictionary list comprises a plurality of word groups identified as corresponding to the invented part of speech;
  
  (c) determining a result set comprising a set of records retrieved from a database wherein each record from the result set comprises an identifying word group corresponding to the invented part of speech; and
  
  (d) for each record in a subset of records from the result set, determining via the computer a match score based on a relevant part of speech score from the previously calculated set of part of speech scores, wherein the relevant part of speech score corresponds to the identifying word group from the record in the subset of records for which the match score is being determined; and
  
  wherein the cardinality of the subset of records is less than or equal to the cardinality of the set of records from the result set;
  
  wherein;
  
  (A) each record stored in the database corresponds to a class from a plurality of classes;
  
  (B) in addition to storing the dictionary list, the computer readable medium also has stored therein a class probability index, wherein the class probability index comprises, for each class in a subset of the plurality of classes;
  
  (i) general probability data that words appear in records corresponding to the class; and
  
  (ii) specialized probability data that word groups in the dictionary list are used as the invented part of speech in records corresponding to the class;
  
  (C) the method further comprises calculating a set of class scores for the input string, wherein the set of class scores comprises, for each class from the subset of the plurality of classes, a probability that the input string corresponds to that class;
  
  (D) each record from the result set corresponds to a class from the subset of the plurality of classes; and
  
  (E) the match score for each record from the subset of records from the result set is further based on a relevant class score from the previously calculated set of class scores, wherein the relevant class score corresponds to the class corresponding to the record from the subset of records for which the match score is being determined;
  
  wherein the cardinality of the subset of the plurality of classes is less than or equal to the cardinality of the plurality of classes.
  - 10. The method of claim 9, wherein calculating the set of class scores for the input string comprises, for each class in the subset of the plurality of classes, calculating a corresponding class score using an equation, wherein using the equation comprises calculating:
    - $$P_{\text{combined}}(\text{class} \mid \text{input}) = W_{\text{class}} \cdot P_{\text{invented}}(\text{class} \mid \text{input}) + (1 - W_{\text{class}}) \cdot P_{\text{non-invented}}(\text{class} \mid \text{input})$$

      wherein;

      (a) $P_{\text{invented}}(\text{class} \mid \text{input})$ is a specialized probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to the class;

      (b) $P_{\text{non-invented}}(\text{class} \mid \text{input})$ is a general probability that the words from the input string appear in records corresponding to the class;

      (c) $P_{\text{combined}}(\text{class} \mid \text{input})$ is the corresponding class score, representing a probability that the input string corresponds to the class; and

      (d) $W_{\text{class}}$ is a weight reflecting confidence in the values of $P_{\text{invented}}(\text{class} \mid \text{input})$ and $P_{\text{non-invented}}(\text{class} \mid \text{input})$ for the class.
  - 11. The method of claim 9, wherein calculating the set of class scores for the input string comprises:
    - (a) if there is at least one class in the plurality of classes for which a probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to that class is greater than zero then, for class in the subset of the plurality of classes, calculating a corresponding class score using an equation wherein using the equation comprises calculating;
      
      $$P_{\text{combined}}(\text{class} \mid \text{input}) = W_{\text{class}} \cdot P_{\text{invented}}(\text{class} \mid \text{input}) + (1 - W_{\text{class}}) \cdot P_{\text{non-invented}}(\text{class} \mid \text{input})$$

      wherein;

      (i) $P_{\text{invented}}(\text{class} \mid \text{input})$ is a specialized probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to the class;

      (ii) $P_{\text{non-invented}}(\text{class} \mid \text{input})$ is a general probability that the words from the input string appear in records corresponding to the class;

      (iii) $P_{\text{combined}}(\text{class} \mid \text{input})$ is the corresponding class score, representing a probability that the input string corresponds to the class; and

      (iv) $W_{\text{class}}$ is a weight reflecting confidence in the values of $P_{\text{invented}}(\text{class} \mid \text{input})$ and $P_{\text{non-invented}}(\text{class} \mid \text{input})$ for the class;

      (b) otherwise, if there is no class in the plurality of classes for which the probability that the input word group corresponding to the invented part of speech is used as the invented part of speech in records corresponding to that class is greater than zero then, for class in the subset of the plurality of classes, calculating the corresponding class score comprises calculating the class score based on $P_{\text{non-invented}}(\text{class} \mid \text{input})$, without considering whether the input word group is used as the invented part of speech in records corresponding to the class.
  - 12. The method of claim 9 wherein determining the match score for each record in the subset of records from the result set comprises combining a set of scoring data comprising:
    - (a) the relevant class score corresponding to the record;
      
      (b) the relevant part of speech score corresponding to the record; and
      
      (c) a baseline score corresponding to the record;
      
      wherein combining the set of scoring data comprises weighting the relevant part of speech score corresponding to the record by multiplying the relevant part of speech score corresponding to the record by a weighting factor.

13. A method comprising:
- (a) receiving an input string, the input string comprising a plurality of words comprising an input word group corresponding to an invented part of speech;
  
  (b) calculating via a computer a set of part of speech scores for the input string, wherein calculating the set of part of speech scores comprises, for each word group from a set of word groups from a dictionary list stored in advance on a computer readable medium, calculating a measure of similarity between;
  
  (i) the input word group corresponding to the invented part of speech; and
  
  (ii) the word group from the set of word groups from the dictionary list for which the part of speech score is being calculated;
  
  wherein the dictionary list comprises a plurality of word groups identified as corresponding to the invented part of speech;
  
  (c) determining a result set comprising a set of records retrieved from a database, wherein each record from the result set comprises an identifying word group corresponding to the invented part of speech; and
  
  (d) for each record in a subset of records from the result set, determining via the computer a match score based on a relevant part of speech score from the previously calculated set of part of speech scores, wherein the relevant part of speech score corresponds to the identifying word group from the record in the subset of records for which the match score is being determined; and
  
  wherein the cardinality of the subset of records is less than or equal to the cardinality of the set of records from the result set;
  
  wherein calculating the set of part of speech scores for the input string comprises, for each word group from the set of word groups from the dictionary list, calculating a cosine similarity measure between;
  
  (A) a first word group corresponding to the input word group corresponding to the invented part of speech; and
  
  (B) a second word group corresponding to the word group from the set of word groups from the dictionary list;
  
  wherein the cosine similarity measure is weighted by exponentially increasing the weight given to the similarity of words as those words approach the end of the first word group and the second word group.
  - 14. The method of claim 13, wherein calculating the set of part of speech scores for the input string comprises:
    - (a) defining the first word group by reordering the input word group by moving any grouping words from the input word group to the beginning of the input word group;
      
      (b) defining the second word group by reordering the word group from the set of word groups from the dictionary list by moving any grouping words from the word group from the set of word groups from the dictionary list to the beginning of the word group from the set of word groups from the dictionary list.
  - 15. The method of claim 13, wherein calculating the set of part of speech scores for the input string comprises:
    - (a) defining the first word group by deleting any grouping words from the input word group; and
      
      (b) defining the second word group by deleting any grouping words from the word group from the set of word groups from the dictionary list.

Current Assignee: Connexity, Inc.
Original Assignee: Shopzilla Incorporated
Inventors: Roizen, Igor; Jawor, Wojciech; Dutton, Keith
Primary Examiner(s): Vo, Huyen X.
Application Number: US12/839,764
Time in Patent Office: 1,008 Days
Field of Search: 704/1-10, 704/231, 704/235, 704/251, 704/255, 704/257, 704/270, 704/277, 704/270.1
US Class Current: 704/251
CPC Class Codes:
G06F 16/3344 using natural language anal...
G06F 40/284 Lexical analysis, e.g. toke...
