DETERMINING UTILITY OF A QUESTION

US 20100049498A1
Filed: 08/25/2008
Published: 02/25/2010
Est. Priority Date: 08/25/2008
Status: Active Grant

Abstract

A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.
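
The counting and scoring described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes n = 2 (bigrams), whitespace tokenization, a hypothetical `<s>` start marker, add-alpha smoothing, and a per-word average log-probability as the length normalization. All function names are illustrative.

```python
import math
from collections import Counter

def train_bigram_model(questions):
    """Count contexts and bigrams over a collection of question strings."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for question in questions:
        tokens = ["<s>"] + question.lower().split()  # "<s>" marks the start (assumption)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, len(vocab)

def bigram_prob(prev, word, model, alpha=1.0):
    """p(word | prev) with add-alpha smoothing (an assumed smoothing choice)."""
    context_counts, bigram_counts, vocab_size = model
    return (bigram_counts[(prev, word)] + alpha) / (context_counts[prev] + alpha * vocab_size)

def lm_utility(question, model):
    """Average log-probability per word, so long questions are not penalized for length."""
    tokens = ["<s>"] + question.lower().split()
    log_p = sum(math.log(bigram_prob(p, w, model)) for p, w in zip(tokens, tokens[1:]))
    return log_p / max(len(tokens) - 1, 1)
```

Under these assumptions, a question whose word sequences recur across the collection receives a higher (less negative) score than a question made of sequences seen only once.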

216 Citations

21 Claims

1. A method in a computing device for evaluating utility of a question, the method comprising:
- providing a collection of questions, each question having one or more words;
  
  calculating n-gram probabilities for the words within the questions of the collection; and
  
  for each question in the collection, calculating a language model utility score of that question occurring in the collection based on the n-gram probabilities of words of that question following preceding n−1 words, wherein the language model utility score is a measure of the utility of the question.
- - 2. The method of claim 1 wherein the calculating of the n-gram probabilities includes counting the number of each sequence of n−1 words in the questions of the collection and setting the n-gram probabilities of each word being preceded by each sequence.
  - 3. The method of claim 1 including:
    - identifying questions of the collection that match a queried question; and
      
      ranking the identified questions based on the utility of the questions.
  - 4. The method of claim 3 including calculating the relevance of each identified question to the queried question and wherein the ranking of the identified question is based on both relevance and utility of the identified questions.
  - 5. The method of claim 1 including, for each question in the collection, calculating a lexical centrality utility score of that question, wherein the lexical centrality utility score represents another measure of the utility of the question.
  - 6. The method of claim 5 wherein the lexical centrality utility scores are probabilities and are calculated by:
    - generating a question graph with nodes representing questions and links between adjacent nodes representing questions whose similarity satisfies a similarity threshold;
      
      establishing an initial lexical centrality utility score for the question of each node; and
      
      determining a stationary probability distribution for the lexical centrality utility scores for the nodes, wherein the lexical centrality utility score for a node is based on the lexical centrality utility score of that node and the lexical centrality utility scores of adjacent nodes.
  - 7. The method of claim 6 wherein the determining of a stationary probability distribution includes iteratively calculating the lexical centrality utility score for each node based on the lexical centrality utility scores of a previous iteration.
  - 8. The method of claim 6 wherein the initial lexical centrality utility scores are derived from initial language model utility scores.
  - 9. The method of claim 5 wherein the lexical centrality utility scores and the language model utility scores of the questions are combined to provide overall utility scores of the questions.
  - 10. The method of claim 1 wherein the language model utility score is used to calculate relevance of a question to a queried question during question retrieval.
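
The question-graph steps of claims 5 through 8 can be sketched as a power iteration. This is a hedged illustration only: cosine similarity over bags of words, the self-link on every node, the damping factor, and the function names are all assumptions, since the claims do not fix a similarity measure or an update rule. The optional `prior` argument stands in for initial scores derived from language model utility scores.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def lexical_centrality(questions, threshold=0.5, damping=0.85, prior=None, iters=200, tol=1e-9):
    """Return one score per question; the scores sum to 1."""
    if not questions:
        return []
    bags = [Counter(q.lower().split()) for q in questions]
    n = len(bags)
    # Link two nodes when their similarity meets the threshold; every node also links to itself.
    neighbors = [[j for j in range(n) if j == i or cosine(bags[i], bags[j]) >= threshold]
                 for i in range(n)]
    prior = list(prior) if prior is not None else [1.0] * n
    total = sum(prior)
    prior = [p / total for p in prior]  # initial scores, normalized to a distribution
    scores = prior[:]
    for _ in range(iters):
        # Each node keeps part of its prior and receives an equal share from every linked node.
        new = [(1 - damping) * prior[i]
               + damping * sum(scores[j] / len(neighbors[j]) for j in neighbors[i])
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores
```

With this update rule, a question linked to many similar questions accumulates more probability mass than one linked to few, which is the intuition behind treating centrality as a utility signal.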

11. A computing device for ranking questions that are relevant to queried questions, comprising:
- a collection store providing a collection of questions, each question having one or more words;
  
  a component that calculates n-gram probabilities for words following sequences of n−1 words within the questions of the collection;
  
  a component that calculates, for each question in the collection, a language model utility score of that question occurring in the collection based on the probabilities of the n-grams of that question, the language model utility score being calculated using a smoothing technique to account for data sparseness and a length normalization technique to account for differences in lengths of the question;
  
  a component that receives from a user a queried question;
  
  a component that identifies questions of the collection that are relevant to the queried question, each identified question having a relevance score;
  
  a component that, for each identified question, generates a combined score for the identified question based on the relevance score for that identified question and the language model utility score for that identified question; and
  
  a component that displays to the user an indication of identified questions with a ranking based on the combined scores of the identified questions.
- - 12. The computing device of claim 11 including a component that calculates, for each question in the collection, a lexical centrality utility score of that question, wherein the component that generates a combined score factors in the lexical centrality utility scores of the identified questions.
  - 13. The computing device of claim 12 wherein the lexical centrality utility scores are probabilities and the component that calculates the lexical centrality utility score:
    - generates a question graph with nodes representing questions and links between adjacent nodes representing questions whose similarity satisfies a similarity threshold;
      
      establishes an initial lexical centrality utility score for the question of each node; and
      
      determines a stationary probability distribution for the lexical centrality utility scores for the nodes, wherein the lexical centrality utility score for a node is based on the lexical centrality utility score of that node and the lexical centrality utility scores of adjacent nodes.
  - 14. The computing device of claim 13 wherein the initial lexical centrality utility score of a question is derived from the language model utility score for that question.
  - 15. The computing device of claim 11 wherein the language model utility score is derived from the following:
    - $p(Q) = p(q_{1}, q_{2}, \dots, q_{m}) \approx \prod_{i=1}^{m} p(q_{i} \mid q_{i-n+1}^{i-1})$ where $p(Q)$ represents the probability of question $Q$, $q_{i}$ represents the $i$th word in question $Q$, $q_{i-n+1}^{i-1}$ represents a sequence of $n-1$ words from word $q_{i-n+1}$ to word $q_{i-1}$, and $p(q_{i} \mid q_{i-n+1}^{i-1})$ represents the conditional probability of word $q_{i}$ given the sequence of $n-1$ words $q_{i-n+1}^{i-1}$.
  - 16. The computing device of claim 11 wherein the combined score is derived from the following:
    - $p(Q \mid Q') \propto p(Q) \prod_{w \in Q'} p(w \mid Q)$ where $p(Q \mid Q')$ represents the combined score for question $Q$ and queried question $Q'$, $p(Q)$ represents the language model utility score for question $Q$, and $p(w \mid Q)$ represents the probability of word $w$ of queried question $Q'$ given question $Q$.
  - 17. The computing device of claim 11 wherein the component that generates a combined score uses a log-linear model to combine the language model utility score and the relevance score.
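
The combination of relevance and utility in claims 11, 16, and 17 can be sketched in log space. This is an illustrative reading, not the claimed implementation: it treats the combined score as the utility p(Q) times the product of the query-word probabilities p(w|Q), estimates p(w|Q) by interpolating the question's word frequencies with a collection model, and exposes a hypothetical `weight` parameter for the log-linear mix. The interpolation constant `mix` and all function names are assumptions.

```python
import math
from collections import Counter

def relevance_logprob(query, question, coll_counts, coll_total, mix=0.2):
    """Sum of log p(w | Q) over the query words, smoothed with a collection model."""
    q_counts = Counter(question.lower().split())
    q_len = sum(q_counts.values())
    vocab = len(coll_counts) + 1  # one extra slot so unseen words keep nonzero probability
    log_p = 0.0
    for w in query.lower().split():
        p_coll = (coll_counts[w] + 1) / (coll_total + vocab)
        p_q = q_counts[w] / q_len if q_len else 0.0
        log_p += math.log((1 - mix) * p_q + mix * p_coll)
    return log_p

def combined_score(log_utility, log_relevance, weight=0.5):
    """Log-linear combination; weight=0.5 ranks identically to p(Q) * prod p(w|Q)."""
    return weight * log_utility + (1 - weight) * log_relevance

def rank(query, questions, log_utilities, weight=0.5):
    """Order questions by combined score, best first."""
    coll_counts = Counter(w for q in questions for w in q.lower().split())
    coll_total = sum(coll_counts.values())
    scored = [
        (combined_score(u, relevance_logprob(query, q, coll_counts, coll_total), weight), q)
        for q, u in zip(questions, log_utilities)
    ]
    return [q for _, q in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Setting `weight` to 1.0 ranks by utility alone and setting it to 0.0 ranks by relevance alone, which makes the trade-off between the two signals easy to inspect.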

18. A computer-readable storage medium containing instructions for controlling a computing device to rank questions that are relevant to queried questions, by a method comprising:
- providing a collection of questions, each question having one or more words;
  
  for each question of the collection, calculating a utility score for the question, the utility score indicating a likelihood that the question is submitted;
  
  receiving a queried question;
  
  identifying questions of the collection that are relevant to the queried question;
  
  for each identified question, generating a ranking for the identified question based on the utility scores of the identified questions; and
  
  providing the identified questions with their ranking as a search result for the queried question.
- - 19. The computer-readable storage medium of claim 18 wherein the calculating of a utility score includes calculating n-gram probabilities for words within the questions of the collection and calculating, for each question in the collection, a language model utility score of that question occurring in the collection based on the n-gram probabilities of words of that question.
  - 20. The computer-readable storage medium of claim 18 wherein the calculating of a utility score includes calculating, for each question in the collection, a lexical centrality utility score of that question.
  - 21. The computer-readable storage medium of claim 20 wherein the calculating of the lexical centrality utility scores of questions includes:
    - generating a question graph with nodes representing questions and links between adjacent nodes representing questions whose similarity satisfies a similarity threshold;
      
      establishing an initial lexical centrality utility score for the question of each node; and
      
      determining a stationary probability distribution for the lexical centrality utility scores for the nodes, wherein the lexical centrality utility score for a node is based on the lexical centrality utility score of that node and the lexical centrality utility scores of adjacent nodes.
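
The retrieval flow of claim 18 (identify relevant questions, then rank them by utility) can be sketched as below. The word-overlap test for relevance, the `min_overlap` parameter, and the function name are assumptions made for illustration; the claim does not specify how relevance is determined, and the utility scores are taken as precomputed.

```python
def search(query, utilities, min_overlap=1):
    """Return (rank, question, utility) tuples for questions relevant to the query.

    utilities: dict mapping each question in the collection to its precomputed utility score.
    """
    query_words = set(query.lower().split())
    # Identify relevant questions: here, any question sharing enough words with the query.
    relevant = [q for q in utilities
                if len(query_words & set(q.lower().split())) >= min_overlap]
    # Rank the identified questions by utility, highest first.
    ranked = sorted(relevant, key=lambda q: utilities[q], reverse=True)
    return [(position, q, utilities[q]) for position, q in enumerate(ranked, start=1)]
```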

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Cao, Yunbo; Lin, Chin-Yew
Granted Patent: US 8,112,269 B2
US Class Current: 704/9
CPC Class Codes:
- G06F 16/3329 Natural language query form...
- G06F 40/284 Lexical analysis, e.g. toke...
