COST-BENEFIT APPROACH TO AUTOMATICALLY COMPOSING ANSWERS TO QUESTIONS BY EXTRACTING INFORMATION FROM LARGE UNSTRUCTURED CORPORA

US 20060294037A1
Filed: 08/31/2006
Published: 12/28/2006
Est. Priority Date: 08/06/2003
Status: Active Grant

Abstract

The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora, such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses that guide the resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of answers generated by the system. Such predictions, when coupled with a utility model, can provide the system with the ability to make decisions about the number of queries issued to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced against the cost of the information extraction and analysis needed to compose the answers.
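
The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that a trained statistical model supplies `p_correct[n]`, the predicted probability of a correct answer after issuing `n` queries. The function name, the probability table, and the values of `v` (value of a correct answer) and `c` (cost per query) are invented for illustration and are not taken from the patent.

```python
from typing import Sequence


def best_query_count(p_correct: Sequence[float], v: float, c: float) -> int:
    """Return the number of queries n that maximizes v * p_correct[n] - c * n.

    p_correct[n] is an assumed model prediction of answer accuracy after
    issuing n queries (index 0 means that no queries are issued).
    """
    utilities = [v * p - c * n for n, p in enumerate(p_correct)]
    # Pick the index with the highest expected net utility (first one on ties).
    return max(range(len(utilities)), key=utilities.__getitem__)


# Hypothetical predictions with diminishing returns for each extra query.
predicted = [0.0, 0.40, 0.55, 0.62, 0.65, 0.66]
n_star = best_query_count(predicted, v=10.0, c=0.5)
print(n_star)
```

With these made-up numbers the net utility peaks at three queries; raising `c` pushes the optimum toward fewer queries, which is the balance between accuracy and extraction cost that the abstract describes.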

20 Claims

1. A normalization system, comprising:
- an interface component that processes questions posed by users corresponding to a heterogeneous knowledge base;
- a dialog component that requests users to reformulate questions; and
- a normalization component that applies a utility model that predicts accuracy or quality of results to provide a regularized understanding of the knowledge base.
  - 2. The system of claim 1, the utility model dynamically controls extraction of previously unknown or disassociated information from the knowledge base.
  - 3. The system of claim 1, the utility model controls a number of queries submitted to the knowledge base given decision-theoretic considerations.
  - 4. The system of claim 1, the knowledge base includes at least one of a local database, a file, a directory, an electronic encyclopedia, a dictionary, a remote database, and a remote web site.
  - 5. The system of claim 1, the utility model applies a cost-benefit analysis to dynamically control the number and types of attempts made to acquire information or answers from the knowledge base in response to a question or questions.
  - 6. The system of claim 5, the utility model includes an analysis of the costs of searching for information versus the benefits or value of obtaining more accurate answers to questions.
  - 7. The system of claim 1, the dialog component initiates a dialog with users based upon predetermined probability thresholds or other criteria that include a cost-benefit analysis that considers when it would be best to ask a user to reformulate a question rather than expending effort on processing a query that may be expensive in terms of searching for information from the knowledge base or likely to yield inaccurate results.
  - 8. The system of claim 7, the dialog is initiated from an assessment of a cost of delay and effort associated with a query reformulation and a likelihood that a reformulation would lead to an improved result.
  - 9. The system of claim 1, further comprising a preference component that enables users to assess or select various parameters that influence the utility model.
  - 10. The system of claim 9, the preference component processes at least one of a user setting for a cost, a value, and a language preference.
  - 11. The system of claim 10, the preference component includes a model where a user assesses a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
  - 12. The system of claim 11, further comprising a value of receiving an answer expressed as a function of details of a current context, the value of the answer is linked to at least one of a type of question, an informational goal, and a time of day for a user.
  - 13. The system of claim 11, further comprising determining a cost of submitting queries as a function of at least one of a current load sensed on a search engine or the numbers of queries being submitted by a user's entire organization to a third-party search service.

14. The system of claim 13, further comprising determining the costs non-linearly with increasing numbers of queries.
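
The reformulation decision of claims 7 and 8 can be sketched as a comparison of two expected utilities. This is only an illustration under stated assumptions: the function name and all parameters are hypothetical, the two probabilities are assumed to come from some predictive model, and the cost of delay and user effort is collapsed into a single number `reformulation_cost`.

```python
def should_ask_to_reformulate(
    p_answer_now: float,
    p_answer_after_reformulation: float,
    v: float,
    search_cost: float,
    reformulation_cost: float,
) -> bool:
    """Return True when asking the user to rephrase has higher expected utility.

    v is the assumed value of a correct answer; reformulation_cost stands in
    for the delay and effort imposed on the user by the extra dialog turn.
    """
    # Expected utility of searching with the question as originally posed.
    eu_proceed = v * p_answer_now - search_cost
    # Expected utility of first asking for a rephrased question, then searching.
    eu_reformulate = (
        v * p_answer_after_reformulation - search_cost - reformulation_cost
    )
    return eu_reformulate > eu_proceed


# A poorly phrased question that a rewrite is likely to fix.
print(should_ask_to_reformulate(0.2, 0.7, v=10.0, search_cost=1.0, reformulation_cost=2.0))
# A question that is already likely to be answered well.
print(should_ask_to_reformulate(0.6, 0.65, v=10.0, search_cost=1.0, reformulation_cost=2.0))
```

Under these assumptions the dialog is initiated only when the expected gain in answer quality, scaled by `v`, outweighs the cost of interrupting the user.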

15. A method to normalize a database, comprising:
- automatically forming a set of queries from a question posed by a user, each query is assigned a weight; and
- performing a cost-benefit analysis on the set of queries to generate a query subset.
  - 16. The method of claim 15, further comprising automatically ranking the set of queries in an order of likelihood of providing a suitable answer.
  - 17. The method of claim 15, further comprising automatically training at least one model to generate the query subset.
  - 18. The method of claim 15, further comprising submitting the query subset to at least one search engine.
  - 19. The method of claim 18, further comprising receiving results from the at least one search engine and automatically composing an answer.
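
The method of claims 15 and 16 (weighted queries, ranking, and a cost-benefit selection of a subset) might look roughly like the following. This is a sketch, not the claimed implementation: it assumes each weight can be read as an independent probability that the query surfaces a correct answer, and the function name, the example rewrites, the weights, and the values of `v` and `c` are all hypothetical.

```python
from typing import List, Tuple


def select_query_subset(
    weighted_queries: List[Tuple[str, float]], v: float, c: float
) -> List[str]:
    """Greedily keep queries whose expected marginal benefit exceeds cost c.

    Each weight is treated (as an assumption) as the independent probability
    that the query rewrite surfaces a correct answer.
    """
    # Rank the rewrites from most to least likely to yield a suitable answer.
    ranked = sorted(weighted_queries, key=lambda qw: qw[1], reverse=True)
    subset: List[str] = []
    p_miss = 1.0  # probability that no selected query has succeeded so far
    for query, weight in ranked:
        marginal_gain = v * p_miss * weight
        if marginal_gain <= c:
            break  # later queries have lower weight, so they cannot pay off
        subset.append(query)
        p_miss *= 1.0 - weight
    return subset


# Hypothetical rewrites of one user question, each with an assumed weight.
rewrites = [
    ("killed AND Abraham AND Lincoln", 0.05),
    ('"killed Abraham Lincoln"', 0.5),
    ('"Abraham Lincoln was killed by"', 0.3),
]
print(select_query_subset(rewrites, v=10.0, c=1.0))
```

In this example the two phrase rewrites are kept and the low-weight conjunctive query is dropped, because its expected marginal benefit falls below the per-query cost. The resulting subset is what would then be submitted to a search engine, as in claim 18.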

20. A system to facilitate database normalization, comprising:
- means for formulating a query set from a user question;
- means for assigning a weight to each query; and
- means for forming a query subset from the query set based at least in part on a utility model employed for normalizing the database.


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Dumais, Susan T.; Azari, David R.; Brill, Eric D.; Horvitz, Eric J.
Granted Patent: US 7,516,113 B2
US Class Current: 706/46
CPC Class Codes:
G06F 16/3344   using natural language anal...
G06F 16/3346   using probabilistic model
Y10S 707/99933   Query processing, i.e. sear...
