COST-BENEFIT APPROACH TO AUTOMATICALLY COMPOSING ANSWERS TO QUESTIONS BY EXTRACTING INFORMATION FROM LARGE UNSTRUCTURED CORPORA
Abstract
The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora, such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources, with probabilistic models and cost-benefit analyses guiding the resource-intensive information-extraction procedures employed by a knowledge-based question-answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of the answers the system generates. Such predictions, when coupled with a utility model, allow the system to decide how many queries to issue to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information-extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced against the cost of the information extraction and analysis needed to compose them.
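The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the saturating accuracy curve stands in for the Bayesian or other statistical quality predictor, and every name and parameter (`predicted_accuracy`, `query_cost`, `answer_value`, `ceiling`, `rate`, `exponent`) is a hypothetical chosen for illustration. The sketch picks the number of queries that maximizes expected utility, defined here as the value of a correct answer times the predicted accuracy, minus the cost of the queries.

```python
import math


def predicted_accuracy(num_queries: int, ceiling: float = 0.9, rate: float = 0.35) -> float:
    """Hypothetical stand-in for a learned quality model.

    Accuracy rises with the number of queries but with diminishing returns.
    """
    return ceiling * (1.0 - math.exp(-rate * num_queries))


def query_cost(num_queries: int, unit_cost: float = 1.0, exponent: float = 1.0) -> float:
    """Assumed cost model; exponent > 1 makes cost grow non-linearly with query count."""
    return unit_cost * num_queries ** exponent


def best_query_count(max_queries: int, answer_value: float = 100.0,
                     unit_cost: float = 1.0, exponent: float = 1.0):
    """Return (n, utility) for the query count with the highest expected utility."""
    best_n, best_utility = 0, 0.0  # issuing no queries yields zero utility
    for n in range(1, max_queries + 1):
        utility = answer_value * predicted_accuracy(n) - query_cost(n, unit_cost, exponent)
        if utility > best_utility:
            best_n, best_utility = n, utility
    return best_n, best_utility


if __name__ == "__main__":
    for cost in (0.0, 1.0, 5.0):
        n, u = best_query_count(20, unit_cost=cost)
        print(f"unit cost {cost}: issue {n} queries (expected utility {u:.1f})")
```

Under these assumed curves, raising the per-query cost lowers the chosen number of queries, which is the balance between answer accuracy and extraction cost that the abstract describes.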
20 Claims
1. A normalization system, comprising:
an interface component that processes questions posed by users corresponding to a heterogeneous knowledge base;
a dialog component that requests users to reformulate questions; and
a normalization component that applies a utility model that predicts accuracy or quality of results to provide a regularized understanding of the knowledge base.
Dependent claims: 2-13.
14. The system of claim 135, further comprising determining the costs non-linearly with increasing numbers of queries.
15. A method to normalize a database, comprising:
automatically forming a set of queries from a question posed by a user, each query is assigned a weight; and
performing a cost-benefit analysis on the set of queries to generate a query subset.
Dependent claims: 16-19.
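The two steps recited in claim 15 can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the rewrite rules, the weights, the stopword list, and the greedy selection rule (with parameters `value_per_weight`, `base_cost`, and `cost_growth`) are all hypothetical. Queries are formed from the question with a weight each, then added to the subset in descending weight order for as long as the benefit of a query exceeds the marginal cost of issuing it.

```python
from dataclasses import dataclass
from typing import List

# Small illustrative stopword list (an assumption, not taken from the source).
STOPWORDS = {"who", "what", "when", "where", "is", "was", "the", "a", "an", "of"}


@dataclass(frozen=True)
class Query:
    text: str
    weight: float


def form_queries(question: str) -> List[Query]:
    """Form a weighted query set from a question using hypothetical rewrite rules."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    words = [w for w in words if w]
    content = [w for w in words if w not in STOPWORDS]
    # Most specific rewrite: the whole question as an exact phrase.
    queries = [Query('"' + " ".join(words) + '"', 5.0)]
    if content:
        # Exact phrase of the content words only.
        queries.append(Query('"' + " ".join(content) + '"', 2.0))
        # Least specific rewrite: conjunction of the content words.
        queries.append(Query(" AND ".join(content), 1.0))
    return queries


def select_subset(queries: List[Query], value_per_weight: float = 1.0,
                  base_cost: float = 0.5, cost_growth: float = 1.5) -> List[Query]:
    """Greedy cost-benefit selection; marginal cost grows with each query issued."""
    chosen: List[Query] = []
    for q in sorted(queries, key=lambda q: q.weight, reverse=True):
        marginal_cost = base_cost * cost_growth ** len(chosen)
        if q.weight * value_per_weight <= marginal_cost:
            break  # remaining queries have equal or lower weight, so stop here
        chosen.append(q)
    return chosen


if __name__ == "__main__":
    for q in select_subset(form_queries("Who killed Abraham Lincoln?")):
        print(q.weight, q.text)
```

Because the assumed marginal cost grows geometrically, the selection also illustrates costs that rise non-linearly with the number of queries.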
20. A system to facilitate database normalization, comprising:
means for formulating a query set from a user question;
means for assigning a weight to each query; and
means for forming a query subset from the query set based at least in part on a utility model employed for normalizing the database.
Specification