Cost-benefit approach to automatically composing answers to questions by extracting information from large unstructured corpora
Abstract
The present invention relates to a system and methodology to facilitate extraction of information from large unstructured corpora, such as the World Wide Web and/or other unstructured sources. Information in the form of answers to questions can be automatically composed from such sources via probabilistic models and cost-benefit analyses that guide the resource-intensive information-extraction procedures employed by a knowledge-based question answering system. The analyses can leverage predictions, provided by Bayesian or other statistical models, of the ultimate quality of the answers generated by the system. Such predictions, when coupled with a utility model, give the system the ability to decide how many queries to issue to a search engine (or engines), given the cost of queries and the expected value of query results in refining an ultimate answer. Given a preference model, the information-extraction actions with the highest expected utility can be taken. In this manner, the accuracy of answers to questions can be balanced against the cost of the information extraction and analysis needed to compose them.
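The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the expected utility of issuing n query rewrites is v * p(n) - c * n, where v is the dollar value of a correct answer, c is the cost of each query rewrite, and p(n) is a predicted probability that the answer is correct after n rewrites. The abstract does not state this formula; the function name `best_query_count` and the sample probabilities are hypothetical.

```python
from typing import Sequence, Tuple


def best_query_count(p_correct: Sequence[float], v: float, c: float) -> Tuple[int, float]:
    """Pick the number of query rewrites with the highest expected utility.

    p_correct[n] is an assumed, model-predicted probability that the final
    answer is correct after issuing n query rewrites (hypothetical input).
    v is the dollar value of a correct answer; c is the cost per rewrite.
    Expected utility is assumed to be v * p_correct[n] - c * n.
    """
    if not p_correct:
        raise ValueError("p_correct must contain at least one entry")
    best_n, best_utility = 0, v * p_correct[0]
    for n, p in enumerate(p_correct):
        utility = v * p - c * n
        # Strict comparison: on ties, the smaller (cheaper) n is kept.
        if utility > best_utility:
            best_n, best_utility = n, utility
    return best_n, best_utility


# Illustrative numbers only: accuracy rises with more rewrites, with diminishing returns.
predicted = [0.0, 0.40, 0.55, 0.62, 0.65, 0.66]
n, utility = best_query_count(predicted, v=1.0, c=0.05)
print(n, round(utility, 2))
```

With these made-up numbers, the marginal gain in predicted accuracy falls below the per-query cost after the third rewrite, so issuing more queries would lower the expected utility. Raising c or lowering v moves the chosen count toward zero.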
47 Claims
1. A computer implemented normalization system, comprising the following computer executable components:
- an interface component that receives data corresponding to a heterogeneous knowledge base;
- a normalization component that applies a utility model that predicts accuracy or quality of results to provide a regularized understanding of the knowledge base; and
- a preference component that enables users to assess or select various parameters that influence the utility model, the preference component processes at least one of a user setting for a cost, a value, and a language preference, the preference component includes a model where a user assesses a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
Dependent claims: 2-20.
21. A computer implemented method to normalize a database, comprising the following computer executable acts:
- automatically forming a set of queries from a question posed by a user; and
- performing a cost-benefit analysis on the set of queries to generate a query subset, the cost benefit analysis enables users to assess or select various parameters that influence a utility model, the utility model processes a user setting for a cost, the utility model allows a user to assess a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
Dependent claims: 22-25.
26. A computer implemented system to facilitate database normalization, comprising:
- computer implemented means for formulating a query set from a user question;
- computer implemented means for forming a query subset from the query set based at least in part on a utility model employed for normalizing the database; and
- computer implemented means for enabling users to assess or select various parameters that influence the utility model, the utility model processes a user setting for a cost, the utility model allows a user to assess a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
27. A computer implemented question-answering system, comprising the following computer executable components:
- a rewriting component that receives a user query and automatically formulates a set of queries; and
- a cost-benefit component to reduce the set of queries based upon an analysis of expected gains in accuracy of an answer in view of associated costs for additional queries, the cost-benefit component enables users to assess or select various parameters that influence a utility model, the utility model processes a user setting for a cost, the utility model allows a user to assess a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
Dependent claims: 28-43.
44. A computer implemented normalization system, comprising the following computer implemented components:
- an interface component that receives data corresponding to a heterogeneous knowledge base;
- a normalization component that applies a model that predicts accuracy or quality of results in conjunction with a utility model to provide a regularized understanding of the value of performing different information extraction actions from the knowledge base; and
- a preference component that enables users to assess or select various parameters that influence the utility model, the preference component processes at least one of a user setting for a cost, a value, and a language preference, the preference component includes a model where a user assesses a parameter v, indicating a dollar value of receiving a correct answer to a question, and where a parameter c represents a cost of each query rewrite submitted to a search engine.
Dependent claims: 45-47.
Specification