Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
Abstract
A computerized method for organizing information retrieval based on the content of a set of primary documents. The method generates answer hypotheses based on text found in the primary documents and, typically, a natural-language input string such as a question. The answer hypotheses can include phrases or words not present in the input string. Answer hypotheses are verified and ranked based on their verification evidence. A text corpus can be queried to provide verification evidence not present in the primary documents. In another aspect the method is implemented in the context of a larger two-phase method, of which the first phase comprises the method of the invention and the second phase of the method comprises answer extraction.
52 Claims
1. A method of operating a processor-based computer system including at least one processor to organize information retrieval based on the content of a set of documents, the method comprising the steps of:
- using a processor to accept an input string comprising an ordered set of words;
- using a processor to accept the set of documents;
- using a processor to analyze the content of at least one document of the set;
- using a processor to generate automatically an answer hypothesis likely to be relevant to said input string, said answer hypothesis being generated responsively to said document content thus analyzed, said answer hypothesis comprising a phrase;
- using a processor to verify the answer hypothesis by gathering evidence for a plurality of answer hypotheses; and
- using a processor to determine a best answer hypothesis from among said plurality of answer hypotheses according to the evidence thus gathered.
Dependent claims: 2-27
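The steps of claim 1 (accept an input string and documents, generate phrase hypotheses from the document content, gather evidence, pick the best hypothesis) can be illustrated with a minimal sketch. It is not the patented implementation. The function names `candidate_phrases` and `best_hypothesis`, the use of capitalized word runs as "phrases", and the word-overlap evidence score are all assumptions made for illustration only.

```python
import re
from collections import Counter


def candidate_phrases(text):
    """Return runs of capitalized words (a crude, assumed stand-in for phrase detection)."""
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)


def best_hypothesis(input_string, documents):
    """Generate phrase hypotheses from the documents and score them by simple evidence."""
    query_words = {w.lower() for w in re.findall(r"\w+", input_string)}
    evidence = Counter()
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            words = {w.lower() for w in re.findall(r"\w+", sentence)}
            overlap = len(words & query_words)
            for phrase in candidate_phrases(sentence):
                # Skip phrases made up entirely of words from the input string.
                if all(w.lower() in query_words for w in phrase.split()):
                    continue
                # Assumed evidence measure: query words sharing the sentence.
                evidence[phrase] += overlap
    if not evidence:
        return None, evidence
    return evidence.most_common(1)[0][0], evidence


docs = [
    "Herman Melville wrote the novel Moby Dick in 1851.",
    "The novel Moby Dick was written by Herman Melville.",
    "Nathaniel Hawthorne was a friend of the author.",
]
best, evidence = best_hypothesis("Who wrote the novel Moby Dick?", docs)
print(best, dict(evidence))
```

In this toy run the phrase "Moby Dick" is excluded because it already appears in the input string, so only phrases new to the question compete as hypotheses.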
28. In the operation of a processor-based system comprising a processor, a memory coupled to the processor, a user interface, an answer extraction subsystem, a computerized information retrieval (IR) subsystem coupled to a text corpus, and channels connecting the answer extraction subsystem and the information retrieval subsystem, a method of operating the processor-based computer system to retrieve documents from the text corpus in response to a user-supplied natural language input string comprising words and a set of primary documents, the method comprising the steps of:
- using the user interface to accept the input string into the answer extraction subsystem;
- using the processor to analyze the input string to detect phrases therein;
- using the processor to accept the primary documents into the answer extraction subsystem;
- using the answer extraction subsystem to analyze the primary documents to detect additional phrases therein, the additional phrases not being present in the input string; and
- using the answer extraction subsystem to verify the additional phrases as answer hypotheses.
Dependent claims: 29-36
37. A method of operating a processor-based computer system for computerized information retrieval to respond to an input string, the method comprising the steps of:
- using a subsystem of said system to accept the input string;
- using a subsystem of said system to accept a set of primary documents;
- using a subsystem of said system to detect phrases in the primary documents;
- using a subsystem of said system to generate preliminary hypotheses based on phrases so detected;
- using a subsystem of said system to select preliminary hypotheses as answer hypotheses for verification, each of said answer hypotheses comprising a phrase detected in a document of said primary documents;
- using a subsystem of said system to determine linguistic relations implied by the input string;
- using a subsystem of said system to gather verification evidence for answer hypotheses; and
- using a subsystem of said system to rank answer hypotheses.
Dependent claims: 38-46
47. A method of operating a computerized information retrieval system comprising the steps of:
- using a processor to accept an input string and a set of primary documents;
- using the processor to determine phrases in the input string;
- using the processor to generate hypotheses not present in the input string based on text in the primary documents, each of said hypotheses comprising a phrase;
- using the processor to verify the hypotheses using the primary documents by performing lexico-syntactic analysis automatically; and
- using the processor to score the hypotheses.
Dependent claims: 48-50
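Claim 47 verifies hypotheses by lexico-syntactic analysis of the primary documents but does not spell out the patterns involved. As a hedged illustration only, the sketch below counts matches of three simple templates (copula, apposition, and noun-plus-name) that link a hypothesis to a type phrase. The function `verify_hypothesis`, the templates, and the idea that a type phrase such as "novelist" is taken from the input string are assumptions for this example, not details stated in the claim.

```python
import re


def verify_hypothesis(hypothesis, type_phrase, documents):
    """Count template matches linking `hypothesis` to `type_phrase` (assumed templates)."""
    h = re.escape(hypothesis)
    t = re.escape(type_phrase)
    templates = [
        # Copula: "<hypothesis> was an American <type>"
        rf"\b{h}\s+(?:is|was)\s+(?:a|an|the)\s+(?:\w+\s+)*?{t}\b",
        # Apposition: "<hypothesis>, the <type>"
        rf"\b{h},\s+(?:a|an|the)\s+(?:\w+\s+)*?{t}\b",
        # Noun plus name: "<type> <hypothesis>"
        rf"\b{t}\s+{h}\b",
    ]
    score = 0
    for doc in documents:
        for pattern in templates:
            score += len(re.findall(pattern, doc, flags=re.IGNORECASE))
    return score


docs = [
    "Herman Melville was an American novelist.",
    "Herman Melville, the novelist, lived in New York.",
    "Nathaniel Hawthorne visited the novelist Herman Melville.",
]
hypotheses = ["Herman Melville", "Nathaniel Hawthorne"]
scores = {h: verify_hypothesis(h, "novelist", docs) for h in hypotheses}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Regular expressions are used here only because they keep the example short; a real system would more plausibly match such templates over tagged or parsed text.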
51. A method of operating a computer system for computerized information retrieval to process an input string supplied by a user, the method comprising the steps of:
- in a first phase, using the system to construct and execute a series of primary queries based on shallow linguistic analysis of the input string in order to retrieve primary documents; and
- in a second phase, using the system to perform answer extraction, the answer extraction comprising hypothesis generation and secondary query construction and execution, the hypothesis generation comprising the extraction of phrases from said primary documents, the secondary query construction and execution comprising the construction and execution of queries in order to retrieve secondary documents and to verify the phrases in a context provided by the secondary documents.
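The two phases of claim 51 can be sketched over a toy in-memory corpus. This is an illustrative assumption, not the claimed system: stopword removal stands in for shallow linguistic analysis, a single Boolean AND query stands in for the series of primary queries, and the names `content_words`, `search`, and `two_phase_answer` are hypothetical.

```python
import re

# Assumed stopword list, used only for this illustration.
STOPWORDS = {"who", "what", "the", "a", "an", "of", "was", "is", "in", "by"}


def content_words(text):
    """Lowercased tokens of `text` with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def search(corpus, terms):
    """Toy Boolean AND retrieval: indices of documents containing every term."""
    hits = []
    for i, doc in enumerate(corpus):
        doc_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if all(t in doc_words for t in terms):
            hits.append(i)
    return hits


def two_phase_answer(question, corpus):
    # Phase 1: build and run a primary query to retrieve primary documents.
    keywords = content_words(question)
    primary = search(corpus, keywords)
    # Phase 2a: hypothesis generation, extracting phrases from primary documents.
    hypotheses = set()
    for i in primary:
        for phrase in re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+", corpus[i]):
            if not set(content_words(phrase)) <= set(keywords):
                hypotheses.add(phrase)
    # Phase 2b: secondary queries pair each hypothesis with one question keyword;
    # the number of distinct documents retrieved serves as verification evidence.
    scores = {}
    for hyp in hypotheses:
        secondary = set()
        for kw in keywords:
            secondary.update(search(corpus, content_words(hyp) + [kw]))
        scores[hyp] = len(secondary)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    "Herman Melville wrote Moby Dick in 1851.",
    "Orson Welles wrote and directed a play based on Moby Dick.",
    "The author of Moby Dick was Herman Melville.",
    "Nathaniel Hawthorne wrote The Scarlet Letter.",
]
ranking = two_phase_answer("Who wrote Moby Dick?", corpus)
print(ranking)
```

The third document is not retrieved by the primary query, since it lacks "wrote", but a secondary query finds it, which is how secondary documents can supply evidence absent from the primary documents.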
52. In the operation of a computer system comprising a processor, a memory coupled to the processor, a user interface, a primary query construction subsystem, an answer extraction subsystem, a computerized information retrieval (IR) subsystem coupled to a text corpus, and channels connecting the primary query construction subsystem and the information retrieval subsystem, a method of operating the computer system to retrieve documents from the text corpus in response to a user-supplied natural language input string comprising words, the method comprising the steps of:
- using the user interface to accept the input string into the primary query construction subsystem;
- using the primary query construction subsystem to analyze the input string to detect phrases therein;
- using the primary query construction subsystem to construct a series of queries based on the detected phrases, the queries of the series being constructed automatically by the primary query construction subsystem through a sequence of operations that comprises successive broadening and narrowing operations;
- using the primary query construction subsystem, the information retrieval subsystem, the text corpus, and the channels to execute the queries of the series;
- using the primary query construction subsystem to rank documents retrieved from the text corpus in response to one or more queries thus executed to produce a set of primary documents;
- using a channel to send the primary documents, the input string, and the phrases detected in the input string from the primary query construction subsystem to the answer extraction subsystem;
- using the answer extraction subsystem to generate hypotheses not present in the input string based on text in the primary documents;
- using the answer extraction subsystem to verify the hypotheses using the primary documents by performing lexico-syntactic analysis automatically; and
- using the answer extraction subsystem to score the hypotheses.
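Claim 52 constructs its query series through "successive broadening and narrowing operations" without fixing what those operations are. The sketch below is one possible reading, offered as an assumption: broadening drops the last query term when too few documents match, and narrowing tightens a proximity window when too many match. The names `query_series` and `matches`, the hit thresholds, and the window sizes are all hypothetical.

```python
import re


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(doc, terms, window):
    """True if all terms occur within some run of `window` consecutive tokens."""
    toks = tokens(doc)
    for start in range(len(toks)):
        span = set(toks[start:start + window])
        if all(t in span for t in terms):
            return True
    return False


def query_series(terms, corpus, min_hits=1, max_hits=2, window=20, max_steps=8):
    """Run a series of queries, recording (terms, window, hits) for each one."""
    terms = list(terms)
    history = []
    for _ in range(max_steps):
        hits = [i for i, doc in enumerate(corpus) if matches(doc, terms, window)]
        history.append((tuple(terms), window, hits))
        if len(hits) < min_hits and len(terms) > 1:
            terms.pop()  # broaden: drop the last (assumed least important) term
        elif len(hits) > max_hits and window > len(terms):
            window = max(len(terms), window // 2)  # narrow: tighten proximity
        else:
            break
    return history


corpus = [
    "Herman Melville wrote Moby Dick in 1851.",
    "Moby Dick is a long novel, and many readers never learn who wrote it.",
    "Orson Welles wrote and directed a play based on Moby Dick.",
    "The whale in Moby Dick is white.",
]
history = query_series(["moby", "dick", "wrote", "author"], corpus)
for step in history:
    print(step)
```

In this run the first query matches nothing and is broadened by dropping "author"; the second matches three documents and is narrowed by halving the window, after which the series stops.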
Specification