Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation

US 5,519,608 A
Filed: 06/24/1993
Issued: 05/21/1996
Est. Priority Date: 06/24/1993
Status: Expired due to Term

Abstract

A computerized method for organizing information retrieval based on the content of a set of primary documents. The method generates answer hypotheses based on text found in the primary documents and, typically, a natural-language input string such as a question. The answer hypotheses can include phrases or words not present in the input string. Answer hypotheses are verified and ranked based on their verification evidence. A text corpus can be queried to provide verification evidence not present in the primary documents. In another aspect the method is implemented in the context of a larger two-phase method, of which the first phase comprises the method of the invention and the second phase of the method comprises answer extraction.

52 Claims

1. A method of operating a processor-based computer system including at least one processor to organize information retrieval based on the content of a set of documents, the method comprising the steps of:
- using a processor to accept an input string comprising an ordered set of words;
- using a processor to accept the set of documents;
- using a processor to analyze the content of at least one document of the set;
- using a processor to generate automatically an answer hypothesis likely to be relevant to said input string, said answer hypothesis being generated responsively to said document content thus analyzed, said answer hypothesis comprising a phrase;
- using a processor to verify the answer hypothesis by gathering evidence for a plurality of answer hypotheses; and
- using a processor to determine a best answer hypothesis from among said plurality of answer hypotheses according to the evidence thus gathered.
- - 2. The method of claim 1 in which the documents comprise natural-language text.
  - 3. The method of claim 1 wherein said input string comprises a user question, and wherein said phrase represents an answer to said user question.
  - 4. The method of claim 1 in which at least one word of the answer hypothesis is not present in the input string.
  - 5. The method of claim 1 in which the input string comprises natural-language text.
  - 6. The method of claim 5 additionally comprising the step of using a processor to analyze the input string.
  - 7. The method of claim 6 in which the step of using a processor to analyze the input string comprises using a processor to determine a linguistic relation implied by the input string.
  - 8. The method of claim 7 wherein the step of using a processor to verify the answer hypothesis comprises using a processor to take into account the linguistic relations thus determined.
  - 9. The method of claim 7 in which said phrase is generated by determining a match phrase, said match phrase being a phrase common to at least two documents of said set, said match phrase satisfying said linguistic relation implied by said input string.
  - 10. The method of claim 9 in which said match phrase is a noun phrase.
  - 11. The method of claim 9 in which said linguistic relation is selected from the group consisting of a lexical relation, a syntactic relation, a lexico-syntactic relation, a semantic relation, a grammatical relation, and a parsed relation.
  - 12. The method of claim 7 further comprising the step of:
    - using a processor in conjunction with an output device to output said answer hypothesis in the context of said linguistic relation.
  - 13. The method of claim 1 in which the step of using a processor to generate automatically an answer hypothesis comprises using a processor to generate the plurality of answer hypotheses from among which the best answer hypothesis is to be determined.
  - 14. The method of claim 1 in which a text corpus is used to provide verification evidence.
  - 15. The method of claim 14 in which the text corpus comprises the documents of the set of documents.
  - 16. The method of claim 14 in which the text corpus comprises documents not in the set of documents.
  - 17. The method of claim 14 in which the text corpus comprises the documents of the set and additional documents not in the set.
  - 18. The method of claim 1 in which the step of using a processor to generate automatically an answer hypothesis comprises using a processor to generate automatically at least two answer hypotheses and further comprising the step of using the processor to link at least two equivalent answer hypotheses.
  - 19. The method of claim 1 in which the step of using a processor to verify the answer hypothesis comprises using the processor to perform lexico-syntactic analysis automatically.
  - 20. The method of claim 19 in which the lexico-syntactic analysis comprises lexico-syntactic pattern matching.
  - 21. The method of claim 20 in which the lexico-syntactic pattern matching comprises generating, instantiating, and matching templates.
  - 22. The method of claim 1 further comprising the step of using the processor to consult at least one reference document.
  - 23. The method of claim 1 in which said phrase contains a single word.
  - 24. The method of claim 1 in which said phrase contains a plurality of words.
  - 25. The method of claim 1 in which said phrase is present in said document content thus analyzed.
  - 26. The method of claim 1 in which said phrase is present in each of a plurality of documents of said set of documents.
  - 27. The method of claim 1 further comprising the step of:
    - using a processor in conjunction with an output device to output a set of results, said set of results being organized according to said answer hypothesis.
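
For illustration only, the steps of claim 1 can be sketched in Python. The record does not give an implementation, so everything here is an assumption: the capitalized-word-run phrase detector, the stopword list, the sentence-level overlap used as "evidence", and all function names are hypothetical.

```python
import re

# Assumed stopword list; not taken from the patent.
STOPWORDS = {"who", "what", "which", "the", "a", "an", "of", "was", "is", "in", "to"}

def detect_phrases(text):
    # Hypothetical phrase detector: maximal runs of capitalized words.
    # Limitation: sentence-initial words such as "The" would also match.
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)

def content_words(input_string):
    # Lowercased words of the input string, minus stopwords.
    return {w for w in re.findall(r"[a-z]+", input_string.lower())
            if w not in STOPWORDS}

def generate_hypotheses(input_string, documents):
    # Keep document phrases that share no word with the input string.
    query = content_words(input_string)
    found = set()
    for doc in documents:
        for phrase in detect_phrases(doc):
            if not query & {w.lower() for w in phrase.split()}:
                found.add(phrase)
    return found

def gather_evidence(hypothesis, input_string, documents):
    # Assumed evidence measure: for each sentence containing the hypothesis,
    # count the input-string content words that also occur in that sentence.
    query = content_words(input_string)
    evidence = 0
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if hypothesis in sentence:
                words = set(re.findall(r"[a-z]+", sentence.lower()))
                evidence += len(query & words)
    return evidence

def best_answer(input_string, documents):
    # Score every hypothesis and return the one with the most evidence
    # (ties broken alphabetically); None if no hypothesis was generated.
    hypotheses = generate_hypotheses(input_string, documents)
    scored = {h: gather_evidence(h, input_string, documents) for h in hypotheses}
    return max(sorted(scored), key=scored.get) if scored else None
```

With the question "Who wrote the novel Moby Dick?" and two short documents, one stating that Herman Melville wrote the novel and one mentioning Nathaniel Hawthorne in an unrelated sentence, this sketch generates both names as hypotheses and selects "Herman Melville" because only that name co-occurs with the question's content words.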

28. In the operation of a processor-based system comprising a processor, a memory coupled to the processor, a user interface, an answer extraction subsystem, a computerized information retrieval (IR) subsystem coupled to a text corpus, and channels connecting the answer extraction subsystem and the information retrieval subsystem, a method of operating the processor-based computer system to retrieve documents from the text corpus in response to a user-supplied natural language input string comprising words and a set of primary documents, the method comprising the steps of:
- using the user interface to accept the input string into the answer extraction subsystem;
- using the processor to analyze the input string to detect phrases therein;
- using the processor to accept the primary documents into the answer extraction subsystem;
- using the answer extraction subsystem to analyze the primary documents to detect additional phrases therein, the additional phrases not being present in the input string; and
- using the answer extraction subsystem to verify the additional phrases as answer hypotheses.
- - 29. The method of claim 28 in which the step of using the answer extraction subsystem to verify the additional phrases as answer hypotheses further comprises the step of using the answer extraction subsystem to generate secondary queries to retrieve secondary documents.
  - 30. The method of claim 29 in which the step of using the answer extraction subsystem to verify the additional phrases as answer hypotheses further comprises the step of analyzing the secondary documents to detect phrases therein.
  - 31. The method of claim 28 additionally comprising the step of using the answer extraction subsystem to score the additional phrases thus verified as answer hypotheses.
  - 32. The method of claim 31 additionally comprising the step of using the answer extraction subsystem to rank the additional phrases thus verified as answer hypotheses according to their scores.
  - 33. The method of claim 32 additionally comprising the step of organizing the additional phrases thus ranked for output.
  - 34. The method of claim 28 further comprising the step of using the answer extraction subsystem to output at least one of the additional phrases.
  - 35. The method of claim 28 in which the step of using the answer extraction subsystem to verify the additional phrases as answer hypotheses includes using the answer extraction subsystem to determine linguistic relations based on the phrases detected in the input string.
  - 36. The method of claim 35 in which the linguistic relations thus determined are used as a basis for verifying the additional phrases as answer hypotheses.
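
The secondary-query step of claims 29 and 30 might look roughly like the following Python sketch. The query form (a hypothesis conjoined with every input-string phrase), the substring-based retrieval, and the function names are assumptions for illustration; the record does not say how queries are built or executed.

```python
def secondary_queries(hypotheses, input_phrases):
    # Assumed query form: the hypothesis AND every phrase from the input string.
    return {h: [h] + list(input_phrases) for h in hypotheses}

def retrieve(query_terms, corpus):
    # Toy Boolean retrieval: a document matches when it contains every term
    # (case-insensitive substring test).
    return [doc for doc in corpus
            if all(term.lower() in doc.lower() for term in query_terms)]

def verify_with_secondary(hypotheses, input_phrases, corpus):
    # Map each hypothesis to the secondary documents its query retrieves;
    # an empty list means no supporting document was found.
    queries = secondary_queries(hypotheses, input_phrases)
    return {h: retrieve(terms, corpus) for h, terms in queries.items()}
```

A hypothesis whose query retrieves no secondary documents gains no additional evidence, while one that retrieves documents can have those documents analyzed for phrases as in claim 30.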

37. A method of operating a processor-based computer system for computerized information retrieval to respond to an input string, the method comprising the steps of:
- using a subsystem of said system to accept the input string;
- using a subsystem of said system to accept a set of primary documents;
- using a subsystem of said system to detect phrases in the primary documents;
- using a subsystem of said system to generate preliminary hypotheses based on phrases so detected;
- using a subsystem of said system to select preliminary hypotheses as answer hypotheses for verification, each of said answer hypotheses comprising a phrase detected in a document of said primary documents;
- using a subsystem of said system to determine linguistic relations implied by the input string;
- using a subsystem of said system to gather verification evidence for answer hypotheses; and
- using a subsystem of said system to rank answer hypotheses.
- - 38. The method of claim 37 in which at least one of the phrases detected in the primary documents and selected as an answer hypothesis is a phrase not present in the input string.
  - 39. The method of claim 37 further comprising the step of using a subsystem of said system to link sets of equivalent answer hypotheses.
  - 40. The method of claim 37 in which the verification evidence comprises evidence that supports the linguistic relations implied by the input string with respect to at least one of the answer hypotheses.
  - 41. The method of claim 37 in which the step of using a subsystem of said system to rank hypotheses comprises the steps of:
    - using a subsystem of said system to assign scores to hypotheses; and
    - using a subsystem of said system to order hypotheses according to the scores thus assigned.
  - 42. The method of claim 37 in which the step of using a subsystem of said system to rank answer hypotheses comprises using a subsystem of said system to take into account at least some of the verification evidence gathered for answer hypotheses.
  - 43. The method of claim 37 in which the step of using a subsystem of said system to rank hypotheses comprises using a subsystem of said system to take into account co-occurrence of hypotheses with input string phrases in primary documents.
  - 44. The method of claim 37 further comprising the step of using a subsystem of said system to generate preliminary scores for preliminary hypotheses and in which the step of using a subsystem of said system to rank hypotheses comprises using a subsystem of said system to take into account said preliminary scores.
  - 45. The method of claim 37 further comprising the step of using a subsystem of said system to generate at least one query.
  - 46. The method of claim 45 in which the evidence gathered during the step of using a subsystem of said system to gather verification evidence comprises the content of at least one document retrieved in response to the query.
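
Linking equivalent hypotheses (claim 39), scoring by co-occurrence with input-string phrases (claim 43), and ordering by score (claim 41) could be combined as in this Python sketch. The equivalence rule (same final word), the co-occurrence count, and all names are assumptions chosen for illustration, not details from the record.

```python
from collections import defaultdict

def link_equivalents(hypotheses):
    # Assumed equivalence rule: hypotheses sharing a final word are linked,
    # e.g. "Melville" and "Herman Melville". Each group is keyed by its
    # longest member.
    groups = defaultdict(list)
    for h in hypotheses:
        groups[h.split()[-1].lower()].append(h)
    return {max(members, key=len): sorted(members) for members in groups.values()}

def cooccurrence_score(members, input_phrases, documents):
    # For each document mentioning any member of the group, add the number
    # of input-string phrases that appear in the same document.
    score = 0
    for doc in documents:
        text = doc.lower()
        if any(m.lower() in text for m in members):
            score += sum(1 for p in input_phrases if p.lower() in text)
    return score

def rank_hypotheses(hypotheses, input_phrases, documents):
    # Assign one score per linked group, then order by descending score
    # (alphabetical order breaks ties).
    linked = link_equivalents(hypotheses)
    scores = {rep: cooccurrence_score(members, input_phrases, documents)
              for rep, members in linked.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Linking before scoring lets evidence for a short form and a long form of the same name accumulate on a single hypothesis rather than being split between two competing entries.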

47. A method of operating a computerized information retrieval system comprising the steps of:
- using a processor to accept an input string and a set of primary documents;
- using the processor to determine phrases in the input string;
- using the processor to generate hypotheses not present in the input string based on text in the primary documents, each of said hypotheses comprising a phrase;
- using the processor to verify the hypotheses using the primary documents by performing lexico-syntactic analysis automatically; and
- using the processor to score the hypotheses.
- - 48. The method of claim 47 further comprising the steps of:
    - using the processor to construct and execute queries based on the hypotheses to retrieve a set of secondary documents; and
    - using the processor to verify the hypotheses using the secondary documents thus retrieved.
  - 49. The method of claim 47 further comprising the step of using the processor to link equivalent hypotheses.
  - 50. The method of claim 47 further comprising the step of using the processor to output the scored hypotheses.
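
Verification by lexico-syntactic pattern matching, described in claim 21 as generating, instantiating, and matching templates, can be approximated with regular expressions. The two templates below, the "X wrote Y" relation they encode, and the function names are hypothetical; the record does not list the templates actually used.

```python
import re

# Hypothetical templates for an authorship relation. {x} is the answer
# hypothesis and {y} is a phrase taken from the input string.
TEMPLATES = [
    r"{x} wrote (?:the \w+ )?{y}",
    r"{y},? (?:written )?by {x}",
]

def instantiate(templates, hypothesis, phrase):
    # Fill each template with the escaped hypothesis and phrase, then
    # compile it as a case-insensitive regular expression.
    return [
        re.compile(t.format(x=re.escape(hypothesis), y=re.escape(phrase)),
                   re.IGNORECASE)
        for t in templates
    ]

def verify(hypothesis, phrase, documents, templates=TEMPLATES):
    # Evidence count: number of (document, pattern) pairs that match.
    patterns = instantiate(templates, hypothesis, phrase)
    return sum(1 for doc in documents for p in patterns if p.search(doc))
```

A hypothesis that matches more instantiated templates across the documents receives a higher count, which a scoring step such as the one in claim 47 could then use.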

51. A method of operating a computer system for computerized information retrieval to process an input string supplied by a user, the method comprising the steps of:
- in a first phase, using the system to construct and execute a series of primary queries based on shallow linguistic analysis of the input string in order to retrieve primary documents; and
- in a second phase, using the system to perform answer extraction, the answer extraction comprising hypothesis generation and secondary query construction and execution, the hypothesis generation comprising the extraction of phrases from said primary documents, the secondary query construction and execution comprising the construction and execution of queries in order to retrieve secondary documents and to verify the phrases in a context provided by the secondary documents.

52. In the operation of a computer system comprising a processor, a memory coupled to the processor, a user interface, a primary query construction subsystem, an answer extraction subsystem, a computerized information retrieval (IR) subsystem coupled to a text corpus, and channels connecting the primary query construction subsystem and the information retrieval subsystem, a method of operating the computer system to retrieve documents from the text corpus in response to a user-supplied natural language input string comprising words, the method comprising the steps of:
- using the user interface to accept the input string into the primary query construction subsystem;
- using the primary query construction subsystem to analyze the input string to detect phrases therein;
- using the primary query construction subsystem to construct a series of queries based on the detected phrases, the queries of the series being constructed automatically by the primary query construction subsystem through a sequence of operations that comprises successive broadening and narrowing operations;
- using the primary query construction subsystem, the information retrieval subsystem, the text corpus, and the channels to execute the queries of the series;
- using the primary query construction subsystem to rank documents retrieved from the text corpus in response to one or more queries thus executed to produce a set of primary documents;
- using a channel to send the primary documents, the input string, and the phrases detected in the input string from the primary query construction subsystem to the answer extraction subsystem;
- using the answer extraction subsystem to generate hypotheses not present in the input string based on text in the primary documents;
- using the answer extraction subsystem to verify the hypotheses using the primary documents by performing lexico-syntactic analysis automatically; and
- using the answer extraction subsystem to score the hypotheses.
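
The "successive broadening and narrowing operations" of claim 52 are not specified further in this record. One way to picture them, as a Python sketch under stated assumptions, is a query that requires at least `k` of the detected phrases: lowering `k` broadens the query and raising `k` narrows it. The hit-count thresholds, the `start` parameter, and the function names are all hypothetical.

```python
def run_query(phrases, k, corpus):
    # Return indices of documents containing at least k of the phrases
    # (case-insensitive substring test).
    return [i for i, doc in enumerate(corpus)
            if sum(p.lower() in doc.lower() for p in phrases) >= k]

def query_series(phrases, corpus, min_hits=1, max_hits=2, start=None):
    # Execute a series of queries, broadening when too few documents are
    # retrieved and narrowing when too many are. Stops when the hit count
    # is within range, when k leaves [1, len(phrases)], or when the next k
    # has already been tried.
    k = len(phrases) if start is None else start
    series, seen = [], set()
    while 1 <= k <= len(phrases) and k not in seen:
        hits = run_query(phrases, k, corpus)
        series.append((k, hits))
        seen.add(k)
        if len(hits) < min_hits:
            k -= 1  # broaden: require fewer phrases
        elif len(hits) > max_hits:
            k += 1  # narrow: require more phrases
        else:
            break
    return series
```

Starting from the strictest query and relaxing it tends to surface the most specific documents first; starting from `start=1` instead shows the narrowing direction.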

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Kupiec, Julian M.
Primary Examiner(s): Hayes, Gail O.
Assistant Examiner(s): Tkacs, Stephen R.
Application Number: US08/082,938
Time in Patent Office: 1,062 Days
Field of Search: 364/419.08, 364/419.07, 364/419.13, 364/419.19, 395/12, 395/600
US Class Current: 704/9
CPC Class Codes:
G06F 16/3322   using system suggestions G0...
G06F 16/3329   Natural language query form...
G06F 16/3344   using natural language anal...
Y10S 707/99933   Query processing, i.e. sear...
