Method for computerized information retrieval using shallow linguistic analysis
First Claim
1. In a system comprising a processor, a memory coupled to the processor, a user interface coupled to the processor, a primary query construction subsystem executed by the processor, a computerized information retrieval (IR) subsystem coupled to a text corpus, and a channel coupling the primary query construction subsystem and the information retrieval subsystem with one another, a method for retrieving documents from the text corpus in response to a user-supplied natural language input string comprising words, the method comprising the steps of:
with the user interface, accepting the input string into the primary query construction subsystem;
with the primary query construction subsystem, analyzing the input string by performing a linguistic analysis of the input string to detect phrases therein, the detected phrases comprising words, each of the detected phrases comprising a grammatical construct identified in the linguistic analysis, at least one of the grammatical constructs identified in the linguistic analysis being a noun phrase comprising a plurality of words including a head word;
with the primary query construction subsystem, constructing a series of queries based on the detected phrases, the queries of the series being constructed automatically by the primary query construction subsystem through a sequence of operations that comprises successive query broadening and query narrowing operations, each constructed query of the series comprising a collection of component queries, each component query being formed from a single one of the grammatical constructs identified in the linguistic analysis, each constructed query of the series having a first proximity constraint and a second proximity constraint, the first proximity constraint pertaining to a proximity relationship among words within a component query, the second proximity constraint pertaining to a proximity relationship among at least two component queries, at least one of the queries of the series comprising a component query based on all the words of the plurality;
with the primary query construction subsystem, automatically constructing an additional query based on the head word of the noun phrase without the other words of the plurality;
with the primary query construction subsystem, the information retrieval subsystem, the text corpus, and the channel, executing the queries of the series and the additional query to retrieve documents from the text corpus, the queries of the series being executed before the additional query; and
with the primary query construction subsystem, ranking documents retrieved from the text corpus in response to one or more queries thus executed.
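The query structure recited in this claim can be pictured with a small sketch. The Python below is a hypothetical illustration, not the patented implementation: the names `ComponentQuery`, `CompoundQuery`, `component_positions`, `matches`, and `head_word_query`, the token-window reading of "proximity", and the convention that the head word is the last word of the noun phrase are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ComponentQuery:
    """One detected phrase plus the first (within-phrase) proximity constraint."""
    words: Tuple[str, ...]
    max_span: int  # assumed meaning: all words fall inside this many consecutive tokens


@dataclass(frozen=True)
class CompoundQuery:
    """A collection of component queries plus the second (between-phrase) constraint."""
    components: Tuple[ComponentQuery, ...]
    max_distance: Optional[int]  # None means no constraint among components


def component_positions(cq: ComponentQuery, tokens: Sequence[str]) -> List[int]:
    """Start positions of windows that begin on a query word and contain all of them."""
    hits = []
    for start, token in enumerate(tokens):
        window = tokens[start:start + cq.max_span]
        if token in cq.words and all(w in window for w in cq.words):
            hits.append(start)
    return hits


def matches(query: CompoundQuery, tokens: Sequence[str]) -> bool:
    """True if every component matches and some combination satisfies max_distance."""
    per_component = [component_positions(c, tokens) for c in query.components]
    if any(not positions for positions in per_component):
        return False
    if query.max_distance is None:
        return True
    return any(max(combo) - min(combo) <= query.max_distance
               for combo in product(*per_component))


def head_word_query(noun_phrase: ComponentQuery) -> CompoundQuery:
    """Additional query on the head word alone (assumed here to be the last word)."""
    head = noun_phrase.words[-1]
    return CompoundQuery((ComponentQuery((head,), 1),), None)


doc = "president abraham lincoln was shot in a theater".split()
noun_phrase = ComponentQuery(("president", "lincoln"), max_span=2)
verb_phrase = ComponentQuery(("shot",), max_span=1)

# Strict query: the noun phrase words must be adjacent, so this document is missed.
strict = CompoundQuery((noun_phrase, verb_phrase), max_distance=5)
# Broadened query: the within-phrase window is relaxed from 2 tokens to 3.
relaxed_np = ComponentQuery(noun_phrase.words, max_span=3)
broadened = CompoundQuery((relaxed_np, verb_phrase), max_distance=5)

print(matches(strict, doc))                        # False
print(matches(broadened, doc))                     # True
print(matches(head_word_query(noun_phrase), doc))  # True
```

In this toy setup the head-word query is the most permissive of the three, which is consistent with the claim's requirement that it be executed after the queries of the series.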
Abstract
A computerized method for retrieving documents from a text corpus in response to a user-supplied natural language input string, e.g., a question. An input string is accepted and analyzed to detect phrases therein. A series of queries based on the detected phrases is automatically constructed through a sequence of successive broadening and narrowing operations designed to generate an optimal query or queries. The queries of the series are executed to retrieve documents, which are then ranked and made available for output to the user, a storage device, or further processing. In another aspect the method is implemented in the context of a larger two-phase method, of which the first phase comprises the method of the invention and the second phase of the method comprises answer extraction.
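One way to picture the "successive broadening and narrowing operations" is a loop that adjusts two proximity windows until the number of hits falls into a usable band. The sketch below is an illustration under stated assumptions, not the method of the patent: the target band (`min_hits`, `max_hits`), the step sizes, the step limit, and the `count_hits` callback standing in for the information retrieval subsystem are all hypothetical. Following the wording of the claims, broadening relaxes the proximity constraint within a component query and narrowing tightens the constraint among component queries.

```python
from typing import Callable, List, Tuple

# A query is reduced here to its two proximity settings:
# (within-phrase window, between-phrase window). This encoding is an assumption.
Query = Tuple[int, int]


def build_series(count_hits: Callable[[Query], int],
                 start: Query = (1, 10),
                 min_hits: int = 1,
                 max_hits: int = 5,
                 max_steps: int = 8) -> List[Query]:
    """Return the sequence of queries tried, in execution order."""
    within, between = start
    series = [start]
    for _ in range(max_steps):
        hits = count_hits((within, between))
        if hits < min_hits:
            # Too few hits: broaden by relaxing the within-phrase constraint.
            within += 1
        elif hits > max_hits and between > 1:
            # Too many hits: narrow by tightening the between-phrase constraint.
            between = max(1, between // 2)
        else:
            break  # hit count is inside the band, or nothing is left to tighten
        series.append((within, between))
    return series


def toy_hits(query: Query) -> int:
    """Stand-in for the IR subsystem: no hits until the phrase window reaches 3."""
    within, between = query
    return 0 if within < 3 else between * 2


series = build_series(toy_hits)
print(series)  # [(1, 10), (2, 10), (3, 10), (3, 5), (3, 2)]
```

With the toy hit counter, the series broadens twice before any document matches, then narrows twice until the count (4) drops inside the assumed band of 1 to 5.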
25 Claims
1. Independent claim, reproduced in full under First Claim above. Dependent claims: 2-23.
24. In a system comprising a processor, a memory coupled to the processor, a user interface coupled to the processor, a storage device coupled to the processor, a primary query construction subsystem executed by the processor, a computerized information retrieval (IR) subsystem coupled to a text corpus, and a channel connecting the primary query construction subsystem and the information retrieval subsystem with one another, a method for retrieving documents from the text corpus in response to a user-supplied natural language question comprising words, the method comprising the steps of:
with the user interface, accepting the question into the primary query construction subsystem;
with the primary query construction subsystem, detecting phrases in the question using a text tagger, each of the detected phrases being a grammatical construct selected from the group a noun phrase, a title phrase, or a verb phrase, at least one of the detected phrases being a noun phrase comprising a plurality of words including a head word;
with the primary query construction subsystem, constructing a set of initial queries based on one or more detected noun phrases, at least one of the initial queries including all the words of the plurality;
with the primary query construction subsystem, the information retrieval subsystem, the text corpus, and the channel, processing the initial queries to obtain a number of matches for each initial query and to record the number of matches in a frequency table;
with the primary query construction subsystem, constructing a set of title phrase queries based on any title phrases detected in the question;
with the primary query construction subsystem, constructing component queries based on information in the frequency table;
with the primary query construction subsystem, constructing additional component queries based on the set of title phrase queries;
with the primary query construction subsystem, constructing a series of compound queries each comprising two or more component queries, each component query being formed from a single one of the grammatical constructs, the queries of the series being constructed automatically by the primary query construction subsystem with minimal user intervention through a sequence of operations, wherein:
at least one compound query of the series is a broadened version of a particular compound query of the series, the broadened version being constructed by broadening the particular compound query through relaxation of at least one proximity constraint pertaining to a proximity relationship within at least one component query of the particular compound query; and
at least one compound query of the series is a narrowed version of the broadened version, the narrowed version being constructed by tightening a proximity constraint pertaining to a proximity relationship among at least two of the component queries of the broadened version;
with the primary query construction subsystem, constructing an additional query formed from the head word of the noun phrase without the other words of the plurality;
with the primary query construction subsystem, the information retrieval subsystem, the text corpus, and the channel, processing the series of compound queries and the additional query in order to retrieve a plurality of documents from the text corpus, the queries of the series being processed before the additional query; and
with the primary query construction subsystem, ranking documents retrieved from the text corpus in response to one or more queries thus processed; and
with the primary query construction subsystem, outputting the documents thus ranked.
Dependent claims: 25
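The frequency-table step of claim 24 can be sketched as follows. This is a minimal illustration under stated assumptions: the in-memory tokenized corpus, the exact-adjacency matching of an initial query, and the names `phrase_occurs` and `build_frequency_table` are hypothetical. The claim does not say here how component queries are derived from the table, so the rarest-first ordering at the end is only one plausible use, not the claimed rule.

```python
from typing import Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def phrase_occurs(phrase: Phrase, tokens: Sequence[str]) -> bool:
    """True if the words of the phrase appear consecutively in the token list."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))


def build_frequency_table(phrases: Sequence[Phrase],
                          corpus: Sequence[Sequence[str]]) -> Dict[Phrase, int]:
    """Record, for each initial noun-phrase query, how many documents match it."""
    return {p: sum(phrase_occurs(p, doc) for doc in corpus) for p in phrases}


corpus: List[List[str]] = [
    "the pulitzer prize was awarded".split(),
    "a prize for the best play".split(),
    "the novelist won the pulitzer prize".split(),
]
initial_queries: List[Phrase] = [("pulitzer", "prize"), ("novelist",)]

table = build_frequency_table(initial_queries, corpus)
print(table)  # {('pulitzer', 'prize'): 2, ('novelist',): 1}

# Hypothetical use of the table: consider the rarest phrases first.
rarest_first = sorted(table, key=table.get)
print(rarest_first)  # [('novelist',), ('pulitzer', 'prize')]
```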
Specification