Information retrieval utilizing semantic representation of text and based on constrained expansion of query words
Abstract
The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an “is a” relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
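The token generation described in the abstract can be illustrated with a minimal sketch. It assumes a logical form is a simple (subject, verb, object) triple and that hypernyms come from a small lookup table; the table contents, the role names, and the `role=word` token encoding are all hypothetical choices made for illustration, not the format used by the patent.

```python
from itertools import product

# Hypothetical hypernym table ("is a" relationships). A real system
# would draw these from a lexical knowledge base.
HYPERNYMS = {
    "man": ["person"],
    "kiss": ["touch"],
    "pig": ["swine"],
}

# Assumed role names for the three positions of a logical form.
ROLES = ("subject", "verb", "object")


def logical_forms(primary):
    """Return the primary logical form followed by every alternative
    form in which one or more words are replaced by a hypernym."""
    # For each position, candidates are the original word plus its hypernyms.
    choices = [[word] + HYPERNYMS.get(word, []) for word in primary]
    # product() yields the all-original combination first (the primary form).
    return list(product(*choices))


def to_token(form):
    """Encode a logical form as one string that keeps each word
    attached to its semantic role (illustrative encoding only)."""
    return "|".join(f"{role}={word}" for role, word in zip(ROLES, form))


def tokenize(primary):
    """Generate tokens for the primary and all alternative logical forms."""
    return [to_token(form) for form in logical_forms(primary)]


tokens = tokenize(("man", "kiss", "pig"))
```

Because the same `tokenize` function would be applied to both target-document passages and queries, a query about a "person" touching a "swine" would share a token with a passage about a "man" kissing a "pig".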
8 Claims
1. A method in a computer system for processing a query directed to one or more target documents, both the query and the target documents comprising a series of words, the method comprising the steps of:
receiving the query;
determining the semantic roles of selected words in the query;
expanding the query by obtaining additional selected words which are similar to the selected words, wherein expanding the query is constrained to obtaining additional selected words having similar semantic roles to the selected words; and
identifying occurrences of the selected words and additional selected words in the target documents in which the selected words and additional selected words have the same semantic roles as in the expanded query.
Dependent claims: 2, 3
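One way to read the constrained expansion step of claim 1 is that the set of similar words offered for a query word depends on the semantic role the word plays. The sketch below assumes a hypothetical similarity table keyed by (word, role); the table entries and the function name `expand_query` are illustrative assumptions, not part of the claim.

```python
# Hypothetical similarity table keyed by (word, role). The same surface
# word can expand differently depending on the role it plays.
SIMILAR_BY_ROLE = {
    ("book", "verb"): ["reserve"],
    ("book", "object"): ["volume"],
    ("flight", "object"): ["trip"],
}


def expand_query(role_words):
    """Map each role in the query to the original word plus any similar
    words that are valid in that same role."""
    expanded = {}
    for word, role in role_words:
        expanded[role] = [word] + SIMILAR_BY_ROLE.get((word, role), [])
    return expanded


# "book" used as a verb expands to "reserve", never to "volume".
expanded = expand_query([("book", "verb"), ("flight", "object")])
```

An unconstrained expansion would add "volume" to the query as well, which could match passages about printed books rather than reservations.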
4. A computer memory containing a document indexing data structure characterizing the contents of one or more target documents, the document indexing data structure mapping from words to locations in the target documents, the document indexing data structure mapping, for each of a plurality of passages of words occurring in the target documents, from words contained in a logical form generated from the passage to a location corresponding to the passage, and from hypernyms of words contained in the logical form generated from a constrained expansion of the passage to a location corresponding to the passage, wherein the constrained expansion constrains expansion of words in the passage to additional words which are similar in meaning to the words in the passage and have a similar semantic role, such that the document indexing data structure may be used to identify, in response to the receipt of a query, the location of passages of the target documents that are semantically similar to a passage of the query.
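The indexing data structure of claim 4 can be approximated by an inverted index whose keys pair a word with its semantic role. The following is a rough sketch under stated assumptions: logical forms are (subject, verb, object) triples, locations are arbitrary hashable values such as (document, sentence number), and the hypernym table, `build_index`, and `lookup` are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical hypernym table used for the constrained expansion.
HYPERNYMS = {"man": ["person"], "pig": ["swine"]}
ROLES = ("subject", "verb", "object")


def role_keys(form):
    """Return (role, word) keys for each word of a logical form and for
    each hypernym of that word, kept in the same role."""
    keys = set()
    for role, word in zip(ROLES, form):
        keys.add((role, word))
        for hypernym in HYPERNYMS.get(word, []):
            keys.add((role, hypernym))
    return keys


def build_index(passages):
    """Build a mapping from (role, word) keys to passage locations.
    `passages` maps each location to the logical form of its passage."""
    index = defaultdict(set)
    for location, form in passages.items():
        for key in role_keys(form):
            index[key].add(location)
    return index


def lookup(index, form):
    """Return locations whose passage covers every (role, word) pair
    of the query's logical form."""
    hits = [index.get((role, word), set()) for role, word in zip(ROLES, form)]
    return set.intersection(*hits)


passages = {
    ("doc1", 0): ("man", "kiss", "pig"),
    ("doc2", 3): ("pig", "kiss", "man"),
}
index = build_index(passages)
```

Keying on the role as well as the word is what lets a query for a person kissing a swine find only the first passage, even though both passages contain the same three words.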
6. A computer system for responding to queries containing a query passage of words against one or more target documents, each target document comprised of one or more target document passages of words, each target document passage having a location within the target documents, the computer system comprising:
a target document receiver for receiving the target documents;
a query receiver for receiving queries against the target documents;
a tokenizer for generating tokens from target document passages of target documents received by the target document receiver and of the query passage for queries received by the query receiver, the tokenizer including a logical form synthesizer for synthesizing from each passage a logical form characterizing the semantic structure of the passage, the tokenizer including an expander creating additional logical forms based on words in the logical forms synthesized from the target document passages or query, wherein the expander constrains creation of additional logical forms to include words having similar meaning to those in the passages and having a similar semantic role, the tokenizer generating tokens representing the logical forms synthesized from the passages and the additional logical forms;
an index memory for storing a relation that maps from each token generated from a target document passage to the locations in the target documents of the target document passage from which the token was generated; and
a query processing subsystem for, for each query, identifying in the index memory a token matching the token generated from the query and returning an indication of the location mapped to from the identified token.
Dependent claim 7: a hypernym expansion subsystem for creating from each logical form synthesized by the logical form synthesizer one or more of the additional logical forms in which one or more of the words of the logical form are replaced with hypernyms, the tokenizer also generating tokens representing the additional logical forms created by the hypernym expansion subsystem.
8. A method in a computer system for processing a query directed to one or more target documents, both the query and the target documents comprising a series of words, the method comprising the steps of:
receiving the query;
determining the semantic roles of selected words in the query relative to one another; and
identifying, as matching, only occurrences of the selected words, or similar words, in the target documents in which the selected words, or similar words, in the target document have the same semantic roles relative to one another as the selected words in the query.
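The difference between the role-sensitive matching of claim 8 and plain keyword matching can be shown with a small comparison. This is an illustrative sketch only: the similar-word table and the representation of a passage as a role-to-word dictionary are assumptions made for the example.

```python
# Hypothetical table of similar words.
SIMILAR = {"man": {"person"}, "pig": {"swine"}}


def similar(a, b):
    """True if the two words are identical or listed as similar."""
    return a == b or b in SIMILAR.get(a, set()) or a in SIMILAR.get(b, set())


def roles_match(query, passage):
    """Match only if every query word, or a similar word, appears in the
    passage in the same semantic role."""
    return all(
        role in passage and similar(word, passage[role])
        for role, word in query.items()
    )


def keyword_match(query, passage):
    """Baseline for contrast: match if every query word, or a similar
    word, appears anywhere in the passage, ignoring roles."""
    return all(
        any(similar(word, other) for other in passage.values())
        for word in query.values()
    )


query = {"subject": "man", "verb": "kiss", "object": "pig"}
same_roles = {"subject": "person", "verb": "kiss", "object": "swine"}
swapped_roles = {"subject": "pig", "verb": "kiss", "object": "man"}
```

Here `roles_match` accepts `same_roles` and rejects `swapped_roles`, while `keyword_match` accepts both, which is the kind of false positive the claim's role constraint is meant to exclude.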
Specification