Information retrieval utilizing semantic representation of text by identifying hypernyms and indexing multiple tokenized semantic structures to a same passage of text
Abstract
The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an "is a" relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
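The expansion step described in the abstract can be illustrated with a minimal sketch. It assumes the primary logical form has already been produced by a parser and is given here by hand as a (subject, verb, object) triple. The hypernym table, the function names, and the "|"-joined token format are all hypothetical choices made for this example and are not taken from the patent.

```python
from itertools import product

# Hypothetical hypernym table; a real system would consult a lexical
# knowledge base instead of a hard-coded dictionary.
HYPERNYMS = {
    "man": ["person"],
    "kiss": ["touch"],
    "pig": ["swine", "animal"],
}


def expand_logical_form(primary):
    """Return the primary logical form followed by its alternative forms.

    Each alternative replaces one or more words of the primary form with
    one of that word's hypernyms. The primary form is always first because
    each choice list starts with the original word.
    """
    choices = [[word] + HYPERNYMS.get(word, []) for word in primary]
    return [tuple(combo) for combo in product(*choices)]


def to_tokens(forms):
    """Encode each logical form as one token string (assumed format)."""
    return ["|".join(form) for form in forms]


# Primary logical form for a sentence such as "The man kissed the pig",
# written out by hand because parsing is outside the scope of this sketch.
primary_form = ("man", "kiss", "pig")
forms = expand_logical_form(primary_form)
tokens = to_tokens(forms)
print(len(tokens), tokens[0])
```

Because the same routine would be applied to both documents and queries, a query phrased with a more general word can still produce a token that was stored for a more specific passage.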
18 Claims
1. A method in a computer system for generating information retrieval tokens from an input string, the method comprising the steps of:
creating from the input string a primary logical form characterizing a semantic relationship between selected words in the input string;
identifying hypernyms of the selected words in the input string, at least one hypernym being identified from a group of hypernyms associated with a selected word wherein at least one other hypernym in the group is not identified;
constructing from the primary logical form one or more alternative logical forms, each alternative logical form being constructed by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word; and
generating tokens representing both the primary logical form and the alternative logical forms, the generated tokens being distinguishable by an information retrieval engine.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
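Claim 1 requires that only some of the hypernyms associated with a word be identified, with at least one left out. The sketch below shows one way such a selection could work, under the assumption that hypernyms are stored as a chain ordered from most specific to most general and that a fixed depth cutoff is used. The chain data, the `max_levels` parameter, and the function name are hypothetical; the claim itself does not prescribe this criterion.

```python
# Hypothetical hypernym chains, ordered from most specific to most general.
HYPERNYM_CHAINS = {
    "pig": ["swine", "mammal", "animal", "organism", "entity"],
    "man": ["person", "organism", "entity"],
}


def identify_hypernyms(word, max_levels=2):
    """Return only the closest hypernyms of a word.

    Hypernyms beyond max_levels are deliberately not identified, so very
    general terms such as "entity" never reach the alternative logical forms.
    Unknown words yield an empty list.
    """
    chain = HYPERNYM_CHAINS.get(word, [])
    return chain[:max_levels]


selected = identify_hypernyms("pig")
skipped = [h for h in HYPERNYM_CHAINS["pig"] if h not in selected]
print(selected, skipped)
```

Limiting the selection keeps the number of alternative logical forms small and avoids matching on terms so general that they would apply to almost any passage.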
9. A computer-readable medium whose contents cause a computer system to generate information retrieval tokens from an input string by performing the steps of:
creating from the input string a primary logical form characterizing a semantic relationship between selected words in the input string;
identifying hypernyms of the selected words in the input string, at least one hypernym being identified from a group of hypernyms associated with a selected word wherein at least one other hypernym in the group is not identified;
constructing from the primary logical form one or more alternative logical forms, each alternative logical form being constructed by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word; and
generating tokens representing both the primary logical form and the alternative logical forms, the generated tokens being distinguishable by an information retrieval engine.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A method in a computer system for creating and accessing an information retrieval index, the index characterizing one or more indexed documents by storing tokenized semantic structures occurring in the indexed documents, the method comprising the steps of:
for each of a plurality of passages in the indexed documents;
  receiving a plurality of tokenized semantic structures all corresponding to the same passage,
  receiving an indexed document location for the passage, and
  storing in the index mappings from each of the plurality of received tokenized semantic structures to the index document location; and
for a query issued against the indexed documents;
  receiving a tokenized semantic structure corresponding to the query,
  identifying in the index a mapping from a tokenized semantic structure matching the tokenized semantic structure corresponding to the query to an identified indexed document location, and
  returning the identified indexed document location.
Dependent claim: 18
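The indexing and query steps of claim 17 can be sketched as a simple inverted index in which several tokenized semantic structures map to the same passage location. This is an illustration under stated assumptions, not the patented implementation: the class name, the "|"-joined token strings, and the (document, sentence number) location format are all hypothetical.

```python
from collections import defaultdict


class SemanticIndex:
    """Maps tokenized semantic structures to indexed document locations."""

    def __init__(self):
        # token string -> set of locations where that structure occurs
        self._postings = defaultdict(set)

    def add_passage(self, tokens, location):
        """Store a mapping from every token of one passage to its location."""
        for token in tokens:
            self._postings[token].add(location)

    def lookup(self, query_token):
        """Return the locations whose stored tokens match the query token."""
        return sorted(self._postings.get(query_token, ()))


index = SemanticIndex()
# Several structures (primary and alternative forms) for the same passage,
# all stored against one assumed (document, sentence number) location.
index.add_passage(
    ["man|kiss|pig", "person|kiss|pig", "person|touch|animal"],
    ("doc1", 3),
)
index.add_passage(["dog|chase|cat"], ("doc2", 1))

hits = index.lookup("person|touch|animal")
misses = index.lookup("cat|chase|dog")
print(hits, misses)
```

Storing every structure of a passage against one location is what allows a query expressed in more general terms to retrieve a passage written in more specific ones.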
Specification