Information retrieval utilizing semantic representation of text
2 Assignments
Abstract
The present invention is directed to performing information retrieval utilizing a semantic representation of text. In a preferred embodiment, a tokenizer generates, from an input string, information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. It then identifies hypernyms, each of which has an "is a" relationship with one of the selected words in the input string. Next, the tokenizer constructs one or more alternative logical forms from the primary logical form. It constructs each alternative logical form by replacing, for each of one or more of the selected words in the input string, the selected word in the primary logical form with an identified hypernym of that word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens both for constructing an index representing target documents and for processing a query against that index.
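The tokenizer pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a logical form is modeled as a list of hypothetical `(head, relation, dependent)` triples supplied by hand rather than produced by a parser, the `HYPERNYMS` table and the relation labels `subj`/`obj` are made up for the example, and the `head;relation;dependent` token format is an arbitrary choice.

```python
from itertools import product

# Hypothetical hypernym table ("is a" relationships); a real tokenizer would
# consult a lexical knowledge base instead of this hand-written dict.
HYPERNYMS = {
    "dog": ["canine", "animal"],
    "bone": ["object"],
}


def alternatives_for(triple, hypernyms):
    """Yield the triple itself first, then every variant in which one or both
    selected words are replaced by one of their hypernyms."""
    head, relation, dependent = triple
    heads = [head] + hypernyms.get(head, [])
    dependents = [dependent] + hypernyms.get(dependent, [])
    for h, d in product(heads, dependents):
        yield (h, relation, d)


def tokenize(primary_form, hypernyms=HYPERNYMS):
    """Turn a primary logical form (a list of triples) into string tokens
    covering both the primary form and its alternative forms."""
    tokens = []
    for triple in primary_form:
        for h, relation, d in alternatives_for(triple, hypernyms):
            token = f"{h};{relation};{d}"
            if token not in tokens:  # keep each token once, in first-seen order
                tokens.append(token)
    return tokens


# "The dog chewed the bone", hand-parsed with hypothetical relation labels.
primary = [("chew", "subj", "dog"), ("chew", "obj", "bone")]
print(tokenize(primary))
# ['chew;subj;dog', 'chew;subj;canine', 'chew;subj;animal', 'chew;obj;bone', 'chew;obj;object']
```

The first token produced for each triple represents the primary logical form; the remaining tokens represent alternative forms. Because the same function would be applied to target documents and to queries, a query mentioning "animal" could share a token with a document mentioning "dog".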
487 Citations
23 Claims
1. A method in a computer system for identifying passages of a first body of text relating to a passage of a second body of text, the method comprising the steps of:
for each of a multiplicity of passages of the first body of text each having a location in the first body of text:
constructing a first logical form characterizing a semantic relationship between selected words in the passage, expanding the constructed first logical form to include alternative words for at least some of the selected words in the passage, and storing in an index a mapping from the expanded first logical form to the location of the passage in the first body of text;
constructing a second logical form characterizing a semantic relationship between selected words in the passage of the second body of text;
expanding the constructed second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to include alternative words for at least some of the selected words in the passage; and
comparing the expanded second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to the expanded first logical forms from which the index maps to identify a passage of the first body of text whose expanded logical form intersects with the expanded logical form characterizing a semantic relationship between selected words in the passage of the second body of text, in that, for each pair of corresponding selected words between the intersecting expanded logical forms, the selected word or one of its alternative words in the expanded first logical form matches the selected word or one of its alternative words in the expanded second logical form, such that a passage of the first body of text relating to the passage of the second body of text is identified.
Dependent claims: 2 through 16.
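The indexing and comparison steps of claim 1 can be sketched in a few lines. This is a hedged illustration, not the claimed system itself: it assumes each logical form is a single hypothetical `(head, relation, dependent)` triple, that "locations" are simple passage numbers, and that alternative words come from a made-up `ALTERNATIVES` table. The key observation is that if every expanded form is stored in the index, then a single shared expanded triple already satisfies the claim's matching condition for both word positions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical table of alternative words (for example, hypernyms); illustrative only.
ALTERNATIVES = {
    "dog": {"canine", "animal"},
    "cat": {"feline", "animal"},
    "bone": {"object"},
}


def expand(triple):
    """Expand a (head, relation, dependent) logical form into every combination
    of each selected word with its alternative words."""
    head, relation, dependent = triple
    heads = {head} | ALTERNATIVES.get(head, set())
    dependents = {dependent} | ALTERNATIVES.get(dependent, set())
    return {(h, relation, d) for h, d in product(heads, dependents)}


def build_index(passages):
    """Map each expanded first logical form to the locations of its passages.
    `passages` maps a location (here, a passage number) to its logical forms."""
    index = defaultdict(set)
    for location, forms in passages.items():
        for form in forms:
            for expanded in expand(form):
                index[expanded].add(location)
    return index


def find_related(index, query_forms):
    """Return locations whose expanded forms intersect the expanded query forms.
    A shared expanded triple means that, for both word positions, a word or one
    of its alternatives on one side equals a word or alternative on the other."""
    locations = set()
    for form in query_forms:
        for expanded in expand(form):
            locations |= index.get(expanded, set())
    return locations


passages = {
    0: [("chase", "subj", "dog")],
    1: [("chase", "subj", "cat")],
    2: [("eat", "subj", "dog")],
}
index = build_index(passages)
print(sorted(find_related(index, [("chase", "subj", "canine")])))  # [0]
print(sorted(find_related(index, [("chase", "subj", "dog")])))     # [0, 1]
```

The second query also retrieves passage 1 because "dog" and "cat" share the alternative word "animal", which is the case the claim covers when an alternative word on one side matches an alternative word on the other.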
17. A computer-readable medium whose contents cause a computer system to identify passages of a first body of text relating to a passage of a second body of text by performing the steps of:
for each of a multiplicity of passages of the first body of text each having a location in the first body of text:
constructing a first logical form characterizing a semantic relationship between selected words in the passage, expanding the constructed first logical form to include alternative words for at least some of the selected words in the passage, and storing in an index a mapping from the expanded first logical form to the location of the passage in the first body of text;
constructing a second logical form characterizing a semantic relationship between selected words in the passage of the second body of text;
expanding the second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to include alternative words for at least some of the selected words in the passage; and
comparing the expanded second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to the expanded first logical forms from which the index maps to identify a passage of the first body of text whose expanded logical form intersects with the expanded logical form characterizing a semantic relationship between selected words in the passage of the second body of text, in that, for each pair of corresponding selected words between the intersecting expanded logical forms, the selected word or one of its alternative words in the expanded first logical form matches the selected word or one of its alternative words in the expanded second logical form, such that a passage of the first body of text relating to the passage of the second body of text is identified.
Dependent claims: 18 through 21.
22. A computer system adapted to identify passages of a first body of text relating to a passage of a second body of text, comprising:
an indexing component adapted to process each of a multiplicity of passages of the first body of text each having a location in the first body of text by:
constructing a first logical form characterizing a semantic relationship between selected words in the passage, expanding the constructed first logical form to include alternative words for at least some of the selected words in the passage, and storing in an index a mapping from the expanded first logical form to the location of the passage in the first body of text;
a semantic relationship characterization component adapted to characterize a semantic relationship between selected words in the passage of the second body of text by:
constructing a second logical form characterizing a semantic relationship between selected words in the passage of the second body of text, and expanding the second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to include alternative words for at least some of the selected words in the passage; and
a related passage identification component adapted to compare the expanded second logical form characterizing a semantic relationship between selected words in the passage of the second body of text to the expanded first logical forms from which the index maps to identify a passage of the first body of text whose expanded logical form intersects with the expanded logical form characterizing a semantic relationship between selected words in the passage of the second body of text, in that, for each pair of corresponding selected words between the intersecting expanded logical forms, the selected word or one of its alternative words in the expanded first logical form matches the selected word or one of its alternative words in the expanded second logical form, such that a passage of the first body of text relating to the passage of the second body of text is identified.
Dependent claim: 23.
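The three components recited in claim 22 can be sketched as separate classes that share one index. This is a hypothetical arrangement for illustration only: `toy_logical_form` is a stand-in that naively reads a passage as "subject verb object" instead of performing real parsing, the `ALTERNATIVES` table is made up, and the class names simply mirror the claim's component names.

```python
from collections import defaultdict
from itertools import product

# Hypothetical alternative-word table; illustrative only.
ALTERNATIVES = {"dog": {"animal"}, "cat": {"animal"}, "fish": {"food"}}


def toy_logical_form(passage):
    """Hypothetical stand-in for a real parser: reads a passage as 'subject verb object'."""
    subject, verb, obj = passage.lower().split()
    return [(verb, "subj", subject), (verb, "obj", obj)]


def expand(form):
    """Expand every triple of a logical form with the alternatives of its selected words."""
    expanded = set()
    for head, relation, dependent in form:
        heads = {head} | ALTERNATIVES.get(head, set())
        dependents = {dependent} | ALTERNATIVES.get(dependent, set())
        expanded |= {(h, relation, d) for h, d in product(heads, dependents)}
    return expanded


class IndexingComponent:
    """Builds the index from expanded first logical forms to passage locations."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, location, passage):
        for form in expand(toy_logical_form(passage)):
            self.index[form].add(location)


class CharacterizationComponent:
    """Constructs and expands the second logical form for the query passage."""

    def characterize(self, passage):
        return expand(toy_logical_form(passage))


class RelatedPassageIdentificationComponent:
    """Looks up expanded query forms in the index to find intersecting passages."""

    def __init__(self, index):
        self.index = index

    def identify(self, expanded_query):
        locations = set()
        for form in expanded_query:
            locations |= self.index.get(form, set())
        return locations


indexer = IndexingComponent()
for location, text in enumerate(["dog chase cat", "cat eat fish", "bird eat seed"]):
    indexer.add(location, text)

query = CharacterizationComponent().characterize("dog eat seed")
print(sorted(RelatedPassageIdentificationComponent(indexer.index).identify(query)))  # [1, 2]
```

Passage 1 is found through the shared alternative "animal" in the subject position, and passage 2 through the literal object "seed". Keeping the components separate in this way lets the indexing work happen once, ahead of time, while characterization and identification run per query.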
Specification