Information retrieval utilizing semantic representation of text and based on constrained expansion of query words

US 6,246,977 B1
Filed: 08/03/1999
Issued: 06/12/2001
Est. Priority Date: 03/07/1997
Status: Expired due to Term

Abstract

The present invention is directed to performing information retrieval utilizing semantic representation of text. In a preferred embodiment, a tokenizer generates from an input string information retrieval tokens that characterize the semantic relationship expressed in the input string. The tokenizer first creates from the input string a primary logical form characterizing a semantic relationship between selected words in the input string. The tokenizer then identifies hypernyms that each have an “is a” relationship with one of the selected words in the input string. The tokenizer then constructs from the primary logical form one or more alternative logical forms. The tokenizer constructs each alternative logical form by, for each of one or more of the selected words in the input string, replacing the selected word in the primary logical form with an identified hypernym of the selected word. Finally, the tokenizer generates tokens representing both the primary logical form and the alternative logical forms. The tokenizer is preferably used to generate tokens for both constructing an index representing target documents and processing a query against that index.
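
The token-generation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the logical form is simplified to a (deep subject, verb, deep object) triple, the hypernym table is hand-written, and the names `LogicalForm`, `HYPERNYMS`, `alternative_forms`, and `tokens` are hypothetical.

```python
from itertools import product
from typing import NamedTuple


class LogicalForm(NamedTuple):
    """Simplified logical form: deep subject, principal verb, deep object."""
    subject: str
    verb: str
    obj: str


# Hypothetical hand-written "is a" table (word -> hypernyms); a real system
# would consult a lexical knowledge base instead.
HYPERNYMS = {
    "man": ["person"],
    "kiss": ["touch"],
    "pig": ["swine", "animal"],
}


def alternative_forms(primary: LogicalForm) -> list[LogicalForm]:
    """Build every form obtained by swapping one or more words for a hypernym."""
    # For each slot, the candidates are the original word plus its hypernyms.
    choices = [[word] + HYPERNYMS.get(word, []) for word in primary]
    forms = [LogicalForm(*combo) for combo in product(*choices)]
    # Drop the combination that changes nothing (the primary form itself).
    return [form for form in forms if form != primary]


def tokens(primary: LogicalForm) -> list[str]:
    """Emit one token per logical form, with the primary form first."""
    return ["|".join(form) for form in [primary] + alternative_forms(primary)]


primary = LogicalForm("man", "kiss", "pig")
for token in tokens(primary):
    print(token)
```

With this table the sketch produces twelve tokens: the primary form `man|kiss|pig` plus eleven alternatives such as `person|touch|swine`. Because the same function would be applied to both documents and queries, a query phrased with a hypernym can still meet a document phrased with the more specific word.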

8 Claims

1. A method in a computer system for processing a query directed to one or more target documents, both the query and the target documents comprising a series of words, the method comprising the steps of:
- receiving the query;
- determining the semantic roles of selected words in the query;
- expanding the query by obtaining additional selected words which are similar to the selected words, wherein expanding the query is constrained to obtaining additional selected words having similar semantic roles to the selected words; and
- identifying occurrences of the selected words and additional selected words in the target documents in which the selected words and additional selected words have the same semantic roles as in the expanded query.

2. The method of claim 1, further including the step of compiling from the target documents an index indicating, for a plurality of the words occurring in the target documents, the semantic role of the occurrence of the word in the target documents, and wherein the identifying step includes the step of comparing the selected words and additional selected words and their determined semantic roles to the compiled index.

3. The method of claim 1 wherein the determining step determines which of the selected words is a principal verb of the query, which of the selected words is the deep subject of the principal verb, and which of the selected words is the deep object of the principal verb.
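
The steps of claims 1 and 3 can be illustrated with a small sketch. It assumes the semantic roles have already been determined (a real system would obtain them by parsing), represents each sentence as a hypothetical role-to-word mapping, and uses a made-up `SIMILAR` table for expansion. The claim does not say how the role constraint is enforced; here it is approximated by keeping every additional word in the role of the word it was derived from. All names are illustrative.

```python
# Role labels follow claim 3: principal verb, deep subject (dsub), deep object (dobj).

# Hypothetical similar-word table; stands in for a thesaurus or lexical database.
SIMILAR = {"dog": {"canine"}, "bite": {"nip"}, "man": {"person"}}


def expand(query: dict[str, str]) -> dict[str, set[str]]:
    """Expand each query word, keeping additions in the original word's role."""
    return {role: {word} | SIMILAR.get(word, set()) for role, word in query.items()}


def matches(expanded: dict[str, set[str]], sentence: dict[str, str]) -> bool:
    """True if every role of the expanded query is filled by an accepted word."""
    return all(sentence.get(role) in words for role, words in expanded.items())


# Pre-analyzed target sentences (roles assumed to come from a parser).
documents = {
    "doc1": {"verb": "bite", "dsub": "canine", "dobj": "man"},  # "A canine bit the man."
    "doc2": {"verb": "bite", "dsub": "man", "dobj": "dog"},     # "The man bit the dog."
}

query = {"verb": "bite", "dsub": "dog", "dobj": "man"}  # "dog bites man"
expanded = expand(query)
hits = [doc_id for doc_id, sentence in documents.items() if matches(expanded, sentence)]
print(hits)
```

Running this prints `['doc1']`. The second document contains the query words but with subject and object exchanged, so it is not reported, while the first is found through the expansion of "dog" to "canine".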

4. A computer memory containing a document indexing data structure characterizing the contents of one or more target documents, the document indexing data structure mapping from words to locations in the target documents, the document indexing data structure mapping, for each of a plurality of passages of words occurring in the target documents, from words contained in a logical form generated from the passage to a location corresponding to the passage, and from hypernyms of words contained in the logical form generated from a constrained expansion of the passage to a location corresponding to the passage, wherein the constrained expansion constrains expansion of words in the passage to additional words which are similar in meaning to the words in the passage and have a similar semantic role, such that the document indexing data structure may be used to identify, in response to the receipt of a query, the location of passages of the target documents that are semantically similar to a passage of the query.

5. The computer memory of claim 4 wherein the document indexing data structure maps to a location in the target documents from at least one word not occurring in any of the target documents.
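
A data structure with the general shape described in claims 4 and 5 can be sketched as an inverted index. The layout below is an assumption made for illustration: the claims do not fix a key format, so keying on a (role, word) pair is only one possible choice, and `HYPERNYMS`, `index_passage`, and the location tuples are hypothetical.

```python
from collections import defaultdict

# Hypothetical hypernym table used for the constrained expansion.
HYPERNYMS = {"pig": ["swine"], "man": ["person"]}

# (semantic role, word) -> set of passage locations (document id, sentence number).
index: dict[tuple[str, str], set[tuple[str, int]]] = defaultdict(set)


def index_passage(logical_form: dict[str, str], location: tuple[str, int]) -> None:
    """Index a passage under its own words and under their hypernyms, per role."""
    for role, word in logical_form.items():
        for term in [word] + HYPERNYMS.get(word, []):
            index[(role, term)].add(location)


# Logical form assumed to have been produced from sentence 3 of "doc1".
index_passage({"dsub": "man", "verb": "kiss", "dobj": "pig"}, ("doc1", 3))

# "swine" never occurs in the document text, yet it maps to the passage (claim 5).
print(index[("dobj", "swine")])
# The same word in a different role maps to nothing.
print(index.get(("dsub", "swine"), set()))
```

Storing the hypernyms at indexing time is what lets a word that is absent from every target document still lead to a passage location.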

6. A computer system for responding to queries containing a query passage of words against one or more target documents, each target document comprised of one or more target document passages of words, each target document passage having a location within the target documents, the computer system comprising:
- a target document receiver for receiving the target documents;
- a query receiver for receiving queries against the target documents;
- a tokenizer for generating tokens from target document passages of target documents received by the target document receiver and of the query passage for queries received by the query receiver, the tokenizer including a logical form synthesizer for synthesizing from each passage a logical form characterizing the semantic structure of the passage, the tokenizer including an expander creating additional logical forms based on words in the logical forms synthesized from the target document passages or query, wherein the expander constrains creation of additional logical forms to include words having similar meaning to those in the passages and having a similar semantic role, the tokenizer generating tokens representing the logical forms synthesized from the passages and the additional logical forms;
- an index memory for storing a relation that maps from each token generated from a target document passage to the locations in the target documents of the target document passage from which the token was generated; and
- a query processing subsystem for, for each query, identifying in the index memory a token matching the token generated from the query and returning an indication of the location mapped to from the identified token.

7. The computer system of claim 6 wherein the logical forms synthesized by the logical form synthesizer contain words, and wherein the expander further includes:

8. A method in a computer system for processing a query directed to one or more target documents, both the query and the target documents comprising a series of words, the method comprising the steps of:
- receiving the query;
- determining the semantic roles of selected words in the query relative to one another; and
- identifying, as matching, only occurrences of the selected words, or similar words, in the target documents in which the selected words, or similar words, in the target document have the same semantic roles relative to one another as the selected words in the query.
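
The word "only" in claim 8 is what separates this approach from plain keyword matching. The sketch below contrasts the two on a hypothetical pair of sentences whose roles are assumed to be already determined. It leaves out the "similar words" part of the claim for brevity, and both functions and the sample data are illustrative.

```python
def keyword_match(query: dict[str, str], sentence: dict[str, str]) -> bool:
    """Bag-of-words baseline: the query words occur in the sentence, in any role."""
    return set(query.values()) <= set(sentence.values())


def role_match(query: dict[str, str], sentence: dict[str, str]) -> bool:
    """Role-aware matching: each query word must fill the same role in the sentence."""
    return all(sentence.get(role) == word for role, word in query.items())


# "company acquires startup" versus "startup acquires company".
query = {"dsub": "company", "verb": "acquire", "dobj": "startup"}
swapped = {"dsub": "startup", "verb": "acquire", "dobj": "company"}

print(keyword_match(query, swapped))
print(role_match(query, swapped))
```

The keyword baseline accepts the swapped sentence because it contains the same three words, whereas the role-aware check rejects it because the deep subject and deep object are exchanged.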

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Jensen, Karen; Messerly, John J.; Dolan, William B.; Richardson, Stephen D.; Heidorn, George E.
Primary Examiner(s): Thomas, Joseph
Application Number: US09/368,071
Time in Patent Office: 679 Days
Field of Search: 704/9, 704/10, 704/1, 704/8, 707/2, 707/3, 707/4, 707/5, 707/6, 707/100, 707/101, 707/102, 707/104, 707/530, 707/531, 707/532
US Class Current: 704/9
CPC Class Codes:
G06F 16/3344   using natural language anal...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99935   Query augmenting and refini...
