Method and apparatus for generating query responses in a computer-based document retrieval system

US 5,724,571 A
Filed: 07/07/1995
Issued: 03/03/1998
Est. Priority Date: 07/07/1995
Status: Expired due to Term

Abstract

The present invention relates to a method and apparatus for generating responses to queries to a document retrieval system. The system responds to a specific request for information by locating and ranking portions of text that may contain the information sought. It locates small relevant passages of text (called "hit passages") and ranks them according to an estimate of the degree to which they correspond to the information sought. The system minimizes the number of these hit passages that need to be examined before an information seeker has either found the desired information or can safely conclude that the information sought is not in the collection of texts. A relaxation ranking mechanism is provided to accommodate paraphrase variations that occur between the description of the information sought and the content of the text passages that may constitute suitable answers, by retrieving phrases that are dissimilar to the query phrase to different degrees according to a predefined set of rules, and penalizing the retrieved phrases based upon the degree of this dissimilarity, thus providing the user with a priority organized query hit list.

414 Citations

13 Claims

1. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least one query term;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
  
  (3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
  
  (4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
  
  (5) generating a score for said hit passage incorporating the magnitude of said factor;
  
  wherein said hit passage has a size based upon a size of said search query.
- Dependent claims (5, 6, 7, 8):
  - 5. The method of any one of claims 1-4, wherein:
    - step 2 includes the step of generating a plurality of hit passages;

      step 3 is carried out for at least two said hit terms in each of said plurality of hit passages;

      step 4 is carried out for each of said distances determined in step 3 for each set of corresponding hit terms and query terms; and

      step 5 is carried out for said plurality of hit passages.
  - 6. The method of any one of claims 1-4, further including the steps of:
    - after step 5, determining a best-scored hit passage; and

      retrieving at least said best-scored hit passage.
  - 7. The method of any one of claims 1-4, further including the steps of:
    - after step 5, determining a best-scored hit passage; and

      retrieving at least a document containing said best-scored hit passage.
  - 8. The method of any one of claims 1-4, wherein said score is generated at least in part based upon a factor proportional to said first distance.
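
Steps 3 to 5 of claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the two distances are compared or how the passage size depends on the query size, so the subtraction in `distance_factor`, the linear `scale` in `passage_size`, and all function names here are assumptions made for illustration, not the patented formula.

```python
def first_positions(tokens, terms):
    """Index of the first occurrence of each term that appears in tokens."""
    return {t: tokens.index(t) for t in terms if t in tokens}


def distance_factor(query_tokens, passage_tokens, term_a, term_b):
    """Steps 3-4: compare the hit-term distance with the query-term distance."""
    q = first_positions(query_tokens, (term_a, term_b))
    h = first_positions(passage_tokens, (term_a, term_b))
    if len(q) < 2 or len(h) < 2:
        return None  # at least one of the two terms has no hit here
    query_distance = abs(q[term_b] - q[term_a])  # "second distance" in the claim
    hit_distance = abs(h[term_b] - h[term_a])    # "first distance" in the claim
    # Assumption: only extra spread in the passage is penalized.
    return max(0, hit_distance - query_distance)


def passage_size(query_tokens, scale=3):
    """Assumption: the passage window grows linearly with the query length."""
    return scale * len(query_tokens)


def score_passage(query_tokens, passage_tokens):
    """Step 5: sum the factors over adjacent query-term pairs (lower is better)."""
    factors = [
        distance_factor(query_tokens, passage_tokens, a, b)
        for a, b in zip(query_tokens, query_tokens[1:])
    ]
    return sum(f for f in factors if f is not None)
```

Under these assumptions, a passage that keeps the query terms as close together as they are in the query scores 0, and each extra intervening word adds 1 to the score.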

2. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least one query term;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
  
  (3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;

  (4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
  
  (5) generating a score for said hit passage incorporating the magnitude of said factor;
  
  wherein said score is additionally based upon a penalty generated from a measure of a semantic similarity between at least one query term and at least one hit term.
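
The semantic-similarity penalty of claim 2 might be sketched as a lookup from a query term to the hit terms it may match, each with a penalty. The table `RELATED_TERMS`, its values, and both function names are hypothetical; the claim only requires that some measure of semantic similarity contributes a penalty to the score.

```python
# Hypothetical table: query term -> {acceptable hit term: penalty}.
# An exact match costs nothing; looser matches cost more.
RELATED_TERMS = {
    "car": {"car": 0.0, "automobile": 0.1, "vehicle": 0.3},
    "fast": {"fast": 0.0, "quick": 0.1},
}


def similarity_penalty(query_term, hit_term):
    """Penalty for matching hit_term against query_term, or None if unrelated."""
    # Terms missing from the table are assumed to match only themselves.
    candidates = RELATED_TERMS.get(query_term, {query_term: 0.0})
    return candidates.get(hit_term)


def combined_score(distance_factor, matches):
    """Add the similarity penalties of (query_term, hit_term) pairs to a factor."""
    penalties = [similarity_penalty(q, h) for q, h in matches]
    return distance_factor + sum(p for p in penalties if p is not None)
```

With this table, `similarity_penalty("car", "vehicle")` returns `0.3`, while an unrelated pair such as `("car", "boat")` returns `None` and would not count as a hit at all.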

3. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least one query term;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
  
  (3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
  
  (4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
  
  (5) generating a score for said hit passage incorporating the magnitude of said factor;
  
  wherein said score is additionally based upon a penalty generated from a comparison of the total number of query terms with the total number of hit terms.
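
One simple reading of the count comparison in claim 3 is a penalty for every query term that has no corresponding hit term in the passage. The sketch below assumes that reading; the function name and the `per_missing` weight are illustrative and do not come from the patent.

```python
def count_penalty(query_terms, hit_terms, per_missing=1.0):
    """Penalty proportional to the number of query terms left without a hit."""
    missing = max(0, len(query_terms) - len(hit_terms))
    return per_missing * missing
```

A passage that matches only one term of a three-term query would receive a penalty of `2.0` under the default weight.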

4. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least one query term;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
  
  (3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
  
  (4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
  
  (5) generating a score for said hit passage incorporating the magnitude of said factor;
  
  further including the step of providing at least one hyperlink in said retrieved passage, said hyperlink linked to the document containing said passage.

9. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least a first query term and a second query term in a first order;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
  
  (3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
  
  (4) generating a score for said hit passage incorporating the magnitude of said factor;
  
  wherein said score is additionally based upon a penalty generated from a measure of a semantic similarity between at least one query term and at least one hit term.

10. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
- (1) receiving a search query including at least a first query term and a second query term in a first order;
  
  (2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
  
  (3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
  
  (4) generating a score for said hit passage incorporating the magnitude of said factor;
  
  wherein said score is additionally based upon a penalty generated from a comparison of the total number of query terms with the total number of hit terms.
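
The order comparison in step 3 of claims 9 and 10 can be sketched by counting how many pairs of query terms appear in reversed order in the hit passage. Counting inversions is an assumption chosen for illustration; the claims only require a factor whose magnitude depends on comparing the two orders.

```python
def order_factor(query_terms, passage_tokens, per_inversion=1.0):
    """Penalty for query-term pairs whose order is reversed in the passage."""
    # Position in the passage of each query term that has a hit, in query order.
    positions = [passage_tokens.index(t) for t in query_terms if t in passage_tokens]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return per_inversion * inversions
```

For the query `["dog", "bites", "man"]`, the passage `["man", "bites", "dog"]` reverses all three pairs and yields `3.0`, while a passage that preserves the query order yields `0.0`.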

11. A method for locating information in documents in a database stored in a memory coupled to a processor of a computer system, the computer system further including a proximity buffer and an output buffer coupled to said processor, the method being carried out by program steps executed by said processor and including the steps of:
- (1) receiving a search query including at least one query term;
  
  (2) determining at least one target region of at least one said document in said database;
  
  (3) setting a penalty threshold to a predefined maximum;
  
  (4) determining query hits corresponding to said at least one query term within said target region and correlating with each said query hit a score reflecting how closely it corresponds to its corresponding query term;
  
  (5) storing said query hits in said proximity buffer;
  
  (6) designating a best-scoring query hit from said proximity buffer as a current query hit;
  
  (7) if said output buffer is full, discarding a lowest-scored query hit;
  
  (8) inserting said current query hit into said output buffer;
  
  (9) if the output buffer is now full, setting said penalty threshold to the score of a lowest-scored query hit in the output buffer;
  
  (10) if a predetermined criterion is met, then proceeding to step 13 and otherwise proceeding to step 11;
  
  (11) if there are more entailing term hits to generate, then proceeding to step 12 and otherwise proceeding to step 13;
  
  (12) repositioning the target region relative to said document, and proceeding to step 4; and
  
  (13) returning the contents of the output buffer.
- Dependent claims (12, 13):
  - 12. The method of claim 11, wherein the predetermined criterion of step 10 is whether the last lowest-scored query hit in the output buffer has zero penalty.
  - 13. The method of claim 11, wherein the predetermined criterion of step 10 is whether all documents have been searched.
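
The control flow of claim 11 can be followed in a small Python sketch that slides a fixed-size target region over one tokenized document. Several points are assumptions rather than claim language: scores are treated as penalties (lower is better, so the "lowest-scored" hit is the one with the largest penalty), the penalty threshold is used to skip hits that could not enter a full output buffer, the scorer `region_penalty` simply counts missing query terms, and the stopping test follows the claim 12 variant.

```python
MAX_PENALTY = float("inf")


def region_penalty(query_terms, region):
    """Hypothetical scorer: number of query terms missing from the region."""
    present = sum(1 for t in query_terms if t in region)
    return None if present == 0 else len(query_terms) - present


def retrieve(query_terms, doc_tokens, window_size=3, capacity=2):
    threshold = MAX_PENALTY                                # step 3
    output = []                                            # output buffer: (penalty, start)
    start = 0                                              # step 2: first target region
    last_start = max(0, len(doc_tokens) - window_size)
    while True:
        region = doc_tokens[start:start + window_size]
        penalty = region_penalty(query_terms, region)      # step 4
        proximity = []                                     # step 5: proximity buffer
        if penalty is not None and penalty < threshold:
            proximity.append((penalty, start))
        if proximity:
            current = min(proximity)                       # step 6: best hit
            if len(output) >= capacity:                    # step 7: drop the worst
                output.remove(max(output))
            output.append(current)                         # step 8
            if len(output) >= capacity:                    # step 9: tighten threshold
                threshold = max(output)[0]
        # Step 10 (claim 12 variant): a full buffer whose worst hit has zero penalty.
        if len(output) >= capacity and max(output)[0] == 0:
            break
        if start >= last_start:                            # step 11: nothing left
            break
        start += 1                                         # step 12: reposition region
    return sorted(output)                                  # step 13
```

In this simplified sketch each region yields at most one hit, so the proximity buffer never holds more than one entry; a fuller implementation would collect every hit found in the region before choosing the best one.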

Current Assignee: Oracle America, Inc. (Oracle Corporation)
Original Assignee: Sun Microsystems Incorporated (Oracle Corporation)
Inventors: Woods, William A.
Primary Examiner(s): AMSBURY, WAYNE P
Application Number: US08/499,268
Time in Patent Office: 970 Days
Field of Search: 395/605, 395/600
US Class Current: 1/1
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/30   of unstructured textual dat...

G06F 16/31   Indexing; Data structures t...

G06F 16/334   Query execution G06F16/335 ...

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

Y10S 707/917   Text

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99936   Pattern matching access
