Method and apparatus for generating query responses in a computer-based document retrieval system
Abstract
The present mechanism relates to a method and apparatus for generating responses to queries to a document retrieval system. The system responds to a specific request for information by locating and ranking portions of text that may contain the information sought. It locates small relevant passages of text (called “hit passages”) and ranks them according to an estimate of the degree to which they correspond to the information sought. The system minimizes the number of these hit passages that need to be examined before an information seeker has either found the desired information or can safely conclude that the information sought is not in the collection of texts. A relaxation ranking mechanism accommodates paraphrase variations between the description of the information sought and the content of the text passages that may constitute suitable answers. It retrieves phrases that are dissimilar to the query phrase to different degrees according to a predefined set of rules and penalizes the retrieved phrases based upon the degree of this dissimilarity, thus providing the user with a priority-organized query hit list.
10 Claims
1. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor.
2. The method of claim 1, wherein:
step 2 includes the step of generating a plurality of said hit passages;
step 3 is carried out for at least two said hit terms in each of said plurality of hit passages;
step 4 is carried out for each of said distances determined in step 3 for each set of corresponding hit terms and query terms; and
step 5 is carried out for said plurality of hit passages.
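As an illustration of steps (3) through (5), the following Python sketch compares the spacing of two hit terms in a passage with the spacing of the corresponding query terms, folds the difference into a score, and ranks several passages. The claims do not fix a formula. The absolute-difference comparison, the function names, the pre-tokenized inputs, and the convention that a lower score is better are all assumptions made for illustration.

```python
def pair_factor(query, passage, q_first, q_second, h_first, h_second):
    """Compare hit-term spacing with query-term spacing for one pair of terms."""
    # "Second distance": token positions between the two query terms.
    # list.index returns the first occurrence, a simplification of this sketch.
    query_distance = abs(query.index(q_second) - query.index(q_first))
    # "First distance": token positions between the corresponding hit terms.
    hit_distance = abs(passage.index(h_second) - passage.index(h_first))
    # Assumed comparison: absolute difference, so identical spacing gives 0.
    return abs(hit_distance - query_distance)


def score_passage(query, passage, matches):
    """matches: ordered list of (query_term, hit_term) pairs found in the passage."""
    penalty = 0
    # Apply the factor to each adjacent pair of corresponding terms.
    for (qa, ha), (qb, hb) in zip(matches, matches[1:]):
        penalty += pair_factor(query, passage, qa, qb, ha, hb)
    return penalty


def rank_passages(query, candidates):
    """candidates: list of (passage_tokens, matches); lowest penalty first."""
    return sorted(candidates, key=lambda c: score_passage(query, c[0], c[1]))
```

Under these assumptions, a passage whose hit terms are spaced exactly as in the query receives a factor of zero, and each extra intervening word adds one to the penalty.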
3. The method of claim 2, further including the steps of:
after step 5, determining a best-scored said hit passage; and
retrieving at least said best-scored hit passage.
4. The method of claim 2, further including the steps of:
after step 5, determining a best-scored said hit passage; and
retrieving at least a document containing said best-scored hit passage.
5. The method of claim 1, wherein said score is generated at least in part based upon a factor proportional to said first distance.
6. The method of claim 3, including the step of providing at least one hyperlink in said retrieved passage, said hyperlink linked to the document containing said passage.
7. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least a first query term and a second query term in a first order;
(2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
(3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
(4) generating a score for said hit passage incorporating the magnitude of said factor.
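The order comparison in steps (3) and (4) of claim 7 can be sketched in Python as follows. The claim only requires that the factor depend on comparing the two orders. The fixed reversal penalty of 1.0, the additive scoring, and the function names are assumptions chosen for illustration.

```python
def order_factor(query, passage, q_first, q_second, h_first, h_second,
                 reversal_penalty=1.0):
    """Return 0.0 when the hit terms keep the query order, else a penalty."""
    # "First order": does the first query term precede the second in the query?
    query_in_order = query.index(q_first) < query.index(q_second)
    # "Second order": does the first hit term precede the second in the passage?
    passage_in_order = passage.index(h_first) < passage.index(h_second)
    # Assumed comparison: a flat penalty whenever the two orders disagree.
    return 0.0 if query_in_order == passage_in_order else reversal_penalty


def score_with_order(base_penalty, factor):
    """Assumed scoring rule: add the order factor to any other penalties."""
    return base_penalty + factor
```

With the query "dog bites man", a passage reading "man bites dog" contains both terms but in reversed order, so it would be penalized relative to a passage that preserves the query order.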
8. A method for locating information in documents in a database stored in a memory coupled to a processor of a computer system, the computer system further including a proximity buffer and an output buffer coupled to said processor, the method comprising:
(1) retrieving a search query including at least one query term;
(2) determining at least one target region of at least one said document in said database;
(3) setting a penalty threshold to a predetermined maximum;
(4) determining query hits corresponding to said query terms within said target region and correlating with each said query hit a score reflecting how closely it corresponds to its corresponding query term;
(5) storing said query hits in said proximity buffer;
(6) designating a best-scoring query hit from said proximity buffer as a current query hit;
(7) if said output buffer is full, discarding a lowest-scored query hit;
(8) inserting said current query hit into said output buffer;
(9) if said output buffer is now full, setting said penalty threshold to the score of a lowest-scored query hit in the output buffer; and
(10) returning the contents of the output buffer if a predetermined criterion is met.
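The buffer handling in steps (2) through (10) of claim 8 might look roughly like the Python sketch below. Several points are assumptions rather than anything the claim states: scores are treated as penalties (lower is better), so the "lowest-scored" hit is read as the one with the highest penalty; the penalty threshold is used to skip hits that could not displace anything in a full output buffer; the return criterion is simply that all target regions have been processed; and `score_hits` is a hypothetical callback standing in for the matching logic of step (4).

```python
import math


def collect_hits(regions, score_hits, capacity, max_penalty=math.inf):
    """Keep the `capacity` best hits across regions.

    score_hits(region) must return a list of (penalty, hit) tuples.
    """
    threshold = max_penalty              # step 3: start at the maximum
    output = []                          # output buffer
    for region in regions:               # step 2: each target region in turn
        # Steps 4-5: score hits and store them in the proximity buffer.
        # Assumed use of the threshold: drop hits that cannot beat it.
        proximity = [(p, h) for p, h in score_hits(region) if p < threshold]
        if not proximity:
            continue
        # Step 6: the best-scoring hit is the one with the lowest penalty.
        current = min(proximity, key=lambda ph: ph[0])
        # Step 7: make room by discarding the worst hit if the buffer is full.
        if len(output) >= capacity:
            output.remove(max(output, key=lambda ph: ph[0]))
        output.append(current)           # step 8
        # Step 9: once full, tighten the threshold to the worst retained penalty.
        if len(output) >= capacity:
            threshold = max(p for p, _ in output)
    # Step 10: assumed criterion is that every region has been examined.
    return sorted(output, key=lambda ph: ph[0])
```

Tightening the threshold as the output buffer fills means later regions only contribute hits that improve on the current worst entry, which is one plausible reason for tracking the threshold at all.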
Specification