Method and apparatus for generating query responses in a computer-based document retrieval system
Abstract
The present invention relates to a method and apparatus for generating responses to queries to a document retrieval system. The system responds to a specific request for information by locating and ranking portions of text that may contain the information sought. It locates small relevant passages of text (called "hit passages") and ranks them according to an estimate of the degree to which they correspond to the information sought. The system minimizes the number of these hit passages that need to be examined before an information seeker has either found the desired information or can safely conclude that the information sought is not in the collection of texts. A relaxation ranking mechanism is provided to accommodate paraphrase variations that occur between the description of the information sought and the content of the text passages that may constitute suitable answers, by retrieving phrases that are dissimilar to the query phrase to different degrees according to a predefined set of rules, and penalizing the retrieved phrases based upon the degree of this dissimilarity, thus providing the user with a priority organized query hit list.
414 Citations
13 Claims
1. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor;
wherein said hit passage has a size based upon a size of said search query.
Dependent claims: 5, 6, 7, 8

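As an illustration only, and not the patented implementation, the distance comparison in steps (3) to (5) of claim 1 might be sketched in Python as follows. The function names, the exact-match rule for hit terms, the linear passage-size rule and the weight are all assumptions introduced for this example; the claim itself does not fix any of them.

```python
def passage_size_limit(query_terms, words_per_term=3):
    """Assumed rule: the passage window grows linearly with the query length."""
    return words_per_term * len(query_terms)


def distance_factor(query_terms, passage_terms, first, second):
    """Compare the gap between two terms in the passage with their gap in the query.

    Assumes each hit term matches its query term exactly and occurs once.
    """
    query_distance = abs(query_terms.index(second) - query_terms.index(first))
    hit_distance = abs(passage_terms.index(second) - passage_terms.index(first))
    # Assumption: only penalize terms that are farther apart than in the query.
    return max(0, hit_distance - query_distance)


def passage_score(query_terms, passage_terms, first, second, weight=1.0):
    """Penalty-style score (lower is better) that incorporates the factor."""
    return weight * distance_factor(query_terms, passage_terms, first, second)


query = ["black", "cat"]
passage = ["black", "and", "white", "cat"]
print(passage_size_limit(query))                      # 6
print(passage_score(query, passage, "black", "cat"))  # 2.0
```

In this sketch the query terms are adjacent (distance 1) while the hit terms are three positions apart, so the factor is 2 and the passage is penalized accordingly.
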
2. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor;
wherein said score is additionally based upon a penalty generated from a measure of a semantic similarity between at least one query term and at least one hit term.

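One way to picture the semantic-similarity penalty of claim 2 is a lookup from (query term, hit term) pairs to penalty values that is added to the distance-based factor. The sketch below is hypothetical: the `RELATED_TERMS` table, its penalty values and the additive combination are assumptions made for illustration, not details taken from the claim.

```python
# Hypothetical table: query term -> acceptable hit terms and their penalties.
RELATED_TERMS = {
    "car": {"cars": 0.1, "automobile": 0.5, "vehicle": 1.0},
}


def semantic_penalty(query_term, hit_term, related=RELATED_TERMS):
    """Return 0.0 for an exact match, a table penalty for a related term,
    or None when the hit term does not correspond to the query term."""
    if hit_term == query_term:
        return 0.0
    return related.get(query_term, {}).get(hit_term)


def score_with_semantics(distance_factor, term_pairs):
    """Add the semantic penalties of corresponding (query, hit) pairs
    to an already computed distance factor (lower is better)."""
    return distance_factor + sum(semantic_penalty(q, h) for q, h in term_pairs)


print(score_with_semantics(2.0, [("car", "automobile"), ("red", "red")]))  # 2.5
```

The function `score_with_semantics` assumes every pair passed in already corresponds, so `semantic_penalty` never returns `None` inside the sum.
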
3. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor;
wherein said score is additionally based upon a penalty generated from a comparison of the total number of query terms with the total number of hit terms.

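The term-count comparison of claim 3 could, under one simple reading, charge a fixed penalty for each query term that has no hit term in the passage. The function name and the `per_missing` weight below are assumptions chosen for this sketch.

```python
def missing_term_penalty(query_terms, hit_terms, per_missing=2.0):
    """Penalty proportional to the number of query terms without a hit.

    Assumes each hit term corresponds to a distinct query term.
    """
    missing = max(0, len(query_terms) - len(hit_terms))
    return per_missing * missing


print(missing_term_penalty(["red", "sports", "car"], ["red", "car"]))  # 2.0
```
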
4. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor;
further including the step of providing at least one hyperlink in said retrieved passage, said hyperlink linked to the document containing said passage.

9. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least a first query term and a second query term in a first order;
(2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
(3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
(4) generating a score for said hit passage incorporating the magnitude of said factor;
wherein said score is additionally based upon a penalty generated from a measure of a semantic similarity between at least one query term and at least one hit term.

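The order comparison in step (3) of claim 9 can be illustrated by counting how many pairs of query terms appear in the opposite order in the hit passage. This is a sketch under assumptions: hit terms are taken to match query terms exactly, each term is assumed to occur once, and the inversion count is only one possible choice of factor.

```python
from itertools import combinations


def order_factor(query_terms, passage_terms):
    """Count pairs of query terms whose order is reversed in the passage."""
    # Passage positions of the query terms, listed in query order.
    positions = [passage_terms.index(t) for t in query_terms if t in passage_terms]
    # A pair is out of order when the earlier query term appears later in the passage.
    return sum(1 for a, b in combinations(positions, 2) if a > b)


print(order_factor(["black", "cat"], ["the", "black", "cat"]))  # 0
print(order_factor(["cat", "black"], ["the", "black", "cat"]))  # 1
```

A factor of 0 means the passage preserves the query order; larger values indicate more reordering and would contribute a larger penalty to the passage score.
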
10. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least a first query term and a second query term in a first order;
(2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
(3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
(4) generating a score for said hit passage incorporating the magnitude of said factor;
wherein said score is additionally based upon a penalty generated from a comparison of the total number of query terms with the total number of hit terms.

11. A method for locating information in documents in a database stored in a memory coupled to a processor of a computer system, the computer system further including a proximity buffer and an output buffer coupled to said processor, the method being carried out by program steps executed by said processor and including the steps of:
(1) receiving a search query including at least one query term;
(2) determining at least one target region of at least one said document in said database;
(3) setting a penalty threshold to a predefined maximum;
(4) determining query hits corresponding to said at least one query term within said target region and correlating with each said query hit a score reflecting how closely it corresponds to its corresponding query term;
(5) storing said query hits in said proximity buffer;
(6) designating a best-scoring query hit from said proximity buffer as a current query hit;
(7) if said output buffer is full, discarding a lowest-scored query hit;
(8) inserting said current query hit into said output buffer;
(9) if the output buffer is now full, setting said penalty threshold to the score of a lowest-scored query hit in the output buffer;
(10) if a predetermined criterion is met, then proceeding to step 13 and otherwise proceeding to step 11;
(11) if there are more entailing term hits to generate, then proceeding to step 12 and otherwise proceeding to step 13;
(12) repositioning the target region relative to said document, and proceeding to step 4; and
(13) returning the contents of the output buffer.
Dependent claims: 12, 13

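The control flow of steps (1) to (13) in claim 11 can be sketched as a loop over target regions that feeds a bounded output buffer. The code below is a simplified illustration, not the patented implementation. Its assumptions: scores are penalties where lower is better (so the "lowest-scored" hit is the one with the highest penalty), the target region is a fixed-size window moved in non-overlapping steps, a hit is admitted only if it beats the current threshold, the stopping criterion is a threshold of zero, and `term_penalty` is a hypothetical closeness measure.

```python
def term_penalty(query_term, doc_term):
    """Hypothetical closeness score: 0.0 exact, 1.0 prefix match, None if unrelated."""
    if doc_term == query_term:
        return 0.0
    if doc_term.startswith(query_term):
        return 1.0
    return None


def retrieve(query_terms, doc_terms, window=4, capacity=2):
    threshold = float("inf")   # step 3: penalty threshold starts at its maximum
    output = []                # output buffer of (penalty, position) pairs
    start = 0                  # step 2: first target region starts at position 0
    while True:
        # Steps 4-5: score the hits in the target region and buffer them.
        proximity = []
        for offset, doc_term in enumerate(doc_terms[start:start + window]):
            for query_term in query_terms:
                penalty = term_penalty(query_term, doc_term)
                if penalty is not None:
                    proximity.append((penalty, start + offset))
        if proximity:
            current = min(proximity)                # step 6: best-scoring hit
            if current[0] < threshold:              # assumed admission test
                if len(output) >= capacity:
                    output.remove(max(output))      # step 7: drop the worst hit
                output.append(current)              # step 8
                if len(output) >= capacity:
                    threshold = max(output)[0]      # step 9
        if threshold == 0.0:                        # step 10: assumed criterion
            break
        if start + window >= len(doc_terms):        # step 11: nothing left to scan
            break
        start += window                             # step 12: move the region
    return sorted(output)                           # step 13


doc = "the black cat sat on the mat near blackbirds and cats".split()
print(retrieve(["black", "cat"], doc))  # [(0.0, 1), (1.0, 8)]
```

With a capacity of two, the buffer keeps the exact hit at position 1 and the prefix hit at position 8; once the buffer is full, the threshold drops to the worst retained penalty so that weaker hits in later regions are rejected.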
Specification