Method and apparatus for generating query responses in a computer-based document retrieval system
Abstract
The present invention relates to a method and apparatus for generating responses to queries to a document retrieval system. The system responds to a specific request for information by locating and ranking portions of text that may contain the information sought. It locates small relevant passages of text (called “hit passages”) and ranks them according to an estimate of the degree to which they correspond to the information sought. The system minimizes the number of these hit passages that need to be examined before an information seeker has either found the desired information or can safely conclude that the information sought is not in the collection of texts. A relaxation ranking mechanism is provided to accommodate paraphrase variations that occur between the description of the information sought and the content of the text passages that may constitute suitable answers, by retrieving phrases that are dissimilar to the query phrase to different degrees according to a predefined set of rules, and penalizing the retrieved phrases based upon the degree of this dissimilarity, thus providing the user with a priority organized query hit list.
179 Citations
21 Claims
1. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least one query term;
(2) generating at least one hit passage from said documents, said hit passage including at least one hit term corresponding to said at least one query term;
(3) for at least a first hit term and a second hit term corresponding, respectively, to at least a first query term and a second query term, determining a first distance between said first and second hit terms and a second distance between said first and second query terms;
(4) generating a factor having a magnitude based upon a comparison of said first distance with said second distance; and
(5) generating a score for said hit passage incorporating the magnitude of said factor.
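Steps (3) through (5) of claim 1 can be illustrated with a minimal sketch. The claim does not specify how the two distances are compared or how the factor enters the score, so everything concrete here is an assumption: positions are word offsets, the factor is a penalty proportional to the absolute difference between the two distances, and a lower score means a better passage. The names `term_distance`, `distance_factor`, `passage_score`, and `penalty_per_word` are hypothetical.

```python
def term_distance(pos_a, pos_b):
    """Distance in words between two term positions (assumed word offsets)."""
    return abs(pos_b - pos_a)


def distance_factor(query_pos, hit_pos, penalty_per_word=1):
    """Step (4): a factor whose magnitude comes from comparing two distances.

    query_pos and hit_pos are (first_term_offset, second_term_offset) pairs.
    Assumption: the comparison is an absolute difference, scaled by a
    per-word penalty. The claim only requires "a comparison".
    """
    query_dist = term_distance(*query_pos)  # second distance in the claim
    hit_dist = term_distance(*hit_pos)      # first distance in the claim
    return penalty_per_word * abs(hit_dist - query_dist)


def passage_score(base_penalty, query_pos, hit_pos):
    """Step (5): fold the factor into the passage score (lower is better here)."""
    return base_penalty + distance_factor(query_pos, hit_pos)
```

Under these assumptions, a query whose two terms are adjacent, matched by a passage with one intervening word between the hit terms, gives `distance_factor((0, 1), (4, 6))` a value of 1, while a passage that preserves the query spacing incurs no penalty.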
10. A method for locating information in documents in a database stored in a memory coupled to a processor, the method being carried out by program steps executed by said processor, including the steps of:
(1) receiving a search query including at least a first query term and a second query term in a first order;
(2) generating at least one hit passage from said documents, said hit passage including at least a first hit term corresponding to said first query term and a second hit term corresponding to said second query term, said first and second hit terms being in a second order;
(3) generating a factor having a magnitude based upon a comparison of said first order with said second order; and
(4) generating a score for said hit passage incorporating the magnitude of said factor.
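The order comparison in claim 10 can be sketched in a similar way. The claim does not say how the two orders are compared; this example assumes each hit term has already been mapped back to the query term it corresponds to, and it counts the pairs of terms whose relative order in the passage differs from their order in the query. The names `order_factor` and `swap_penalty` are hypothetical, and the factor is again treated as a penalty.

```python
def order_factor(query_terms, hit_terms, swap_penalty=2):
    """Penalty for hit terms appearing in a different order than the query terms.

    query_terms: query terms in query order (the "first order").
    hit_terms:   the corresponding query term for each hit, in passage order
                 (the "second order"). The mapping from hit term to query
                 term is assumed to be done already.
    """
    rank = {term: i for i, term in enumerate(query_terms)}
    ranks = [rank[term] for term in hit_terms if term in rank]
    # Count out-of-order pairs (inversions) between the two orderings.
    inversions = sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )
    return swap_penalty * inversions
```

With the default `swap_penalty`, a passage that matches "car wash" as "wash ... car" receives a factor of 2, and a passage that preserves the query order receives 0.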
15. A method for locating information in documents in a database stored in a memory coupled to a processor of a computer system, the computer system further including a proximity buffer and an output buffer coupled to said processor, the method being carried out by program steps executed by said processor and including the steps of:
(1) receiving a search query including at least one query term;
(2) determining at least one target region of at least one said document in said database;
(3) setting a penalty threshold to a predefined maximum;
(4) determining query hits corresponding to said query terms within said target region and correlating with each said query hit a score reflecting how closely it corresponds to its corresponding query term;
(5) storing said query hits in said proximity buffer;
(6) designating a best-scoring query hit from said proximity buffer as a current query hit;
(7) if said output buffer is full, discarding a lowest-scored query hit;
(8) inserting said current query hit into said output buffer;
(9) if the output buffer is now full, setting said penalty threshold to the score of a lowest-scored query hit in the output buffer;
(10) if a predetermined criterion is met, then proceeding to step 13 and otherwise proceeding to step 11;
(11) if there are more entailing term hits to generate, then proceeding to step 12 and otherwise proceeding to step 13;
(12) repositioning the target region relative to said document, and proceeding to step 4; and
(13) returning the contents of the output buffer.
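The control flow of claim 15 can be approximated by the loop below. It is a sketch under several assumptions, not the patented implementation: scores are penalties (lower is better), so the "lowest-scored" hit is the one with the largest penalty; the penalty threshold is used to drop hits that could not enter a full output buffer, which the claim does not state explicitly; and the "predetermined criterion" of step 10 is taken to be a full buffer whose worst entry has a perfect penalty of 0. The names `rank_hits`, `score_region`, and `capacity` are hypothetical.

```python
import math


def rank_hits(regions, score_region, capacity=3):
    """Collect the best hits across target regions in a bounded output buffer.

    regions:      iterable of target regions, visited in order (step 12).
    score_region: callable returning a list of (penalty, hit) pairs (step 4).
    """
    threshold = math.inf                      # step 3: start at the maximum
    output = []                               # output buffer, best hit first
    for region in regions:                    # steps 2 and 12
        # Steps 4-5: fill the proximity buffer; the threshold filter is assumed.
        proximity = [h for h in score_region(region) if h[0] < threshold]
        if not proximity:
            continue                          # nothing usable in this region
        current = min(proximity, key=lambda h: h[0])   # step 6
        if len(output) >= capacity:           # step 7: make room
            output.pop()                      # worst hit is kept last
        output.append(current)                # step 8
        output.sort(key=lambda h: h[0])
        if len(output) >= capacity:           # step 9: tighten the threshold
            threshold = output[-1][0]
            if threshold == 0:                # step 10: assumed stop criterion
                break
    return output                             # step 13
```

For example, with `capacity=2` and four regions whose hits carry penalties 5, then 2 and 7, then 9, then 1, the threshold drops to 5 after the second region, the hit with penalty 9 is skipped, and the buffer ends up holding the hits with penalties 1 and 2.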
18. A computer system for locating information in documents in a database stored in a memory coupled to a processor of said computer system, including:
a query module configured to receive a search query including a plurality of query terms;
a retrieval module configured to retrieve passages from said documents, each said passage including at least one hit term corresponding to at least one said query term;
a scoring module configured to generate scores for said passages based upon an order of occurrence of said query terms compared with an order of occurrence of hit terms appearing in said passages and corresponding to said query terms.
19. A search system for retrieving and ranking passages of documents in a database, including:
a retrieval module configured to retrieve passages from said documents in response to a search query including at least one query term, each said passage including at least one hit term corresponding to at least one said query term; and
a scoring module configured to generate scores for said passages based upon an order of occurrence of said query terms compared with an order of occurrence of hit terms appearing in said passages and corresponding to said query terms.
20. A computer system for locating information in documents in a database stored in a memory coupled to a processor of said computer system, including:
a query module configured to receive a search query including a plurality of query terms;
a retrieval module configured to retrieve at least one passage from said documents, said passage including at least two said hit terms corresponding to at least two said query terms;
a scoring module configured to generate scores for said passages based upon a factor having a magnitude incorporating a distance between said at least two said hit terms.
21. A search system for retrieving and ranking passages of documents in a database, including:
a retrieval module configured to retrieve at least a first said passage from said documents in response to a search query including a plurality of query terms, said passage including at least two said hit terms corresponding to at least two said query terms; and
a scoring module configured to generate scores for said passages based upon a factor having a magnitude incorporating a distance between said at least two said hit terms.
Specification