Systems, methods and computer program products for fast and scalable proximal search for search queries
Abstract
Embodiments of the invention provide systems, methods, and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes the words in snippets of the documents. An input search query is processed against the index by searching query terms over the snippets to implicitly introduce term proximity information into the information retrieval. Results of multiple sentence-level search operations are combined as output.
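The snippet index described in the abstract can be illustrated with a short Python sketch. It assumes fixed, non-overlapping windows of five whitespace-separated words and a plain dictionary as the inverted index. The window size and the names `split_into_snippets` and `build_snippet_index` are hypothetical and not taken from the patent. Because each snippet is indexed as its own unit, two query terms that hit the same snippet id are known to lie within a few words of each other, so proximity enters the ranking without explicit position data.

```python
from collections import defaultdict

# Hypothetical snippet size; the abstract does not fix one.
SNIPPET_WORDS = 5


def split_into_snippets(text, size=SNIPPET_WORDS):
    """Split a document into consecutive, non-overlapping snippets of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_snippet_index(docs, size=SNIPPET_WORDS):
    """Index every snippet as if it were a separate document.

    Returns (postings, owner): `postings` maps a term to the ids of the snippets
    containing it, and `owner` maps each snippet id back to its parent document.
    """
    postings = defaultdict(set)
    owner = {}
    for doc_id, text in docs.items():
        for snippet in split_into_snippets(text, size):
            sid = len(owner)
            owner[sid] = doc_id
            for term in snippet:
                postings[term].add(sid)
    return postings, owner


docs = {
    "a": "quick brown fox jumps over the lazy dog sleeps now",
    "b": "quick thinking helps a lot and brown fox fur sheds",
}
postings, owner = build_snippet_index(docs)
# Both documents contain "quick" and "fox", but only in "a" do they share a snippet.
hits = postings["quick"] & postings["fox"]
print(sorted({owner[sid] for sid in hits}))  # ['a']
```

Overlapping windows would avoid missing term pairs that straddle a snippet boundary, at the cost of a larger index; the abstract does not say which choice is made.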
10 Claims
1. A computer program product for information retrieval from multiple documents, the computer program product comprising a tangible storage medium readable by a computer system and storing instructions for execution by the computer system for performing a method comprising:
splitting each document into multiple snippets of words;
indexing each snippet as a separate document;
receiving an input search query including at least one sentence;
processing the search query against the indexes of each of the multiple snippets by searching query terms over each of the multiple snippets to implicitly introduce term proximity information in the information retrieval;
decomposing the search query into sub-queries;
processing each sub-query against the indexes of each of the multiple snippets, sentence by sentence, using all words in each sentence of the sub-query to create an OR-Query of all non-stopwords in the sentence;
returning a fit value for each OR-Query; and
aggregating the fit values to provide a score for every document returned by the OR-Queries.
Dependent claims: 2, 3, 4, 5, 6, 7.
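The following Python sketch walks through the steps of claim 1 on a toy corpus. Several details are assumptions, not taken from the claim: the snippet size, the tiny stopword list, the regular-expression sentence splitter, the fit value (fraction of a sentence's non-stopwords found in a snippet), and the aggregation rule (sum, over sentences, of each document's best snippet fit). For brevity the snippets are scanned linearly rather than looked up through an inverted index, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}
SNIPPET_WORDS = 5  # assumed snippet size


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def snippets(text, size=SNIPPET_WORDS):
    """Split a document into non-overlapping word snippets, each kept as a word set."""
    words = tokenize(text)
    return [set(words[i:i + size]) for i in range(0, len(words), size)]


def split_sentences(query):
    # Naive splitter on ., ! and ?; each sentence becomes one sub-query.
    return [s for s in re.split(r"[.!?]+", query) if s.strip()]


def or_query(sentence, snippet_index):
    """Run an OR-query of the sentence's non-stopwords over every snippet.

    Yields (doc_id, fit) for each snippet matching at least one term, where fit is
    the fraction of the sentence's non-stopwords the snippet contains (assumed metric).
    """
    terms = {w for w in tokenize(sentence) if w not in STOPWORDS}
    if not terms:
        return
    for doc_id, snippet in snippet_index:
        overlap = len(terms & snippet)
        if overlap:
            yield doc_id, overlap / len(terms)


def search(docs, query):
    # Each snippet is treated as its own unit, tagged with its parent document id.
    snippet_index = [(doc_id, s) for doc_id, text in docs.items() for s in snippets(text)]
    scores = defaultdict(float)
    for sentence in split_sentences(query):
        best = {}
        for doc_id, fit in or_query(sentence, snippet_index):
            best[doc_id] = max(best.get(doc_id, 0.0), fit)
        # Assumed aggregation: add each document's best snippet fit per sentence.
        for doc_id, fit in best.items():
            scores[doc_id] += fit
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "quick brown fox jumps over the lazy dog sleeps now",
    "b": "quick thinking helps a lot and brown fox fur sheds",
}
print(search(docs, "Quick fox. Lazy dog sleeps."))  # [('a', 2.0), ('b', 0.5)]
```

Document `b` contains both "quick" and "fox" yet scores lower, because the two words never fall in the same snippet; this is the implicit proximity effect the claim describes.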
8. An information retrieval system, comprising:
an indexer configured for splitting each document into multiple snippets of words and indexing each snippet as a separate document; and
a searching module configured for receiving an input search query and searching the query against the indexes of each of the multiple snippets by searching query terms over each of the multiple snippets to implicitly introduce term proximity information in the information retrieval;
wherein the searching module is further configured to: search the query against the indexes of each of the multiple snippets, sentence by sentence, using all words in each sentence to create an OR-Query of all non-stopwords in the sentence, and return a fit value for each OR-Query, wherein a fit value represents a similarity metric that measures the amount of word content overlap between two text units; and
the system further comprises an aggregator configured to aggregate the fit values to provide a score for every document returned by the OR-Queries, and a query module configured to decompose the search query into sub-queries.
Dependent claims: 9, 10.
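Claim 8 recasts the same method as cooperating components. The Python sketch below maps each claimed component (indexer, query module, searching module, aggregator) to a class. Because claim 8 defines the fit value as word-content overlap between two text units, the sketch uses the Dice coefficient over word sets as one plausible symmetric measure. That choice, the class names, the snippet size, the stopword list, and the aggregation rule are all assumptions for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}  # hypothetical list


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_value(unit_a, unit_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) over word sets: one possible overlap metric."""
    a, b = set(unit_a), set(unit_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


@dataclass
class Indexer:
    """Splits documents into fixed-size word snippets and indexes each snippet separately."""
    snippet_words: int = 5
    postings: dict = field(default_factory=lambda: defaultdict(set))
    snippets: list = field(default_factory=list)  # (doc_id, word set) per snippet id

    def add(self, doc_id, text):
        words = tokenize(text)
        for i in range(0, len(words), self.snippet_words):
            sid = len(self.snippets)
            chunk = set(words[i:i + self.snippet_words])
            self.snippets.append((doc_id, chunk))
            for w in chunk:
                self.postings[w].add(sid)


class QueryModule:
    """Decomposes a query into sentence-level sub-queries."""

    def decompose(self, query):
        return [s for s in re.split(r"[.!?]+", query) if s.strip()]


class SearchingModule:
    """Builds an OR-query per sentence and returns a fit value per matching snippet."""

    def __init__(self, indexer):
        self.indexer = indexer

    def search_sentence(self, sentence):
        terms = {w for w in tokenize(sentence) if w not in STOPWORDS}
        # OR semantics: union of the postings of all non-stopword terms.
        hits = set().union(*(self.indexer.postings.get(t, set()) for t in terms))
        return [(self.indexer.snippets[sid][0], fit_value(terms, self.indexer.snippets[sid][1]))
                for sid in sorted(hits)]


class Aggregator:
    """Assumed rule: sum, over sentences, each document's best snippet fit."""

    def aggregate(self, per_sentence_results):
        scores = defaultdict(float)
        for results in per_sentence_results:
            best = {}
            for doc_id, fit in results:
                best[doc_id] = max(best.get(doc_id, 0.0), fit)
            for doc_id, fit in best.items():
                scores[doc_id] += fit
        return dict(scores)


indexer = Indexer()
indexer.add("a", "quick brown fox jumps over the lazy dog sleeps now")
indexer.add("b", "quick thinking helps a lot and brown fox fur sheds")
searcher, queries, aggregator = SearchingModule(indexer), QueryModule(), Aggregator()
scores = aggregator.aggregate(
    searcher.search_sentence(s) for s in queries.decompose("Quick fox. Lazy dog sleeps.")
)
print({k: round(v, 3) for k, v in scores.items()})  # {'a': 1.321, 'b': 0.286}
```

Keeping the fit function separate from the searching module makes it easy to swap in another overlap measure without touching the indexer or aggregator.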
Specification