Systems, methods, and computer program products for fast and scalable proximal search for search queries
First Claim
1. A method of information retrieval from multiple documents, comprising:
splitting each document into multiple snippets of words;
generating a separate index for each snippet;
receiving an input search query including at least one sentence; and
processing the search query against each separate index of each snippet of the multiple snippets by searching query terms over each of the multiple snippets to implicitly introduce term proximity information in the information retrieval, wherein processing the search query further comprises:
creating an OR-Query of all non-stopwords in each sentence;
returning a fit value for each OR-Query, wherein a fit value represents a similarity metric that measures the amount of word content overlap between two text units; and
aggregating the fit values to provide a score for every document returned by the OR-Queries.
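The claimed steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the stopword list, the fixed-size non-overlapping snippet windows (`snippet_size`), the regex tokenizer, and every function name are hypothetical. The claim defines a fit value only as a measure of word content overlap; here it is assumed to be the fraction of a sentence's OR-Query terms that occur in a snippet, |Q ∩ S| / |Q|. Document scores are assumed to aggregate by summing, over query sentences, each document's best snippet fit. Each per-snippet index is reduced to a set of terms.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the claim does not specify one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "with"}

def tokenize(text):
    """Lowercase alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def split_into_snippets(doc_text, snippet_size=8):
    """Split a document into consecutive, non-overlapping word windows (assumed scheme)."""
    words = tokenize(doc_text)
    return [words[i:i + snippet_size] for i in range(0, len(words), snippet_size)]

def build_snippet_indexes(docs, snippet_size=8):
    """One small index per snippet: here simply the set of its non-stopword terms."""
    indexes = []  # list of (doc_id, term_set) pairs
    for doc_id, text in docs.items():
        for snippet in split_into_snippets(text, snippet_size):
            indexes.append((doc_id, {w for w in snippet if w not in STOPWORDS}))
    return indexes

def split_sentences(query):
    """Split the query at sentence-ending punctuation, dropping empty pieces."""
    return [s for s in re.split(r"[.!?]+", query) if s.strip()]

def or_query_terms(sentence):
    """OR-Query for one sentence: the set of all its non-stopwords."""
    return {w for w in tokenize(sentence) if w not in STOPWORDS}

def fit(query_terms, snippet_terms):
    """Assumed fit value: fraction of the OR-Query terms that also occur in the snippet."""
    if not query_terms:
        return 0.0
    return len(query_terms & snippet_terms) / len(query_terms)

def search(docs, query, snippet_size=8):
    """Score documents by summing, per query sentence, their best snippet fit."""
    indexes = build_snippet_indexes(docs, snippet_size)
    scores = defaultdict(float)
    for sentence in split_sentences(query):
        terms = or_query_terms(sentence)
        best = {}  # doc_id -> best fit of this OR-Query over the document's snippets
        for doc_id, snippet_terms in indexes:
            if terms & snippet_terms:  # OR semantics: any shared term makes a hit
                best[doc_id] = max(best.get(doc_id, 0.0), fit(terms, snippet_terms))
        for doc_id, value in best.items():
            scores[doc_id] += value  # assumed aggregation: sum of per-sentence fits
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `snippet_size=4`, a document in which "fast proximity search" falls inside one snippet (fit 1.0) outranks one in which the same three words are split across two snippets (best fit 2/3), which is the implicit proximity effect the claim describes. Scanning every snippet is linear in collection size; a practical system would more likely reach candidate snippets through an inverted index.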
Abstract
Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence-level search operations are combined as output.
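The abstract describes a single index over the words in snippets. A hedged Python sketch of that variant, again assuming fixed-size word windows and using hypothetical function names: postings record which snippet of which document a term falls in, and each document is scored by the largest number of distinct query terms found together in one snippet. No word positions are stored, so proximity enters only implicitly through the snippet boundaries.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_snippet_inverted_index(docs, snippet_size):
    """Single inverted index: term -> set of (doc_id, snippet_no) postings (assumed layout)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for start in range(0, len(words), snippet_size):
            for w in words[start:start + snippet_size]:
                index[w].add((doc_id, start // snippet_size))
    return index

def best_snippet_coverage(index, query_terms):
    """For each document, the most distinct query terms co-occurring in one snippet."""
    hits = defaultdict(set)  # (doc_id, snippet_no) -> query terms matched there
    for term in query_terms:
        for posting in index.get(term, ()):
            hits[posting].add(term)
    coverage = defaultdict(int)
    for (doc_id, _), terms in hits.items():
        coverage[doc_id] = max(coverage[doc_id], len(terms))
    return dict(coverage)
```

For `docs = {"near": "scalable proximity search engine", "far": "scalable systems need careful design before any proximity search"}` and query terms `{"scalable", "proximity", "search"}`, a snippet size of 4 scores `near` at 3 and `far` at 1, while a snippet size of 100 (effectively the whole document) scores both at 3. Shrinking the snippet is what makes an otherwise bag-of-words match sensitive to term proximity.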
11 Claims
Claims 2 through 11 are dependent claims of claim 1.
Specification