Real-time document collection search engine with phrase indexing
Abstract
A collection search system is responsive to a user query against a collection of documents to provide a search report. The collection search system includes a collection index including first predetermined single words and multiple word phrases as indexed terms occurring in the collection of documents, a linguistic parser that identifies a list of search terms from a user query, the linguistic parser identifying the list from second predetermined single words and multiple word phrases, and a search engine coupled to receive the list from the linguistic parser. The search engine operates to intersect the list with the collection index to identify a predetermined document from the collection of documents. The search engine includes an accumulator for summing a relevancy score for the predetermined document that is related to the intersection of the predetermined document with the list.
11 Claims
1. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a collection index including first predetermined single words and first predetermined multiple word phrases as respectively indexed terms occurring in said collection of documents, said first predetermined multiple word phrases including occurrences of said first predetermined single words;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and second predetermined multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing, for said predetermined document, a relevancy score related to the intersection of the indexed terms of said predetermined document with the search terms of said list, whereby said first predetermined single words of said predetermined document may contribute multiply to said accumulated relevancy score for said predetermined document.
Dependent claims: 2, 3.
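The arrangement recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`terms`, `build_index`, `search`), the choice of every word plus every adjacent two-word phrase as the "predetermined" terms, and the score of one point per matching term are all assumptions made for illustration. It shows how a word that also appears inside a matching phrase can add to a document's accumulated score more than once.

```python
from collections import defaultdict

def terms(text, max_phrase_len=2):
    """Return single words plus adjacent multi-word phrases (assumed term selection)."""
    words = text.lower().split()
    out = list(words)
    for n in range(2, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out

def build_index(docs):
    """Map each indexed term (word or phrase) to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Intersect query terms with the index and accumulate a score per document."""
    accumulator = defaultdict(int)
    for term in set(terms(query)):          # the "parser" output: words and phrases
        for doc_id in index.get(term, ()):
            accumulator[doc_id] += 1        # assumed weight: one point per matching term
    return dict(accumulator)

docs = {
    "d1": "phrase indexing engine",
    "d2": "indexing of each phrase",
}
index = build_index(docs)
scores = search(index, "phrase indexing")
print(scores)
```

In this example both documents contain the words "phrase" and "indexing", but only `d1` contains them as the adjacent phrase "phrase indexing", so those words contribute to `d1` a second time through the phrase term.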
4. A search system, operative against a plurality of collections of documents as stored by a plurality of collection servers, wherein said search system provides for the generation of a search report in response to a search query, said search system comprising:
a) a plurality of indexes storing predetermined informational terms in correspondence with the documents of said plurality of collections, said predetermined informational terms being selected subject to the exclusion of any term included in a list of informational terms;
b) a query parser that derives a set of query terms from said search query, said set of query terms being selected subject to the exclusion of any term included in said list of informational terms; and
c) a search engine that calculates the intersection between said predetermined informational terms for each of said plurality of indexes and said set of query terms to provide a document respective normalized score representation for each of the documents of each of said plurality of collections, said search engine evaluating said document respective normalized score representations to produce a ranked search report in response to said search query.
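As a rough illustration of claim 4, the sketch below applies one exclusion list to both the indexed documents and the query, scores each document in each collection, and merges the results into a single ranked report. The claim does not say how scores are normalized; dividing the overlap by the number of query terms is an assumption here, as are the names `STOP_LIST`, `informational_terms`, `build_indexes`, and `ranked_report`.

```python
STOP_LIST = frozenset({"the", "of", "a", "and"})  # hypothetical exclusion list

def informational_terms(text):
    """Keep only the terms that are not on the exclusion list."""
    return {w for w in text.lower().split() if w not in STOP_LIST}

def build_indexes(collections):
    """Build one index per collection: doc_id -> set of informational terms."""
    return {
        name: {doc_id: informational_terms(text) for doc_id, text in docs.items()}
        for name, docs in collections.items()
    }

def ranked_report(indexes, query):
    """Score every document in every collection and return one merged ranking."""
    query_terms = informational_terms(query)
    report = []
    for name, index in indexes.items():
        for doc_id, doc_terms in index.items():
            overlap = len(query_terms & doc_terms)
            if overlap:
                # Assumed normalization: fraction of query terms found in the document.
                report.append((overlap / len(query_terms), name, doc_id))
    report.sort(key=lambda row: (-row[0], row[1], row[2]))
    return report

collections = {
    "news": {"n1": "the history of search engines"},
    "papers": {"p1": "ranking and search", "p2": "a study of parsing"},
}
indexes = build_indexes(collections)
report = ranked_report(indexes, "the search engines")
print(report)
```

Because every score lies between 0 and 1 regardless of which collection a document came from, results from separately indexed collections can be compared directly in one report.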
5. A search system providing for the evaluation of a search query against an indexed collection of documents, said search system comprising:
a) a first index part indexing word document terms;
b) a second index part indexing phrase document terms; and
c) a query processor to determine the intersection of a query, represented as a combination of word query terms and phrase query terms, with said first and second index parts, said query processor providing a ranking score for a predetermined document that corresponds to the number of query term intersections with said word and document terms.
Dependent claims: 6.
7. A collection search system that is responsive to a query text provided in relation to a collection of documents and that provides a responsive search report, said collection search system comprising:
a) a collection index that includes a first list of word-terms with corresponding data storing location information sufficient to identify an occurrence location within a document of said collection, said first list of word-terms including both single words and multiple word phrases, wherein the multiple word phrases of said first list are word sequences that occur in said collection subject to the exclusion of a predetermined set of words, and wherein the words of the multiple word phrases of said first list are also included in said first list as single word word-terms;
b) a parser coupled to receive said query text and responsively provide a second list of word-terms from said query text, wherein said second list includes both single words and multiple word phrases, wherein the multiple word phrases of said second list are word sequences that occur in said query text subject to the exclusion of said predetermined set of words; and
c) a search engine coupled to receive said second list from said parser, said search engine intersecting said second list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing, for said predetermined document, a relevancy score determined by the intersection of the word-terms of said first list corresponding to said predetermined document with the word-terms of said second list, whereby said relevancy score is weighted additionally by the occurrence of single words in both single word word-terms and multiple word phrase word-terms.
Dependent claims: 8, 9, 10, 11.
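Claim 7 adds occurrence locations and an excluded word set to the index. The sketch below is one possible reading, not the patented method: it assumes that excluded words are dropped and also break phrases, that phrases are limited to two adjacent kept words, that a location is a word offset within the document, and that each stored occurrence adds one point to the score. The names `EXCLUDED`, `word_terms`, `build_index`, and `score` are hypothetical.

```python
from collections import defaultdict

EXCLUDED = frozenset({"the", "of", "a", "in"})  # hypothetical predetermined word set

def word_terms(text):
    """Return (term, position) pairs for kept words and for two-word phrases
    made of adjacent kept words (assumed reading of the exclusion rule)."""
    words = text.lower().split()
    out = []
    for pos, word in enumerate(words):
        if word in EXCLUDED:
            continue
        out.append((word, pos))
        if pos + 1 < len(words) and words[pos + 1] not in EXCLUDED:
            out.append((word + " " + words[pos + 1], pos))
    return out

def build_index(docs):
    """Build term -> doc_id -> list of word offsets where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for term, pos in word_terms(text):
            index[term][doc_id].append(pos)
    return index

def score(index, query):
    """Accumulate, per document, one point for each occurrence of each query term."""
    accumulator = defaultdict(int)
    for term in {t for t, _ in word_terms(query)}:
        for doc_id, positions in index.get(term, {}).items():
            accumulator[doc_id] += len(positions)
    return dict(accumulator)

docs = {
    "d1": "the search engine in the search report",
    "d2": "engine of search",
}
index = build_index(docs)
scores = score(index, "search engine")
print(index["search"]["d1"], scores)
```

Here `d1` is credited for "search" and "engine" as single words and again for the phrase "search engine", while `d2` contains both words but no phrase because the excluded word "of" separates them.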
Specification