Real-time document collection search engine with phrase indexing
First Claim
1. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list using conditional indexing of said word phrases.
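The linguistic parser in this claim turns a free-text query into single-word and multi-word search terms. The sketch below shows one way that could work. It assumes greedy longest-match lookup against a hypothetical phrase vocabulary (`PHRASES`), which stands in for the "second predetermined" phrases. The claim does not say how phrases are matched, so the matching rule and the name `parse_query` are illustrative assumptions.

```python
import re

# Hypothetical vocabulary of multi-word phrases the parser may recognize
# (standing in for the claim's "second predetermined" phrases).
PHRASES = {("real", "time"), ("search", "engine"), ("new", "york", "city")}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)


def parse_query(query: str) -> list[str]:
    """Split a query into search terms, preferring the longest known phrase."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    terms = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in PHRASES:
                terms.append(" ".join(candidate))
                i += n
                break
        else:
            # No phrase starts at this position, so emit a single word.
            terms.append(words[i])
            i += 1
    return terms


print(parse_query("Real-time search engine for New York City"))
# ['real time', 'search engine', 'for', 'new york city']
```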
Abstract
A collection search system is responsive to a user query against a collection of documents to provide a search report. The collection search system includes a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in the collection of documents, a linguistic parser that identifies a list of search terms from a user query, the linguistic parser identifying the list from second predetermined single words and multiple word phrases, and a search engine coupled to receive the list from the linguistic parser. The search engine operates to intersect the list with the collection index to identify a predetermined document from the collection of documents. The search engine includes an accumulator for summing a relevancy score for the predetermined document that is related to the intersection of the predetermined document with the list.
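The intersect-and-accumulate flow described in the abstract can be sketched as follows. This sketch assumes an in-memory inverted index keyed by single words and phrases. The toy `INDEX`, the idf-style per-term weight, and the `search` function are all hypothetical. The abstract states only that the score is related to the intersection of the document with the term list.

```python
import math
from collections import defaultdict

# Hypothetical collection index: indexed term (single word or phrase) ->
# {document id: number of occurrences in that document}.
INDEX = {
    "search engine": {"d1": 2, "d3": 1},
    "phrase": {"d1": 1, "d2": 4},
    "index": {"d2": 1, "d3": 3},
}
NUM_DOCS = 3


def search(terms: list[str]) -> list[tuple[str, float]]:
    """Intersect query terms with the index and rank documents by score."""
    accumulator = defaultdict(float)  # document id -> running relevancy score
    for term in terms:
        postings = INDEX.get(term)
        if not postings:
            continue  # term does not occur anywhere in the collection
        # Assumed weighting: terms found in fewer documents count more.
        idf = math.log(1 + NUM_DOCS / len(postings))
        for doc_id, count in postings.items():
            accumulator[doc_id] += count * idf
    # Highest score first; ties are broken by document id for a stable report.
    return sorted(accumulator.items(), key=lambda item: (-item[1], item[0]))


print(search(["search engine", "index"]))
```

The accumulator only holds entries for documents that share at least one term with the query, so documents outside the intersection never appear in the report.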
443 Citations
12 Claims
1. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list using conditional indexing of said word phrases.

5. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents, said collection index storing one-word and two-word indexes of the frequency of occurrence of search phrases;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.

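Claim 5 recites one-word and two-word indexes that record how often search phrases occur. The sketch below builds both from raw text. It treats every adjacent word pair as a candidate two-word phrase. The claim speaks of *predetermined* phrases, so a real system would likely filter these pairs against a phrase list. `build_indexes` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict


def build_indexes(docs: dict[str, str]):
    """Build one-word and two-word frequency indexes for a collection.

    Each index maps a term to a Counter of {document id: occurrence count}.
    """
    one_word = defaultdict(Counter)
    two_word = defaultdict(Counter)
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for w in words:
            one_word[w][doc_id] += 1
        # Assumption: every adjacent word pair is a candidate two-word phrase.
        for first, second in zip(words, words[1:]):
            two_word[f"{first} {second}"][doc_id] += 1
    return one_word, two_word


one, two = build_indexes({
    "d1": "search engine with phrase indexing",
    "d2": "a search engine indexes a search engine",
})
print(one["search"]["d2"])         # 2
print(two["search engine"]["d1"])  # 1
```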
7. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a collection index including first predetermined single word and multiple word phrases as indexed terms occurring in said collection of documents, wherein said collection index is distributed in different subgroups which share common group statistics;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.

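Claim 7 distributes the index across subgroups that share common group statistics. The sketch below assumes those statistics are collection-wide document frequencies plus a total document count. The statistics are merged once and then passed to every shard. The shard layout, the weighting, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical shards: each maps term -> {document id: occurrence count}
# for the subgroup of documents that shard holds.
SHARDS = [
    {"search engine": {"d1": 2}, "index": {"d2": 1}},
    {"search engine": {"d3": 1}, "index": {"d4": 5}, "phrase": {"d3": 2}},
]
SHARD_SIZES = [2, 2]  # number of documents held by each shard


def group_statistics(shards, sizes):
    """Merge per-shard document frequencies into one shared table."""
    doc_freq = Counter()
    for shard in shards:
        for term, postings in shard.items():
            doc_freq[term] += len(postings)
    return doc_freq, sum(sizes)


def score_shard(shard, terms, doc_freq, total_docs):
    """Score one shard's documents using the shared, collection-wide statistics."""
    scores = Counter()
    for term in terms:
        if doc_freq[term] == 0:
            continue  # term absent from every shard
        idf = math.log(1 + total_docs / doc_freq[term])
        for doc_id, count in shard.get(term, {}).items():
            scores[doc_id] += count * idf
    return scores


doc_freq, total = group_statistics(SHARDS, SHARD_SIZES)
merged = Counter()
for shard in SHARDS:
    # Counter.update adds scores; document ids do not overlap across shards.
    merged.update(score_shard(shard, ["search engine", "index"], doc_freq, total))
print(merged.most_common(1))  # d4 ranks first
```

Because every shard uses the same statistics, scores from different shards stay comparable. If each shard instead computed `idf` from its own counts, the same term could receive a different weight depending on where a document happened to be stored.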
9. A collection search system responsive to a user query against a collection of documents to provide a search report, said collection search system comprising:
a) a distributed collection index including first predetermined one-word and two-word phrases as indexed terms occurring in said collection of documents;
b) a linguistic parser that identifies a list of search terms from a user query, said linguistic parser identifying said list from second predetermined single words and multiple word phrases; and
c) a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify a predetermined document from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score related to the intersection of said predetermined document with said list.

10. A collection search system responsive to a user query, $Q$, for searching against a collection of documents,
where each document is described as a string of words, $S$, where for an $M$-word document, $S = S_1, S_2, S_3, \ldots, S_m, S_{m+1}, S_{m+2}, S_{m+3}, \ldots, S_M$, where $m = 1, 2, \ldots, M$; where, for a number $N_g$ of documents in a group $g$ of documents in the collection, the documents are designated $S_{1_g}, S_{2_g}, \ldots, S_{n_g}, \ldots, S_{N_g}$ for $n_g = 1, 2, \ldots, N_g$; and where a typical predetermined document, $S_{n_g}$, of said documents in the group $g$ is given by $S_{n_g} = S_{n_g,1}, S_{n_g,2}, S_{n_g,3}, \ldots, S_{n_g,m}, S_{n_g,m+1}, S_{n_g,m+2}, S_{n_g,m+3}, \ldots, S_{n_g,M}$, said collection search system comprising:
a collection index including first predetermined one-word and two-word phrases as indexed terms occurring in said collection of documents; a linguistic parser that identifies from the user query, $Q$, a list of search terms $Q_1, Q_2, \ldots, Q_p, \ldots, Q_P$, each having a weighted value $W_{Q_1}, W_{Q_2}, \ldots, W_{Q_p}, \ldots, W_{Q_P}$, said linguistic parser identifying said list of search terms from second predetermined one-word and two-word phrases; and a search engine coupled to receive said list from said linguistic parser, said search engine intersecting said list with said collection index to identify the predetermined document $S_{n_g}$ from said collection of documents, said search engine including an accumulator for summing for said predetermined document a relevancy score, $\mathrm{score}(S_{n_g})_Q$, for the document $S_{n_g}$ based on the query $Q$ as follows:
##EQU5##
where $(A_{N_g})_{Q_p}$ is a document value relative to the number of occurrences of the term $Q_p$ in the documents in the group $g$, and where $W_{Q_p}$ is the value of the query term relative to the number of occurrences of the term $Q_p$ in the particular document $S_{n_g}$.

Specification