Method and system for data retrieval in large collections of data
Abstract
A method, system and computer readable medium for retrieving relevant data in large collections of documents are disclosed. The method, system and computer readable medium of the present invention include retrieving a document to be indexed, generating a document extract from the document, wherein the document extract comprises a portion of the document, and decomposing the document extract into tokens. The tokens are then stored in a search index, wherein a search engine accesses the search index to retrieve information satisfying a search query.
Through aspects of the method, system and computer readable medium of the present invention, the quality of the search result is improved because the retrieved documents are more relevant to the semantic concept or notion represented by the search query. Moreover, storage requirements are reduced and the processing time for conducting a search is shortened.
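The storage argument can be illustrated with a toy comparison. The sketch below is not taken from the patent: it assumes, purely for illustration, that a document extract is the first two sentences of the document and that tokens are lowercase alphanumeric words. The names `tokenize`, `make_extract`, `build_index` and `posting_count`, and the two sample documents, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_extract(document, max_sentences=2):
    """Hypothetical extractor: keep only the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(sentences[:max_sentences])


def build_index(texts):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def posting_count(index):
    """Total number of (token, document) pairs stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Made-up sample documents: two leading sentences, then unrelated text.
documents = {
    "d1": "Search engines index documents. An extract keeps the key part. "
          "The remainder may cover unrelated topics such as weather or sports.",
    "d2": "Patents describe inventions. Claims define the protected scope. "
          "Long descriptions add many words that rarely match a query well.",
}
extracts = {doc_id: make_extract(text) for doc_id, text in documents.items()}

full_index = build_index(documents)      # index built from whole documents
extract_index = build_index(extracts)    # index built from extracts only

print(posting_count(full_index), posting_count(extract_index))
```

Under these assumptions the extract-based index holds fewer postings, and a query for a word that appears only in the discarded remainder (here, "weather") no longer matches the document. How much is saved in practice depends entirely on how the extract is generated.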
23 Claims
1. A method for retrieving information using a search engine comprising the steps of:
(a) retrieving a document to be indexed;
(b) generating a document extract corresponding to the document;
(c) decomposing the document extract into a plurality of tokens; and
(d) storing the plurality of tokens in a search index, wherein the search engine accesses the search index to retrieve information in one or more document extracts satisfying a search query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23
9. A computer readable medium containing programming instructions for retrieving information using a search engine comprising the instructions for:
(a) retrieving a document to be indexed;
(b) generating a document extract corresponding to the document;
(c) decomposing the document extract into a plurality of tokens; and
(d) storing the plurality of tokens in a search index, wherein the search engine accesses the search index to retrieve information in one or more document extracts satisfying a search query.
Dependent claims: 10, 16
17. A system for retrieving information, wherein the system includes a search engine comprising:
means for retrieving a document from a document repository;
an information extractor coupled to the means for retrieving, wherein the information extractor generates a document extract corresponding to the document;
a storage device coupled to the information extractor for storing the document extract;
a search engine indexer coupled to the storage device for decomposing the document extract into a plurality of tokens; and
a search index coupled to the search engine indexer for storing the plurality of tokens, wherein the search engine accesses the search index to retrieve information in one or more document extracts satisfying a search query.
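The components recited in the independent claims can be sketched as a small in-memory pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `SearchEngine`, its methods, the first-sentence extraction rule, the regular-expression tokenizer and the all-tokens-must-match query semantics are all hypothetical choices made here for the example.

```python
import re
from collections import defaultdict


class SearchEngine:
    """Toy pipeline: repository -> extractor -> extract store -> index."""

    def __init__(self, repository, max_sentences=1):
        self.repository = repository          # document repository: id -> full text
        self.extract_store = {}               # storage device: id -> extract
        self.search_index = defaultdict(set)  # search index: token -> ids
        self.max_sentences = max_sentences    # assumed extraction parameter

    def retrieve(self, doc_id):
        """Step (a): fetch a document to be indexed."""
        return self.repository[doc_id]

    def extract(self, document):
        """Step (b): assumed rule, keep only the leading sentence(s)."""
        sentences = re.split(r"(?<=[.!?])\s+", document.strip())
        return " ".join(sentences[: self.max_sentences])

    @staticmethod
    def tokenize(text):
        """Step (c): decompose text into lowercase alphanumeric tokens."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def index_document(self, doc_id):
        """Run steps (a) to (d) for one document."""
        extract = self.extract(self.retrieve(doc_id))
        self.extract_store[doc_id] = extract
        for token in self.tokenize(extract):
            self.search_index[token].add(doc_id)  # step (d)

    def search(self, query):
        """Return extracts whose tokens include every query token."""
        tokens = self.tokenize(query)
        if not tokens:
            return {}
        hits = set.intersection(
            *(self.search_index.get(token, set()) for token in tokens)
        )
        return {doc_id: self.extract_store[doc_id] for doc_id in sorted(hits)}


# Made-up repository: only the first sentence of each document is indexed.
repository = {
    "d1": "Inverted indexes map tokens to documents. Unrelated trailing text about cooking.",
    "d2": "Cooking recipes list ingredients. Later text mentions indexes.",
}
engine = SearchEngine(repository)
for doc_id in repository:
    engine.index_document(doc_id)

print(engine.search("cooking"))
```

Because only extracts are indexed, the query "cooking" matches `d2` but not `d1`, whose only mention of cooking lies outside its extract. The search returns the stored extracts rather than the full documents, mirroring the claim language about retrieving information "in one or more document extracts".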
Specification