Systems and methods for indexing information for a search engine
Abstract
Embodiments of the invention form an information set from the current set of index information available by the operations of the pre-search and runtime Search components of the search engine. A search request that contains search terms and/or other search criteria (e.g. date or file type) is entered by a user through an input interface. The search terms and the information set are worked through the search engine modules to provide the actual results sought by the user. These results are provided to the user via an output interface. Embodiments involve scanning the repository for documents that comprise at least one information type, and forming a numerical matrix from the scanned documents.
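The pre-search flow described above (locate documents, refine them, form a numerical matrix) can be illustrated with a minimal sketch. It is not the patented implementation: the function names `acquire`, `refine`, and `build_index`, the file suffixes, the regular expression used for meta tags, and the choice of one row of integer term IDs per document are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: "meta tags" are HTML-style <meta ...> elements.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def acquire(repository, suffixes=(".txt", ".html")):
    """Locate candidate documents under the repository root (hypothetical Acquisitioner)."""
    root = Path(repository)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


def refine(text):
    """Remove meta tags; leave all other content in its original order (hypothetical Formatter)."""
    return META_TAG.sub("", text)


def build_index(texts):
    """Form a numerical matrix: one row per document, one integer term ID per term,
    stored in order of appearance (hypothetical Indexer)."""
    vocabulary = {}  # term -> integer ID, assigned on first sight
    matrix = []
    for text in texts:
        row = [vocabulary.setdefault(term, len(vocabulary)) for term in text.split()]
        matrix.append(row)
    return vocabulary, matrix


# Usage on in-memory text; acquire() would supply file paths in a real run.
docs = ['<meta name="author" content="x">hello world hello', "world again"]
vocabulary, matrix = build_index(refine(d) for d in docs)
```

Because every term is kept and rows preserve order, each document can be rebuilt from its row and the vocabulary, which is the property the abstract and claims attribute to the index.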
17 Claims
1. A computer system for forming an index of an information repository, said computer system comprising:
an Acquisitioner processing module that locates a plurality of word processing documents within the repository;
a Formatter processing module that refines the word processing documents located by the Acquisitioner module wherein said refining includes removing meta tags from said word processing documents; and
an Indexer processing module that forms a numerical matrix corresponding to the refined word processing documents, wherein the numerical matrix is the index and the numerical matrix includes substantially all content information corresponding to said word processing documents and wherein said numerical matrix stores all content information in the exact order of appearance of the content within the word processing documents.
Dependent claims: 2-14

15. A method for forming an index of an information repository, wherein the method operates on a computer system, and comprises:
locating a plurality of documents within the repository;
refining the documents; and
forming a numerical matrix from the refined documents, wherein the numerical matrix is the index, wherein said forming further comprises assigning an integer to at least one term unit to indicate its position within a particular document;
wherein the index is usable by a search tool that compares the index with a search request from a user and wherein said numerical matrix contains all of the content information of a particular document in the order of appearance in said particular document.
Dependent claim: 16

17. A computer program product having a computer-readable storage medium having computer program logic recorded thereon for forming an index of an information repository, the computer program product comprising:
means for scanning the repository for unstructured documents that comprise at least one information type;
means for forming a numerical matrix from the scanned unstructured documents, wherein the numerical matrix is the index and wherein the means for forming comprises:
    means for forming a matrix of term units, wherein each term unit is a set of characters that is separated by a space from another term unit, wherein said matrix of term units contains all of the content information of a particular unstructured document in the order of appearance in said particular unstructured document; and
    means for assigning an integer to at least one term unit to indicate its position within the document.
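
Claims 15 and 17 recite term units separated by spaces, each assigned an integer for its position within the document. The sketch below shows one way to read that. The 1-based positions, the `(position, term_unit)` row layout, and the helper names `term_unit_matrix`, `reconstruct`, and `find_phrase` are assumptions made for illustration and are not taken from the specification.

```python
def split_units(text):
    """Split on single spaces only, dropping empty strings from repeated spaces."""
    return [unit for unit in text.split(" ") if unit]


def term_unit_matrix(document):
    """Return rows of (position, term_unit); positions start at 1 (an assumption)."""
    return [(position, unit) for position, unit in enumerate(split_units(document), start=1)]


def reconstruct(matrix):
    """Rebuild the document text from the matrix, using positions to restore order."""
    return " ".join(unit for _, unit in sorted(matrix))


def find_phrase(matrix, phrase):
    """Return the starting position of every exact occurrence of the phrase."""
    ordered = sorted(matrix)
    units = [unit for _, unit in ordered]
    wanted = split_units(phrase)
    n = len(wanted)
    if n == 0:
        return []
    return [
        ordered[i][0]
        for i in range(len(units) - n + 1)
        if units[i:i + n] == wanted
    ]


rows = term_unit_matrix("the quick fox and the quick dog")
starts = find_phrase(rows, "the quick")
```

Storing a position with every term unit is what lets a search tool answer phrase queries directly from the index and rebuild the document content in its original order.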
Specification