Methods and system for using web browser to search large collections of documents
First Claim
1. A database file structure for locating symbols within a text document comprising:
a hash table comprising a plurality of buckets wherein each of said plurality of buckets points to a variable length list of symbol entries for which the associated symbol hashes to a hash value corresponding to the bucket;
a variable length list of at least one symbol entry, distinct from said plurality of buckets, pointed to by at least one of said plurality of buckets wherein each of said at least one symbol entry points to a variable length list of file index entries each corresponding to a text document in which a corresponding symbol is found;
a variable length list of at least one file index entry, distinct from said plurality of buckets and distinct from said variable length list of at least one symbol entry, pointed to by one of said at least one symbol entry wherein each of said at least one file index entry points to a variable length list of line number entries each corresponding to a line number at which said corresponding symbol is located in the corresponding text document; and
a variable length list of at least one line number entry, distinct from said plurality of buckets and distinct from said variable length list of at least one symbol entry and distinct from said variable length list of at least one file index entry, pointed to by one of said at least one file index entries wherein each of said at least one line number entry provides a location in a text document at which the corresponding symbol is found in the corresponding text document.
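The four levels recited in claim 1 (buckets, symbol entries, file index entries, line number entries) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the claim describes a database file structure, whereas the sketch keeps everything in Python lists, and the names `SymbolIndex`, `SymbolEntry`, `FileIndexEntry`, `bucket_of`, and the bucket count `NUM_BUCKETS` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NUM_BUCKETS = 64  # assumed table size; the claim does not fix one


@dataclass
class FileIndexEntry:
    """One text document in which a symbol occurs."""
    file_index: int
    # Variable-length list of line number entries for this symbol in this file.
    line_numbers: List[int] = field(default_factory=list)


@dataclass
class SymbolEntry:
    """One symbol, pointing to its variable-length list of file index entries."""
    symbol: str
    files: List[FileIndexEntry] = field(default_factory=list)


def bucket_of(symbol: str) -> int:
    """Toy deterministic hash (an assumption); any stable hash would serve."""
    return sum(symbol.encode("utf-8")) % NUM_BUCKETS


class SymbolIndex:
    def __init__(self) -> None:
        # Each bucket points to a variable-length list of symbol entries
        # whose symbols hash to that bucket.
        self.buckets: List[List[SymbolEntry]] = [[] for _ in range(NUM_BUCKETS)]

    def add(self, symbol: str, file_index: int, line_number: int) -> None:
        chain = self.buckets[bucket_of(symbol)]
        entry = next((e for e in chain if e.symbol == symbol), None)
        if entry is None:
            entry = SymbolEntry(symbol)
            chain.append(entry)
        file_entry = next((f for f in entry.files if f.file_index == file_index), None)
        if file_entry is None:
            file_entry = FileIndexEntry(file_index)
            entry.files.append(file_entry)
        file_entry.line_numbers.append(line_number)

    def lookup(self, symbol: str) -> List[Tuple[int, int]]:
        """Return (file_index, line_number) pairs for every recorded occurrence."""
        for entry in self.buckets[bucket_of(symbol)]:
            if entry.symbol == symbol:
                return [(f.file_index, n) for f in entry.files for n in f.line_numbers]
        return []
```

A lookup hashes the symbol once, scans only the symbol entries chained from that bucket, and then walks the file and line lists of the matching entry, which is what keeps a query cheap even when the indexed collection is large.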
Abstract
A system for rapidly and easily searching large collections of documents using standard web browser programs as the user interface. The present invention parses a collection of text documents to identify symbols therein and builds a database file which identifies the file and line locations of each symbol identified. The database file is constructed to permit rapid searching for symbols, so that the present invention can be used interactively as a search tool. A database client process interacts with the web browser via standard CGI techniques to convert browser commands and queries into appropriate server process requests. A server process receives such requests and manipulates the database files in response to the requests. Query results returned to the client process are then reformatted by the client process to return a document (e.g., an HTML page) with hypertext links in place of search keys located in the database. The system of the present invention thereby provides for rapid searching of large collections of text documents which is not coupled to a specific toolset used to create any one of the documents and which uses a simple and well-known user interface, namely web browsers.
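The parse-then-reformat flow described in the abstract might look roughly like the following sketch. Several things here are assumptions made for illustration, not details from the patent: a "symbol" is taken to be a C-style identifier, the index is a plain dictionary rather than the database file, and the link format `file#Lnumber` and the function names `build_index` and `render_results` are hypothetical.

```python
import html
import re
from collections import defaultdict
from urllib.parse import quote

# Assumed definition of a "symbol": a C-style identifier.
SYMBOL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(documents):
    """Map each symbol to a list of (file name, line number) locations.

    `documents` maps a file name to that file's full text.
    """
    index = defaultdict(list)
    for name, text in documents.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            # Record each symbol at most once per line.
            for symbol in set(SYMBOL_RE.findall(line)):
                index[symbol].append((name, line_no))
    return index


def render_results(symbol, index):
    """Reformat query results as an HTML page of hypertext links."""
    items = []
    for name, line_no in index.get(symbol, []):
        href = f"{quote(name)}#L{line_no}"  # assumed link scheme
        items.append(f'<li><a href="{href}">{html.escape(name)}:{line_no}</a></li>')
    body = "".join(items)
    return f"<html><body><h1>{html.escape(symbol)}</h1><ul>{body}</ul></body></html>"
```

In a CGI deployment of the kind the abstract mentions, a client script would read the queried symbol from the request, obtain the locations from the server process, and write a page like the one `render_results` returns back to the browser.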
257 Citations
2 Claims
Specification