Method and apparatus for compressing a dictionary database by partitioning a master dictionary database into a plurality of functional parts and applying an optimum compression technique to each part
First Claim
1. A computer implemented method for the compression of dictionary database information comprising the steps of:
forming a first part database from a master dictionary database consisting of all the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, and a usage note;
forming a third part database from said master database consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary;
forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context;
from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides an optimum degree of compression to the database information in said part; and
storing said compressed database information.
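The selection step in the claim (try several techniques on each part and keep whichever compresses that part best) can be sketched as follows. This is only an illustration under stated assumptions: the codecs are stand-ins from the Python standard library rather than the techniques the patent has in mind, and the part names and toy data are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library. These are assumptions for
# illustration; the patent's own techniques are not implemented here.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_parts(parts):
    """For each named part, keep the codec that yields the fewest bytes."""
    stored = {}
    for part_name, raw in parts.items():
        best_name, best_blob = None, None
        for codec_name, (compress, _decompress) in CODECS.items():
            blob = compress(raw)
            if best_blob is None or len(blob) < len(best_blob):
                best_name, best_blob = codec_name, blob
        # Record which codec won so the part can be decompressed later.
        stored[part_name] = (best_name, best_blob)
    return stored


def decompress_part(codec_name, blob):
    """Decompress one stored part with the codec recorded for it."""
    return CODECS[codec_name][1](blob)


# Hypothetical toy data standing in for the four database parts.
parts = {
    "entry_points": b"aardvark\nabacus\nabandon\n" * 50,
    "placeholders": bytes([1, 2, 3, 4, 5, 6]) * 200,
    "entry_order": b"abandon\naardvark\nabacus\n" * 50,
    "definitions": b"a nocturnal burrowing mammal; a counting frame; " * 40,
}
stored = compress_parts(parts)
```

Recording the winning codec name next to each compressed part is one simple way to keep the parts independently decompressible; the claim itself does not say how the choice is recorded.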
Abstract
A method and apparatus for compressing dictionary database information is described. The method divides the database information into a number of parts which are each conducive to a predetermined compression technique. A first part database is formed consisting of all the entry points in the dictionary wherein each entry point is associated with a unique word number. A second part database is formed consisting of a multiplicity of placeholders. A third part database is formed consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary. A fourth part database is formed consisting of the definitions and usage notes without reference to their context. A fifth part database allows retrieval of articles of interest without having to decompress the entire dictionary. Compression techniques using multigrams and minimum-redundancy codes are selectively applied to the different database parts.
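The abstract names minimum-redundancy codes as one of the techniques applied to the parts. As a minimal sketch, assuming Huffman coding as the way to build such a code (the text here does not give the coding details), the example below encodes a hypothetical stream of placeholder tokens. The token names and helper functions are illustrative assumptions, not taken from the patent.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman (minimum-redundancy) code."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap items: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words for a sequence of symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Decode a bitstring; works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical placeholder stream: definitions are the most frequent token.
tokens = ["DEF"] * 5 + ["ENTRY"] * 2 + ["POS"] * 2 + ["USAGE"]
code = huffman_code(tokens)
bits = encode(tokens, code)
```

Frequent tokens receive shorter code words, which is why a part dominated by a few recurring placeholder types suits this kind of code.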
12 Claims
1. A computer implemented method for the compression of dictionary database information comprising the steps of:
forming a first part database from a master dictionary database consisting of all the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, and a usage note;
forming a third part database from said master database consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary;
forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context;
from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides an optimum degree of compression to the database information in said part; and
storing said compressed database information.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer implemented method for the compression and decompression of dictionary database information comprising the steps of:
forming a first part database from a master dictionary database consisting of all the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, a usage note or the like;
forming a third part database from said master database consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary;
forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context;
from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides an optimum degree of compression to the database information of said part; and
storing said compressed database information, and the further steps of:
decompressing the compressed information of each database part for a predetermined article of interest; and
merging the decompressed database parts for said article of interest.
9. The use of a dictionary database compressed by forming a first part database from a master dictionary database consisting of all the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, a usage note or the like;
forming a third part database from said master database consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary;
forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context; and
from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides an optimum degree of compression to the database information of that part;
storing said compressed information; and
comprising a fifth part database from said third part database consisting of a dictionary index to facilitate the retrieval of articles of interest, without, after compression, having to decompress the entire dictionary, wherein the data of said third part database is organized into ranges of word numbers and wherein the master database is divided into a predetermined number of parts called "pages" separated by page breaks, said second, third and fourth database parts having page breaks which match those of the master database;
said use comprising the steps of:
making a query for a particular entry word;
determining the equivalent word number from the first part database;
decompressing the page or pages where this entry occurs;
scanning the third part database for the location of the query;
initializing a counter by the first entry on a chosen page, said counter also accumulating subsequent values;
comparing the value of said counter at each successive step with the word number corresponding to the query;
monitoring the number of comparisons made by a second counter and providing an offset, each entry being uniquely identified by page number and offset; and
utilizing the page number and offset in the second part database to find a placeholder for the article of interest and using the other placeholders in the article of interest to determine the offsets of the related entry words and definitions so that the entire article of interest corresponding to said entry word may be obtained.
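The lookup steps of claim 9 can be sketched as below. This is a rough illustration under several assumptions not stated in the claim: each page of the third part is modeled as a list whose first value is an absolute word number and whose later values are differences to be accumulated, the index is a list of hypothetical `(low, high, page_number)` ranges, and pages are held uncompressed in memory. The names `word_numbers`, `page_index`, `pages` and `find_entry` are all invented for the example.

```python
# First part (hypothetical): entry word -> unique word number.
word_numbers = {"abacus": 17, "abandon": 14, "aardvark": 10, "zebra": 41}

# Third part (hypothetical layout): one list per page. The first value is the
# word number of the first entry; the rest are differences to accumulate.
pages = [
    [10, 7, -3],   # word numbers 10, 17, 14
    [30, 2, 9],    # word numbers 30, 32, 41
]

# Fifth part (hypothetical layout): ranges of word numbers covered by a page.
page_index = [(10, 17, 0), (30, 41, 1)]


def find_entry(word_number, page_index, pages):
    """Return (page_number, offset) for a word number, or None if absent."""
    for low, high, page_number in page_index:
        if not (low <= word_number <= high):
            continue
        # In a real device only this page would be decompressed.
        page = pages[page_number]
        counter = 0
        # 'offset' plays the role of the second counter in the claim.
        for offset, value in enumerate(page):
            # The first counter starts at the first entry, then accumulates.
            counter = value if offset == 0 else counter + value
            if counter == word_number:
                return page_number, offset
    return None


location = find_entry(word_numbers["abacus"], page_index, pages)
```

The resulting page number and offset would then be used to locate the matching placeholder in the second part; that step is omitted here because the claim does not describe the placeholder layout.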
10. A method for the compression of dictionary database information in a microprocessor controlled, electronic reference device having a master dictionary database stored in a read-only memory, comprising the steps of:
forming a first part database from a master dictionary database consisting of all the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, a usage note or the like;
forming a third part database from said master database consisting of all the entry points of the dictionary in the exact order in which they appear in the dictionary;
forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context;
from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides a high degree of compression to the database information in said part; and
storing said compressed information.
11. Apparatus for compression of dictionary database information comprising:
first means for forming a first part database from a master dictionary database consisting of all of the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
second means for forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, a usage note or the like;
third means for forming a third part database from said master database consisting of all entry points of the dictionary in the exact order in which they appear in the dictionary;
fourth means for forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context; and
means coupled to said first means, second means, third means and fourth means for, from a plurality of compression techniques, selecting and applying to each said first part, second part, third part and fourth part a compression technique, the compression technique selected for each said database part being determined by which technique provides an optimum degree of compression to the database information in said part.
12. In electronic reference apparatus having a microprocessor, a read-only memory, a random-access memory, a keyboard and a display, an improvement for compression of dictionary database information, said read-only memory having a master dictionary database stored therein, the improvement comprising:
first means for forming a first part database from said master dictionary database consisting of all of the entry points in the dictionary wherein each of these entry points is associated with a unique word number;
second means for forming a second part database from said master database consisting of a multiplicity of placeholders, each placeholder corresponding to one of the following: an entry, an inflection, a definition, a pronunciation, a part of speech, a usage note or the like;
third means for forming a third part database from said master database consisting of all entry points of the dictionary in the exact order in which they appear in the dictionary;
fourth means for forming a fourth part database from said master database consisting of the definitions and usage notes without reference to their context; and
means forming part of said microprocessor and coupled to said first means, second means, third means and fourth means for, from a plurality of compression techniques, selecting and applying a compression technique to each said first part, second part, third part and fourth part, the compression technique selected for each said database part being determined by which technique provides a high degree of compression to the database information of said part.
Specification