Code book construction for variable to variable length entropy encoding
First Claim
1. A method of constructing a code book for groupings of symbols drawn from an alphabet, in which variable size symbol groupings are each assigned a variable length code based on probability of occurrence of symbol groupings, comprising:
- receiving a series of symbols from an input;
storing in a data structure plural variable size symbol groupings, each symbol grouping defined by one or more contiguous symbols, wherein parsing the received series of symbols according to the symbol groupings yields probabilities of occurrence for the symbol groupings throughout the received series, and wherein the parsing repeats plural times, the symbol groupings changing after each parsing based on said probabilities of occurrence throughout the received series;
assigning a variable length code for each symbol grouping based on a probability of occurrence of such symbol grouping in the received series; and
outputting a code book that associates the symbol groupings with corresponding assigned codes for subsequent variable-to-variable compression.
2 Assignments
0 Petitions
Accused Products
Abstract
A method of constructing a code book for groupings of symbols drawn from an alphabet, in which variable-sized groups of symbols are each assigned a variable length code based on probability of occurrence of symbol groupings. Code book entries are added by tentatively extending the high probability groupings with symbols from the alphabet. Code book size is restrained by identification of identify high probability symbol groupings, such that low probability groupings are combined into a single code book entry. Probability of occurrence for each entry is tracked. Extension and combination is repeated until a code book of predetermined size is reached. Each code book entry is assigned an entropy-type code according to the probability associated with each book entry.
-
Citations
28 Claims
-
1. A method of constructing a code book for groupings of symbols drawn from an alphabet, in which variable size symbol groupings are each assigned a variable length code based on probability of occurrence of symbol groupings, comprising:
-
receiving a series of symbols from an input;
storing in a data structure plural variable size symbol groupings, each symbol grouping defined by one or more contiguous symbols, wherein parsing the received series of symbols according to the symbol groupings yields probabilities of occurrence for the symbol groupings throughout the received series, and wherein the parsing repeats plural times, the symbol groupings changing after each parsing based on said probabilities of occurrence throughout the received series;
assigning a variable length code for each symbol grouping based on a probability of occurrence of such symbol grouping in the received series; and
outputting a code book that associates the symbol groupings with corresponding assigned codes for subsequent variable-to-variable compression. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
forming an initial collection of shorter groupings;
selecting a shorter grouping;
defining longer groupings by extending the shorter grouping with extension symbols;
computing a probability of receiving each longer grouping from the input, so as to identify high probability and low probability groupings; and
combining each low probability grouping into a combined grouping having the symbol series of the shorter grouping, and an expansion symbol representing any extension series except those of each high probability grouping.
-
-
6. A computer readable medium having stored thereon computer executable instructions for performing the method of claim 5.
-
7. The method of claim 1 wherein the symbols are stored in a hierarchical memory storage and adaptively grouped, a hierarchy in the storage having an initial hierarchy level containing initial groupings, where all groupings are defined by a symbol series of one or more symbols from the alphabet, and the hierarchy having sub-levels containing at least one sub-grouping for a parent grouping, each sub-grouping defined by extending the symbol series for the parent with one or more symbols from the alphabet, and wherein the storing comprises:
-
creating a new sub-level of a previous level in the hierarchy;
selecting a parent grouping from the previous hierarchy level;
defining an extension set of symbols having plural elements, each element defined by at least one symbol from the alphabet;
for each element of the extension set, defining a sub-group of the parent grouping by extending the symbol series for the parent grouping with such set element;
computing, for each defined sub-group, a probability of receiving such sub-group from the input;
identifying at least one high probability sub-group having a symbol series defined by a base symbol series common to each subgroup and a high-probability extension series of at least one symbol, where all other sub-groups are low probability; and
combining each low probability sub-group into a combined grouping having a combined symbol series represented by the base symbol series and a special symbol representing any symbol series different from each high-probability extension series.
-
-
8. A computer readable medium having stored thereon computer executable instructions for performing the method of claim 7.
-
9. The method of claim 1 wherein the data structure stores n variable size symbol groupings, and wherein symbol groupings not represented in the data structure are encoded with an escape entry.
-
10. A method for creating a code book for compressing streaming data, such streaming data being represented by a series of symbols which are collected into variable length groupings of symbols that are each assigned a variable length code based on probability of occurrence such groupings in a typical data source, the method comprising:
-
storing such symbol groupings in a data structure having a dynamically allocable memory for storing the symbol groupings, where the data structure has different entropy codes assigned to each of a plurality of different input sequences;
identifying a symbol grouping having high probability;
expanding the symbol grouping into tentative groupings by adding a new symbol from the alphabet;
identifying a high probability tentative grouping;
combining all tentative groupings, except for the high probability tentative grouping, into a combined grouping;
storing the high probability tentative grouping in the data structure; and
storing the combined grouping in the data structure. - View Dependent Claims (11, 12, 13, 14)
breaking apart the grouping into constituent symbols;
identifying a constituent symbol having high probability; and
re-combining all constituent symbols except for the high probability constituent symbol.
-
-
14. The method of claim 10, wherein the streaming data is retrieved from a non-volatile storage medium.
-
15. A system for constructing a code book for groupings of symbols drawn from an alphabet, in which variable size symbol groupings are each assigned a variable length code based on probability of occurrence of symbol groupings, comprising:
-
an input for receiving a series of plural symbols;
a memory configured to contain a data structure for storing plural variable size symbol groupings, each symbol grouping defined by one or more contiguous symbols;
means for parsing a received series of plural symbols according to the symbol groupings, wherein the parsing yields probabilities of occurrence for the symbol groupings throughout the received series, and wherein the parsing repeats plural times, the symbol groupings changing after each parsing based on said probabilities of occurrence throughout the received series;
means for assigning a variable length code for each symbol grouping based on a probability of occurrence of such symbol grouping in the received series; and
means for outputting a code book that associates the variable size symbol groupings with the assigned variable length codes for subsequent variable-to-variable compression. - View Dependent Claims (16, 17)
-
-
18. A method of constructing a code book for groupings of symbols drawn from an alphabet, in which variable size symbol groupings are each assigned a variable length code based on probability of occurrence of symbol groupings, comprising:
-
receiving symbols from an input;
storing in a data structure variable size symbol groupings defined by contiguous symbols received from the input, wherein the storing comprises;
identifying a symbol grouping having high probability;
expanding the symbol grouping into tentative groupings by adding a new symbol from the alphabet;
identifying a high probability tentative grouping;
combining all tentative groupings, except for the high probability tentative grouping, into a combined grouping;
storing the high probability tentative grouping in the data structure; and
storing the combined grouping in the data structure;
calculating a probability for each of the symbol groupings; and
assigning a variable length code for each symbol grouping.- View Dependent Claims (19, 20, 21, 22, 23)
wherein the high probability tentative grouping is formed by the symbol grouping having high probability, followed by a most probable extension, and wherein the combined grouping is formed by the symbol grouping having high probability, followed by a special extension corresponding to any extension different than the most probable extension. -
22. A method according to claim 21, wherein a combined grouping is stored within a data structure having a dynamically allocable memory for storing each symbol within the grouping.
-
23. A computer-readable medium storing computer-executable programming for performing the method of claim 18.
-
-
24. A system for constructing a code book for groupings of symbols drawn from an alphabet, in which variable size symbol groupings are each assigned a variable length code based on probability of occurrence of symbol groupings, comprising:
-
an input for receiving plural symbols;
a memory configured to contain a data structure for storing variable size symbol groupings deformed by contiguous symbols received from the input;
means for identifying a symbol grouping having high probability;
means for expanding the symbol grouping into tentative groupings by adding a new symbol from the alphabet;
means for identifying a high probability tentative grouping;
means for combining all tentative groupings, except for the high probability tentative grouping, into a combined grouping;
means for storing the high probability tentative grouping in the data structure;
means for storing the combined grouping in the data structure;
means for calculating a probability for each of the symbol groupings; and
means for assigning a variable length code for each symbol grouping. - View Dependent Claims (25)
-
-
26. A computer readable medium having stored thereon computer executable instructions for constructing a code book for groupings of symbols drawn from an alphabet, the code book for variable-to-variable compression, the instructions for performing acts comprising:
-
initializing symbol groupings for parsing a series of symbols received from an input;
repeatedly, parsing the received series of symbols according to the symbol groupings, wherein the parsing yields probabilities of occurrence for the symbol groupings throughout the received series; and
changing the symbol groupings based on said probabilities of occurrence throughout the received series;
assigning a variable length code for each symbol grouping based upon a probability of occurrence for such symbol grouping in the received series; and
outputting a code book that associates the symbol groupings with corresponding assigned codes for subsequent variable-to-variable compression. - View Dependent Claims (27, 28)
-
Specification