Textual data storage system and method
Abstract
A hand-held electronic device for use in accessing and displaying data that includes a series of words. The hand-held electronic device includes a display, processor, and memory. Stored in the memory are tokenized data, and word and phrase dictionaries. The tokenized data comprises word and phrase tokens. Each word token represents a unique word in the data. Each phrase token represents a unique sequence of the word tokens and is associated to the unique sequence in response to locating repeated unique sequences in the tokenized data. The word dictionary comprises the word tokens and their corresponding unique words, and the phrase dictionary comprises the phrase tokens and their corresponding word tokens. A data access routine stored in the memory and executable by the processor is operable to display a portion of the data by decompressing the tokenized data using the word and phrase dictionaries.
83 Citations
42 Claims
1. A hand-held electronic device for use in accessing and displaying data that is stored in compressed form, wherein when uncompressed, the data includes a series of words, and wherein each of the words is sized according to a multiple of a common unit of memory storage, the hand-held electronic device comprising:
a display for displaying information;
a processor;
a memory;
tokenized data stored in the memory, wherein the tokenized data comprises word and phrase tokens, wherein each of the word tokens represents a unique word in the data, wherein each of the word tokens is sized according to the common unit of memory storage regardless of the size of the unique word, wherein each of the phrase tokens represents a unique sequence of the word tokens in the tokenized data, wherein the phrase tokens are associated to the unique sequence in response to locating at least one repeated unique sequence of word tokens in the tokenized data, and wherein each of the phrase tokens is sized according to a given multiple of the common unit of memory storage;
a word dictionary stored in the memory, wherein the word dictionary comprises the word tokens and their corresponding unique words; and
a phrase dictionary stored in the memory, wherein the phrase dictionary comprises the phrase tokens and their corresponding word tokens;
wherein a data access routine stored in the memory and executable by the processor is operable to receive an input, and responsive to the input, display a portion of the data by decompressing the tokenized data using the word and phrase dictionaries.
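As a rough illustration of the decompression recited in claim 1, the following minimal Python sketch expands phrase tokens into word tokens and then maps word tokens to words. The token numbering (phrase tokens at or above a hypothetical `PHRASE_BASE`), the sample dictionaries, and the function name `decompress` are assumptions made for illustration; the claim does not specify them.

```python
# Hypothetical layout: values below PHRASE_BASE are word tokens, values at or
# above it are phrase tokens. The claim itself does not prescribe this split.
PHRASE_BASE = 0x8000

# Sample dictionaries, invented for this sketch.
word_dict = {0: "the", 1: "quick", 2: "fox", 3: "saw", 4: "dog"}
phrase_dict = {0x8000: [0, 1, 2]}  # phrase token -> sequence of word tokens


def decompress(tokens, word_dict, phrase_dict):
    """Expand phrase tokens to word tokens, then look up each word."""
    words = []
    for tok in tokens:
        # A phrase token stands for several word tokens; a word token for one.
        word_tokens = phrase_dict[tok] if tok >= PHRASE_BASE else [tok]
        words.extend(word_dict[w] for w in word_tokens)
    return " ".join(words)


tokenized = [0x8000, 3, 0, 4, 3, 0x8000]
text = decompress(tokenized, word_dict, phrase_dict)
```

In this sketch six stored tokens expand to ten words, because the repeated three-word sequence is stored once in the phrase dictionary.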
9. A machine-readable storage medium containing a data structure for housing data in a compressed form, wherein when uncompressed, the data includes a series of words, and wherein each of the words is sized according to a multiple of a common unit of memory storage, the data structure comprising:
tokenized data stored in the memory, wherein the tokenized data comprises word and phrase tokens, wherein each of the word tokens represents a unique word in the data, wherein each of the word tokens is sized according to the common unit of memory storage regardless of the size of the unique word, wherein each of the phrase tokens represents a unique sequence of the word tokens in the tokenized data, wherein the phrase tokens are associated to the unique sequence in response to locating at least one repeated unique sequence of word tokens in the tokenized data, and wherein each of the phrase tokens is sized according to a given multiple of the common unit of memory storage;
a word dictionary, wherein the word dictionary comprises the word tokens and their corresponding unique words; and
a phrase dictionary, wherein the phrase dictionary comprises the phrase tokens and their corresponding word tokens.
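One possible way to picture the fixed-size tokens of claim 9 is to serialize every token into the same number of bytes. The sketch below assumes, purely for illustration, that the common unit of memory storage is one unsigned 16-bit value and that a phrase token occupies one such unit; neither choice is stated in the claim, and the helper names are hypothetical.

```python
import struct

UNIT = "H"  # assumed common unit: one unsigned 16-bit value (not from the claim)
UNIT_BYTES = struct.calcsize("<" + UNIT)  # 2 bytes per token under this assumption


def pack_tokens(tokens):
    """Serialize tokens, one fixed-size unit each, in little-endian order."""
    return struct.pack(f"<{len(tokens)}{UNIT}", *tokens)


def unpack_tokens(blob):
    """Recover the full token sequence from its fixed-width form."""
    count = len(blob) // UNIT_BYTES
    return list(struct.unpack(f"<{count}{UNIT}", blob))


def token_at(blob, index):
    """Read a single token directly by position, without scanning the stream."""
    return struct.unpack_from("<" + UNIT, blob, index * UNIT_BYTES)[0]


tokens = [0x8000, 3, 0, 4]
blob = pack_tokens(tokens)
```

Because each token has the same size regardless of the length of the word it stands for, the offset of the n-th token can be computed directly, which is what `token_at` relies on.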
13. A system for accessing and displaying data that is stored in compressed form, wherein when uncompressed, the data includes a series of words, and wherein each of the words is sized according to a multiple of a common unit of memory storage, the system comprising:
a display for displaying information;
a processor;
a memory;
tokenized data stored in the memory, wherein the tokenized data comprises word and phrase tokens, wherein each of the word tokens represents a unique word in the data, wherein each of the word tokens is sized according to the common unit of memory storage regardless of the size of the unique word, wherein each of the phrase tokens represents a unique sequence of the word tokens in the tokenized data, wherein the phrase tokens are associated to the unique sequence in response to locating at least one repeated unique sequence of word tokens in the tokenized data, and wherein each of the phrase tokens is sized according to a given multiple of the common unit of memory storage;
a word dictionary stored in the memory, wherein the word dictionary comprises the word tokens and their corresponding unique words; and
a phrase dictionary stored in the memory, wherein the phrase dictionary comprises the phrase tokens and their corresponding word tokens;
wherein a data access routine stored in the memory and executable by the processor is operable to receive an input, and responsive to the input, display a portion of the data by decompressing the tokenized data using the word and phrase dictionaries.
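Claim 13 recites displaying only a portion of the data in response to an input. A minimal sketch of that idea follows, assuming a hypothetical `record_starts` index that lists the token offset at which each displayable record begins. The index, the `PHRASE_BASE` split between word and phrase tokens, and the sample data are all assumptions for illustration and do not come from the claim.

```python
PHRASE_BASE = 0x8000  # assumed boundary between word tokens and phrase tokens

# Sample dictionaries and token stream, invented for this sketch.
word_dict = {0: "the", 1: "quick", 2: "fox", 3: "saw", 4: "dog"}
phrase_dict = {0x8000: [0, 1, 2]}
tokenized = [0x8000, 3, 0, 4, 0, 4, 3, 0x8000]

# Hypothetical index: token offset at which each record starts.
record_starts = [0, 4]


def record_text(n):
    """Decompress only the tokens belonging to record n."""
    start = record_starts[n]
    end = record_starts[n + 1] if n + 1 < len(record_starts) else len(tokenized)
    words = []
    for tok in tokenized[start:end]:
        # Expand a phrase token to its word tokens; keep a word token as is.
        for w in (phrase_dict[tok] if tok >= PHRASE_BASE else [tok]):
            words.append(word_dict[w])
    return " ".join(words)


shown = record_text(1)  # the "input" here is simply a record number
```

Only the slice of the token stream for the requested record is expanded, so the rest of the data stays in its compressed form.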
20. A method for storing in memory data in compressed form, wherein the data includes a series of words, and wherein each word of the series of words is sized according to a multiple of a common unit of memory storage, the method comprising:
(a) associating a word token to each unique word of the data, wherein each word token is sized according to the common unit of memory storage regardless of the size of the unique word;
(b) storing in the memory a word dictionary, wherein the word dictionary comprises each unique word and its associated word token;
(c) converting each of the series of words in the data into a series of word tokens so as to produce tokenized data, wherein each of the series of word tokens corresponds to one of the word tokens in the word dictionary;
(d) associating a phrase token to each repeated phrase in the tokenized data;
wherein each of the repeated phrases comprises a sequence of the word tokens in the tokenized data, and wherein each phrase token is sized according to a given multiple of the common unit of memory storage;
(e) storing in the memory a phrase dictionary, wherein the phrase dictionary comprises each repeated phrase and its associated phrase token;
(f) converting each repeated phrase of the tokenized data into its associated phrase token; and
(g) storing in memory the tokenized data, whereby the tokenized data comprises fewer common units of memory storage than the series of words when (i) at least one of the words is sized larger than its associated word token, and (ii) the tokenized data comprises at least one repeated phrase.
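Steps (a) through (g) of claim 20 can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: it assumes whitespace-separated words, a single fixed phrase length `PHRASE_LEN`, phrase tokens numbered from a hypothetical `PHRASE_BASE`, and a greedy left-to-right replacement. None of these choices is stated in the claim.

```python
from collections import Counter

PHRASE_BASE = 0x8000  # assumed start of the phrase-token range
PHRASE_LEN = 3        # assumed fixed phrase length, in word tokens


def compress(text):
    words = text.split()
    # (a), (b): give each unique word a token, in order of first appearance.
    word_to_token = {}
    for w in words:
        word_to_token.setdefault(w, len(word_to_token))
    # (c): convert the series of words into a series of word tokens.
    tokens = [word_to_token[w] for w in words]
    # (d), (e): give a phrase token to each token sequence seen more than once.
    # Overlapping occurrences are counted naively in this sketch.
    counts = Counter(
        tuple(tokens[i:i + PHRASE_LEN])
        for i in range(len(tokens) - PHRASE_LEN + 1)
    )
    phrase_to_token = {}
    for seq, n in counts.items():
        if n > 1:
            phrase_to_token[seq] = PHRASE_BASE + len(phrase_to_token)
    # (f): replace each repeated phrase, scanning greedily from left to right.
    out, i = [], 0
    while i < len(tokens):
        seq = tuple(tokens[i:i + PHRASE_LEN])
        if seq in phrase_to_token:
            out.append(phrase_to_token[seq])
            i += PHRASE_LEN
        else:
            out.append(tokens[i])
            i += 1
    # (g): return the tokenized data with both dictionaries (token -> content).
    word_dict = {t: w for w, t in word_to_token.items()}
    phrase_dict = {t: list(seq) for seq, t in phrase_to_token.items()}
    return out, word_dict, phrase_dict


tokenized, word_dict, phrase_dict = compress(
    "the quick fox saw the dog and the quick fox ran"
)
```

For this input the eleven words become seven tokens, since "the quick fox" occurs twice and each occurrence is replaced by a single phrase token.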
31. A method for storing in memory data in compressed form, wherein the data includes a series of words, and wherein each word of the series of words is sized according to a multiple of a common unit of memory storage, the method comprising:
(a) associating a word token to each unique word of the data, wherein each word token is sized according to the common unit of memory storage regardless of the size of the unique word;
(b) storing in the memory a word dictionary, wherein the word dictionary comprises each unique word and its associated word token;
(c) converting each of the series of words in the data into a series of word tokens so as to produce tokenized data, wherein each of the series of word tokens corresponds to one of the word tokens in the word dictionary;
(d) determining a compression-efficient-phrase length for repeated phrases in the tokenized data, wherein the compression-efficient-phrase length allows for efficient compression of the tokenized data, and wherein each of the repeated phrases comprises a sequence of the word tokens in the tokenized data;
(e) associating a phrase token to each repeated phrase having the compression-efficient-phrase length;
wherein each phrase token is sized according to a given multiple of the common unit of memory storage;
(f) storing in the memory a phrase dictionary, wherein the phrase dictionary comprises (i) each repeated phrase having the compression-efficient-phrase length and (ii) the phrase token associated with each repeated phrase having the compression-efficient-phrase length;
(g) converting each repeated phrase of the tokenized data having the compression-efficient-phrase length into its associated phrase token; and
(h) storing in memory the tokenized data, whereby the tokenized data comprises fewer common units of memory storage than the series of words when (i) at least one of the words is sized larger than its associated word token, and (ii) the tokenized data comprises at least one repeated phrase.
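Claim 31 does not say how the compression-efficient-phrase length of step (d) is determined. One plausible approach, shown here only as an assumption, is to try several candidate lengths and keep the one that yields the smallest total of token stream plus phrase dictionary. The cost model (one unit per token, `length + 1` units per phrase dictionary entry) and the function names are hypothetical.

```python
from collections import Counter


def compressed_units(tokens, length):
    """Units for the token stream plus phrase dictionary at one phrase length.

    Assumed cost model (not from the patent): every stored token occupies one
    unit, and each phrase dictionary entry costs `length` units for its word
    tokens plus one unit for the phrase token.
    """
    counts = Counter(
        tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)
    )
    repeated = {seq for seq, n in counts.items() if n > 1}
    stream_units, used, i = 0, set(), 0
    while i < len(tokens):
        seq = tuple(tokens[i:i + length])
        stream_units += 1  # one unit, whether a word token or a phrase token
        if seq in repeated:
            used.add(seq)
            i += length
        else:
            i += 1
    return stream_units + len(used) * (length + 1)


def best_phrase_length(tokens, candidates=range(2, 6)):
    """Pick the candidate length with the smallest estimated total size."""
    return min(candidates, key=lambda n: compressed_units(tokens, n))


tokens = [0, 1, 2, 3, 9, 0, 1, 2, 3, 8, 0, 1, 2, 3]
best = best_phrase_length(tokens)
```

With this sample stream, a length of 4 gives the smallest total because the four-token sequence repeats three times. Shorter lengths need more phrase tokens per repetition, and no five-token sequence repeats at all.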
Specification