Block Entropy Encoding for Word Compression
Abstract
A computer-implemented method, computer-readable media, and a computerized system to compress words are provided. The computerized system includes a compression engine that compresses a list of words. The compression engine generates a symbol list from the list of words, decomposes the words using the symbol list and a cost function, and encodes the decomposed words. The words may be from a search index. The compression engine may be utilized to reduce the size of the search index and improve efficiency.
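The role of a cost function in decomposition can be illustrated with a small sketch. The abstract does not say what the cost function is, so the version below assumes each symbol has a fixed cost in bits, supplied in a hypothetical `symbol_costs` table, and uses dynamic programming to find the cheapest split of a word. The names `decompose` and `symbol_costs` and all cost values are illustrative and not taken from the patent.

```python
import math

def decompose(word, symbol_costs):
    """Split `word` into symbols from `symbol_costs` with minimum total cost.

    `symbol_costs` maps each symbol (a substring) to an assumed cost in bits.
    Returns (total_cost, symbols), or (math.inf, None) if no split exists.
    """
    n = len(word)
    best = [math.inf] * (n + 1)   # best[i]: cheapest cost to cover word[:i]
    back = [None] * (n + 1)       # back[i]: start index of the last symbol used
    best[0] = 0.0
    max_len = max(map(len, symbol_costs), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece in symbol_costs and best[start] + symbol_costs[piece] < best[end]:
                best[end] = best[start] + symbol_costs[piece]
                back[end] = start
    if math.isinf(best[n]):
        return math.inf, None
    # Walk the back-pointers from the end of the word to recover the split.
    symbols, i = [], n
    while i > 0:
        symbols.append(word[back[i]:i])
        i = back[i]
    return best[n], symbols[::-1]

# Hypothetical costs: single letters are expensive, common substrings are cheap.
costs = {c: 5.0 for c in "abcdefghijklmnopqrstuvwxyz"}
costs.update({"ing": 3.0, "search": 4.0, "er": 3.5})
total, parts = decompose("searching", costs)
print(total, parts)
```

With these assumed costs, the two-symbol split is cheaper than spelling the word out letter by letter, which is the kind of trade-off a cost function lets the decomposition step weigh.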
18 Claims
1. A computer-implemented method for compressing a list of words, the method comprising:
- parsing all words to create a symbol list;
- identifying substrings in each word by decomposing the word based on the symbol list;
- encoding the identified substrings.
Dependent claims: 2, 3, 4, 5, 6.
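The three steps of claim 1 can be sketched end to end. The claim does not say how symbols are chosen, how ties in decomposition are resolved, or what code is emitted, so this sketch makes three assumptions: the symbol list is every character plus every bigram seen at least `min_count` times, decomposition is greedy longest-match, and each substring is encoded as its index in the symbol list. All function names are hypothetical.

```python
from collections import Counter

def build_symbol_list(words, min_count=2):
    """Collect every character plus every bigram seen at least `min_count` times."""
    chars = {c for w in words for c in w}
    bigrams = Counter(w[i:i + 2] for w in words for i in range(len(w) - 1))
    frequent = {g for g, n in bigrams.items() if n >= min_count}
    # Sort for a deterministic symbol -> index mapping.
    return sorted(chars | frequent)

def decompose(word, symbols):
    """Greedy longest-match split of `word` into entries of `symbols`."""
    symbol_set = set(symbols)
    longest = max(map(len, symbols), default=1)
    parts, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            if word[i:i + size] in symbol_set:
                parts.append(word[i:i + size])
                i += size
                break
        else:
            raise ValueError(f"cannot decompose {word!r} at position {i}")
    return parts

def encode(parts, symbols):
    """Replace each substring with its position in the symbol list."""
    index = {s: k for k, s in enumerate(symbols)}
    return [index[p] for p in parts]

def decode(codes, symbols):
    """Inverse of encode followed by concatenation."""
    return "".join(symbols[k] for k in codes)

words = ["index", "indent", "dented"]
symbols = build_symbol_list(words)
encoded = [encode(decompose(w, symbols), symbols) for w in words]
```

Because every single character is in the symbol list, any word from the original list can always be decomposed, and `decode` recovers it exactly. An entropy coder could replace the plain indices, but that step is left out here.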
7. One or more computer-readable media storing instructions to perform a method for locating common n-grams in a list of terms, the method comprising:
- generating n-grams for each word in the list;
- calculating occurrence counts of each n-gram;
- determining a cost of n-gram selection as a function of n-gram statistics that include the occurrence count of each n-gram; and
- selecting n-grams that minimize the cost of n-gram selection as a function of n-gram statistics.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15.
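The n-gram selection steps of claim 7 can be sketched as follows. The claim only requires that the cost depend on n-gram statistics including the occurrence count, so the cost model here is an assumption: storing an n-gram in the symbol list costs its raw size once, and each occurrence saves the difference between the raw text and one fixed-width code. The parameters `bits_per_char` and `bits_per_code` and all function names are hypothetical, and overlaps between n-grams are ignored.

```python
from collections import Counter

def ngram_counts(words, max_n=3):
    """Count every n-gram of length 2..max_n across all words."""
    counts = Counter()
    for w in words:
        for n in range(2, max_n + 1):
            for i in range(len(w) - n + 1):
                counts[w[i:i + n]] += 1
    return counts

def selection_cost(ngram, count, bits_per_char=8, bits_per_code=12):
    """Assumed net cost in bits of adding `ngram` to the symbol list.

    Storing the n-gram costs len * bits_per_char once; each occurrence then
    replaces len * bits_per_char of raw text with one bits_per_code code.
    """
    storage = len(ngram) * bits_per_char
    saving_per_use = len(ngram) * bits_per_char - bits_per_code
    return storage - count * saving_per_use

def select_ngrams(words, max_n=3):
    counts = ngram_counts(words, max_n)
    costs = {g: selection_cost(g, c) for g, c in counts.items()}
    # Under this independent model, keeping exactly the n-grams with
    # negative net cost minimizes the summed cost of the selection.
    return sorted((g for g, c in costs.items() if c < 0),
                  key=lambda g: (costs[g], g))

selected = select_ngrams(["compress", "compressed", "compression", "press"])
```

In this toy model bigrams save only 4 bits per use, so even four occurrences merely break even, while trigrams seen three or more times pay for their storage. A different cost function would draw the line elsewhere.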
16. A computerized search system, the search system comprising:
- a compression engine configured to provide access to a compressed search index, wherein the compression engine receives a search request, decomposes search terms in the search request based on a symbol list associated with the search index, and encodes the search request in accordance with the symbol list.
Dependent claims: 17, 18.
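The query path of claim 16 can be sketched with a toy engine whose index is keyed by encoded terms, so a search request is matched by encoding it with the same symbol list rather than by decompressing the index. The class `CompressionEngine`, its methods, the greedy longest-match decomposition, and the in-memory dictionary index are all assumptions made for illustration.

```python
class CompressionEngine:
    """Toy engine: terms are stored and looked up only in encoded form."""

    def __init__(self, symbols):
        # Longer symbols first so greedy matching prefers them.
        self.symbols = sorted(symbols, key=lambda s: (-len(s), s))
        self.code = {s: k for k, s in enumerate(self.symbols)}
        self.index = {}  # encoded term (tuple of codes) -> set of document ids

    def encode_term(self, term):
        """Decompose `term` greedily and return its tuple of symbol codes."""
        codes, i = [], 0
        while i < len(term):
            match = next((s for s in self.symbols if term.startswith(s, i)), None)
            if match is None:
                raise ValueError(f"no symbol matches {term!r} at position {i}")
            codes.append(self.code[match])
            i += len(match)
        return tuple(codes)

    def add(self, doc_id, text):
        """Index each whitespace-separated term of `text` under its encoding."""
        for term in text.lower().split():
            self.index.setdefault(self.encode_term(term), set()).add(doc_id)

    def search(self, request):
        """Encode each search term, then intersect the matching posting sets."""
        postings = [self.index.get(self.encode_term(t), set())
                    for t in request.lower().split()]
        return set.intersection(*postings) if postings else set()

# Hypothetical symbol list: the lowercase alphabet plus three longer symbols.
engine = CompressionEngine(list("abcdefghijklmnopqrstuvwxyz") + ["ing", "com", "press"])
engine.add(1, "compressing words")
engine.add(2, "indexing words")
hits = engine.search("words compressing")
```

The lookup works only because decomposition is deterministic: the same term always yields the same code sequence at indexing time and at query time.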
Specification