Efficient method for compressing, storing, searching and transmitting natural language text
Abstract
A method for compressing text includes steps of parsing words from text in an input file and comparing the parsed words to a predetermined dictionary. The dictionary has a plurality of vocabulary words in it and numbers or tokens corresponding to each vocabulary word. A further step is determining which of the parsed words are not present in the predetermined dictionary and creating at least one supplemental dictionary including the parsed words that are not present in the predetermined dictionary. The predetermined dictionary and the supplemental dictionary are stored together in a compressed file. Also, the parsed words are replaced with numbers or tokens corresponding to the numbers assigned in the predetermined and supplemental dictionary and the numbers or tokens are stored in the compressed file.
9 Claims
1. A method for compressing text into a compressed file, comprising the steps of:
parsing words from text in an input file;
comparing the parsed words to a first predetermined dictionary having a plurality of vocabulary words and corresponding numbers to the text;
determining which of the parsed words are not present in the first predetermined dictionary;
creating at least one supplemental dictionary including the parsed words not present in the first predetermined dictionary;
storing the first predetermined dictionary in a compressed file together with the supplemental dictionary;
replacing the parsed words with numbers corresponding to the numbers assigned in the first predetermined and supplemental dictionary; and
storing into the compressed file the numbers in place of the corresponding words of the text.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
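The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the dictionary contents, the `compress` function name, the word-parsing rule (lowercased alphanumeric runs) and the in-memory layout of the "compressed file" are all assumptions made for illustration.

```python
import re

# Hypothetical predetermined dictionary: vocabulary word -> number.
PREDETERMINED = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}


def compress(text, predetermined=PREDETERMINED):
    """Return a dict that stands in for the compressed file (assumed layout)."""
    # Parse words from the input text. This simplification drops case and
    # punctuation, so it is lossy; a real scheme would have to encode them.
    words = re.findall(r"\w+", text.lower())

    supplemental = {}  # parsed words not present in the predetermined dictionary
    next_number = max(predetermined.values(), default=-1) + 1
    numbers = []

    for word in words:
        if word in predetermined:
            numbers.append(predetermined[word])
        else:
            if word not in supplemental:
                # Assign the next free number to the new word.
                supplemental[word] = next_number
                next_number += 1
            numbers.append(supplemental[word])

    # Store both dictionaries together with the numbers that replace the words.
    return {
        "predetermined": dict(predetermined),
        "supplemental": supplemental,
        "numbers": numbers,
    }


compressed = compress("The cat sat on the red mat")
print(compressed["supplemental"])  # {'red': 5}
print(compressed["numbers"])       # [0, 1, 2, 3, 0, 5, 4]
```

Numbering the supplemental words immediately after the predetermined ones keeps a single number space, so a decoder can merge the two dictionaries without ambiguity. That numbering choice is an assumption of this sketch, not something the claim requires.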
9. A method of uncompressing a compressed hypertext markup language file received from a remote server on the internet, comprising the steps of:
receiving at a system coupled to the internet a header of a compressed file, the compressed file including a plurality of tokens and a predetermined and supplemental dictionary that each relate tokens with corresponding text, and the header identifying the predetermined dictionary;
analyzing the header of the compressed file to determine whether the predetermined dictionary is stored on the system;
requesting transmission of the predetermined dictionary from the remote server when not present on the system;
receiving the compressed text from the remote server and replacing each token within the compressed text with the corresponding text based on the content of the predetermined and supplemental dictionaries.
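The receiving side described in claim 9 might look roughly like the following Python sketch. The header field `dictionary_id`, the `decompress` signature, the local cache and the `fetch_dictionary` callback (a stand-in for a request to the remote server) are hypothetical names introduced here; the claim does not specify a wire format.

```python
def decompress(header, tokens, supplemental, local_dictionaries, fetch_dictionary):
    """Rebuild text from tokens, fetching the predetermined dictionary if needed.

    header             -- dict with an assumed "dictionary_id" field
    tokens             -- list of numbers from the compressed text
    supplemental       -- token -> text, assumed to arrive with the file
    local_dictionaries -- cache of dictionaries already stored on this system
    fetch_dictionary   -- hypothetical callable that asks the server for one
    """
    dict_id = header["dictionary_id"]

    # Analyze the header: is the predetermined dictionary stored locally?
    if dict_id not in local_dictionaries:
        # Request transmission from the remote server only when it is absent.
        local_dictionaries[dict_id] = fetch_dictionary(dict_id)

    # Merge both dictionaries and replace each token with its text.
    lookup = dict(local_dictionaries[dict_id])
    lookup.update(supplemental)
    return " ".join(lookup[token] for token in tokens)


# Stand-in for the remote server (illustrative data only).
fake_server = {"en-basic": {0: "the", 1: "cat", 2: "sat", 3: "on", 4: "mat"}}
requests_made = []


def fetch_from_fake_server(dict_id):
    requests_made.append(dict_id)
    return fake_server[dict_id]


cache = {}
header = {"dictionary_id": "en-basic"}
first = decompress(header, [0, 1, 2, 3, 0, 5, 4], {5: "red"}, cache, fetch_from_fake_server)
second = decompress(header, [0, 1], {}, cache, fetch_from_fake_server)
print(first)          # the cat sat on the red mat
print(second)         # the cat
print(requests_made)  # ['en-basic']
```

The second call finds the dictionary in the cache, so no further request is made. Joining words with single spaces is a simplification of this sketch; restoring the original spacing and markup would need additional tokens.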
Specification