Method and system for compressing data
First Claim
1. A method for compressing data, comprising:
receiving at least one data string, the at least one data string comprising a plurality of substrings, each substring comprising a plurality of characters;
identifying a first substring in the at least one data string, the first substring comprising a plurality of characters that are the same as a plurality of characters of a second substring in the at least one data string;
generating, by one or more processors, a refer-back token associated with the second substring, the refer-back token indicating a position of the first substring within a token string, the token string being a compressed version of at least a portion of the at least one data string, the position indicated by the refer-back token expressed as an offset to a position of the refer-back token in the token string, the refer-back token further indicating a length of the first substring within the token string, the refer-back token including a header that specifies a number of bits used to store the offset expressed by the position indicated by the refer-back token;
placing the first substring and the refer-back token into the token string, the token string allowing the second substring to be reconstructed by accessing the refer-back token, moving to the position in the token string that is indicated by the refer-back token, and reading an amount of data according to the length indicated by the refer-back token;
identifying a third substring in the at least one data string, the third substring comprising a plurality of characters that are the same as a plurality of characters of a fourth substring in the at least one data string;
generating a second refer-back token associated with the third substring, the second refer-back token indicating a position of the fourth substring within the token string, the position indicated by the second refer-back token expressed as an offset to a position of the second refer-back token in the token string, the second refer-back token further indicating a length of the fourth substring within the token string, the second refer-back token including a second header that specifies a second number of bits used to store the offset expressed by the position indicated by the second refer-back token, the second number of bits different from the first number of bits; and
placing the fourth substring and the second refer-back token into the token string, the token string allowing the third substring to be reconstructed by accessing the second refer-back token, moving to the position in the token string that is indicated by the second refer-back token, and reading an amount of data according to the length indicated by the second refer-back token.
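Claim 1 requires each refer-back token to carry a header stating how many bits hold its offset, so that two tokens can use different offset widths. It does not fix a bit-level layout. The following is a minimal sketch that assumes a hypothetical layout: a fixed 5-bit header giving the offset width, then the offset in exactly that many bits, then a fixed 8-bit length field. The names `encode_refer_back`, `decode_refer_back`, `HEADER_BITS` and `LENGTH_BITS` are illustrative and do not come from the patent.

```python
HEADER_BITS = 5  # hypothetical: header field wide enough to describe offset widths 1..31
LENGTH_BITS = 8  # hypothetical: fixed-width length field


def encode_refer_back(offset: int, length: int) -> str:
    """Encode a refer-back token as a bit string: header | offset | length."""
    if offset < 1 or not (0 < length < (1 << LENGTH_BITS)):
        raise ValueError("offset must be >= 1 and length must fit in LENGTH_BITS")
    width = offset.bit_length()  # smallest number of bits that holds this offset
    if width >= (1 << HEADER_BITS):
        raise ValueError("offset too wide for the header field")
    return (format(width, f"0{HEADER_BITS}b")    # header: how many offset bits follow
            + format(offset, f"0{width}b")       # offset, stored in exactly `width` bits
            + format(length, f"0{LENGTH_BITS}b"))


def decode_refer_back(bits: str, pos: int = 0) -> tuple[int, int, int]:
    """Read one token starting at bits[pos]; return (offset, length, next_pos)."""
    width = int(bits[pos:pos + HEADER_BITS], 2)  # header says how wide the offset is
    pos += HEADER_BITS
    offset = int(bits[pos:pos + width], 2)
    pos += width
    length = int(bits[pos:pos + LENGTH_BITS], 2)
    return offset, length, pos + LENGTH_BITS
```

Under this layout, `encode_refer_back(6, 3)` yields `"0001111000000011"`, which is header `00011`, 3-bit offset `110`, and length `00000011`. An offset of 300 needs a 9-bit field and therefore header `01001`. The two tokens carry different headers, which is the situation the claim describes for the second refer-back token. Holding the bits in a Python string keeps the sketch readable. A practical encoder would pack them into bytes.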
Abstract
The present disclosure is directed to a method and system for compressing data. In accordance with a particular embodiment of the present disclosure, at least one data string is received. The at least one data string includes characters. A token string corresponding to the at least one data string is generated. At least one repeated substring in the at least one data string is identified. A refer-back token associated with the at least one repeated substring is generated. The refer-back token indicates a position of the at least one repeated substring and a length of the at least one repeated substring.
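The abstract's flow can be sketched as a naive greedy compressor with a matching decompressor. The compressor finds a repeated substring, keeps its first occurrence as literals, and replaces the repeat with a refer-back token that holds a position and a length. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the following:

- The token string is modeled as a Python list whose items are either single literal characters or a hypothetical `ReferBack(offset, length)` object.
- The offset counts token-string items back from the token's own index, following the offset convention in claim 1.
- A token may point only at a run of consecutive literal items, so reconstruction can simply move back and read `length` items.
- `MIN_MATCH = 3` is an arbitrary threshold.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ReferBack:
    offset: int  # token-string items back from this token's own index
    length: int  # number of token-string items to read from there


Token = Union[str, ReferBack]  # a literal character or a refer-back token

MIN_MATCH = 3  # hypothetical threshold: shorter repeats stay as literals


def compress(data: str) -> list[Token]:
    tokens: list[Token] = []
    lit_pos: dict[int, int] = {}  # data index -> token index, for literals only
    i = 0
    while i < len(data):
        best_len, best_src = 0, -1
        for j in range(i):  # brute-force scan of earlier start positions
            k = 0
            # Extend the match while the source run lies before i and is a run of
            # consecutive literal items in the token string.
            while (i + k < len(data) and j + k < i
                   and data[j + k] == data[i + k]
                   and (j + k) in lit_pos
                   and lit_pos[j + k] == lit_pos[j] + k):
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len >= MIN_MATCH:
            # Offset runs from this token's position back to the first copied literal.
            here = len(tokens)
            tokens.append(ReferBack(here - lit_pos[best_src], best_len))
            i += best_len
        else:
            lit_pos[i] = len(tokens)
            tokens.append(data[i])
            i += 1
    return tokens


def decompress(tokens: list[Token]) -> str:
    out: list[str] = []
    for idx, tok in enumerate(tokens):
        if isinstance(tok, ReferBack):
            start = idx - tok.offset                      # move back to the referenced position
            out.extend(tokens[start:start + tok.length])  # read `length` literal items
        else:
            out.append(tok)
    return "".join(out)
```

For `"abcabcabc"`, the sketch emits three literals followed by `ReferBack(3, 3)` and `ReferBack(4, 3)`. Both tokens point back to index 0, so their offsets differ even though they copy the same literals. The brute-force search keeps the example short. Production LZ77-style compressors usually index earlier positions with hash tables instead.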
20 Claims
8. A system for compressing data, comprising:
a processor; and
a storage device embodying a program of instructions operable, when executed on the processor, to:
receive at least one data string, the at least one data string comprising a plurality of characters;
identify a first substring in the at least one data string, the first substring comprising a plurality of characters that are the same as a plurality of characters of a second substring in the at least one data string;
generate a refer-back token associated with the second substring, the refer-back token indicating a position of the first substring within a token string, the token string being a compressed version of at least a portion of the at least one data string, the position indicated by the refer-back token expressed as an offset to a position of the refer-back token in the token string, the refer-back token further indicating a length of the first substring within the token string, the refer-back token including a header that specifies a number of bits used to store the offset expressed by the position indicated by the refer-back token;
place the first substring and the refer-back token into the token string, the token string allowing the second substring to be reconstructed by accessing the refer-back token, moving to the position in the token string that is indicated by the refer-back token, and reading an amount of data according to the length indicated by the refer-back token;
identify a third substring in the at least one data string, the third substring comprising a plurality of characters that are the same as a plurality of characters of a fourth substring in the at least one data string;
generate a second refer-back token associated with the third substring, the second refer-back token indicating a position of the fourth substring within the token string, the position indicated by the second refer-back token expressed as an offset to a position of the second refer-back token in the token string, the second refer-back token further indicating a length of the fourth substring within the token string, the second refer-back token including a second header that specifies a second number of bits used to store the offset expressed by the position indicated by the second refer-back token, the second number of bits different from the first number of bits; and
place the fourth substring and the second refer-back token into the token string, the token string allowing the third substring to be reconstructed by accessing the second refer-back token, moving to the position in the token string that is indicated by the second refer-back token, and reading an amount of data according to the length indicated by the second refer-back token.
15. Non-transitory computer readable media comprising logic, the logic being operable, when executed by a processor, to:
receive at least one data string, the at least one data string comprising a plurality of characters;
identify a first substring in the at least one data string, the first substring comprising a plurality of characters that are the same as a plurality of characters of a second substring in the at least one data string;
generate a refer-back token associated with the second substring, the refer-back token indicating a position of the first substring within a token string, the token string being a compressed version of at least a portion of the at least one data string, the position indicated by the refer-back token expressed as an offset to a position of the refer-back token in the token string, the refer-back token further indicating a length of the first substring within the token string, the refer-back token including a header that specifies a number of bits used to store the offset expressed by the position indicated by the refer-back token;
place the first substring and the refer-back token into the token string, the token string allowing the second substring to be reconstructed by accessing the refer-back token, moving to the position in the token string that is indicated by the refer-back token, and reading an amount of data according to the length indicated by the refer-back token;
identify a third substring in the at least one data string, the third substring comprising a plurality of characters that are the same as a plurality of characters of a fourth substring in the at least one data string;
generate a second refer-back token associated with the third substring, the second refer-back token indicating a position of the fourth substring within the token string, the position indicated by the second refer-back token expressed as an offset to a position of the second refer-back token in the token string, the second refer-back token further indicating a length of the fourth substring within the token string, the second refer-back token including a second header that specifies a second number of bits used to store the offset expressed by the position indicated by the second refer-back token, the second number of bits different from the first number of bits; and
place the fourth substring and the second refer-back token into the token string, the token string allowing the third substring to be reconstructed by accessing the second refer-back token, moving to the position in the token string that is indicated by the second refer-back token, and reading an amount of data according to the length indicated by the second refer-back token.
Specification