Data record compression with progressive and/or selective decomposition
Abstract
Disclosed herein are systems and methods for compressing structured or semi-structured data in a horizontal manner while achieving compression ratios similar to those of vertical compression. Collections of structured or semi-structured data include a number of fields and are described using a schema. Fields include information having semantic similarity and are compressed using methods suitable for the type of data. Data of a collection is compressed after fragmentation or may be normalized prior to compression. Data with semantic similarity is compressed using token tables and/or n-gram tables, where values with a higher weight, defined as the product of frequency and length, may be stored in the lower-numbered indices of the table. Records include record descriptor bytes, field descriptor bytes, zero or more array descriptor bytes, zero or more object descriptor bytes, or bytes representing the data associated with the record. Data is indexed or compressed by a suitable module.
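The weighting rule in the abstract (weight as the product of frequency and length, with heavier values placed at lower table indices) can be illustrated with a minimal sketch. The function names `build_token_table`, `encode`, and `decode`, the tie-breaking by alphabetical order, and the sample values are all assumptions made for illustration; the patent text shown here does not specify them.

```python
from collections import Counter


def build_token_table(values):
    """Order distinct values so the highest-weight value gets index 0.

    weight = frequency * length, following the abstract's description.
    Ties are broken alphabetically (an assumption, for determinism).
    """
    freq = Counter(values)
    return sorted(freq, key=lambda tok: (-freq[tok] * len(tok), tok))


def encode(values, table):
    """Replace each value with its index in the token table."""
    index = {tok: i for i, tok in enumerate(table)}
    return [index[v] for v in values]


def decode(codes, table):
    """Map table indices back to the original values."""
    return [table[c] for c in codes]


# Hypothetical field values with semantic similarity (city names).
cities = ["Springfield", "Rome", "Springfield", "Rome", "Rome", "Oslo"]
table = build_token_table(cities)
codes = encode(cities, table)
```

Here "Springfield" (2 occurrences of 11 characters, weight 22) outranks "Rome" (3 occurrences of 4 characters, weight 12), so the longer value receives the smallest index even though it is less frequent.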
36 Claims
1. A computer-implemented method comprising:
- compressing, by a computer associated with a database, one or more data elements using a compression technique, the compression technique compressing the data elements such that each respective data element is individually decompressed when the data element is returned in response to a search query;
- storing, by the computer, the one or more data elements into a database record comprising one or more data fields, wherein each respective compressed data element is stored into a data field of the record configured to store a type of data element of the respective data element;
- associating, by the computer, a field notation in a reference table with each of the one or more data fields in each of one or more data records according to a schema associated with the database, wherein the field notation identifies a data type for each respective data field;
- responsive to the computer receiving a search query requesting a set of one or more data elements stored in one or more records of the database;
- querying, by the computer, the database for the set of one or more data elements satisfying the search query; and
- decompressing, by the computer, using the compression technique each respective data element in the set of one or more data elements satisfying the search query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
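The steps of claim 1 can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class `TinyStore`, the `FIELD_NOTATION` table, the use of `zlib` and JSON for per-element compression, and the in-memory equality index are all assumptions introduced here. The point it shows is that each element is compressed separately, so a query decompresses only the elements it returns.

```python
import json
import zlib

# Hypothetical reference table: field name -> field notation (data type).
FIELD_NOTATION = {"name": "string", "age": "int", "tags": "array"}


def compress_element(value):
    # Each element is compressed on its own so it can be decompressed alone.
    return zlib.compress(json.dumps(value).encode("utf-8"))


def decompress_element(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


class TinyStore:
    def __init__(self):
        self.records = []  # each record: {field name: compressed bytes}
        self.index = {}    # (field name, JSON-encoded value) -> record ids

    def insert(self, record):
        unknown = set(record) - set(FIELD_NOTATION)
        if unknown:
            raise KeyError(f"fields not in reference table: {sorted(unknown)}")
        rid = len(self.records)
        self.records.append(
            {field: compress_element(value) for field, value in record.items()}
        )
        for field, value in record.items():
            self.index.setdefault((field, json.dumps(value)), []).append(rid)
        return rid

    def query(self, field, value, want):
        """Return the `want` fields of records where `field` equals `value`."""
        hits = self.index.get((field, json.dumps(value)), [])
        # Only the requested elements of matching records are decompressed.
        return [
            {f: decompress_element(self.records[rid][f]) for f in want}
            for rid in hits
        ]


store = TinyStore()
store.insert({"name": "Ada", "age": 36, "tags": ["math"]})
store.insert({"name": "Alan", "age": 41, "tags": ["logic"]})
result = store.query("name", "Ada", ["age"])
```

In this sketch the query for `name == "Ada"` touches one compressed element (`age` of the first record); the `tags` element and the whole second record stay compressed.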
15. A computing system comprising:
- one or more nodes storing one or more collections of a database, each collection comprising a set of one or more records of the database, and each record comprising a set of one or more data fields storing one or more data elements respectively; and
- a compression processor configured to compress the one or more data elements stored in one or more data fields of one or more records of a collection using a compression technique based on the collection, associate a field notation in a reference table with each of the one or more data fields in each of the one or more records according to a schema associated with the database, wherein the field notation identifies a data type for each respective data field, and decompress a set of one or more data elements satisfying a search query.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
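One way to picture "a compression technique based on the collection" in claim 15 is a preset dictionary per collection. The sketch below swaps in zlib's preset-dictionary feature for this purpose, which is an assumption and not something the claim names; the collection names, dictionary contents, and helper functions are likewise hypothetical.

```python
import zlib

# Hypothetical per-collection dictionaries holding substrings that are
# common in that collection's data elements.
COLLECTION_DICTS = {
    "places": b"Springfield, Illinois Springfield, Ohio Rome, Italy",
    "people": b"Dr. Prof. Jr. Sr. Smith Johnson",
}


def compress_for_collection(collection, data):
    """Compress one data element using the collection's preset dictionary."""
    comp = zlib.compressobj(zdict=COLLECTION_DICTS[collection])
    return comp.compress(data) + comp.flush()


def decompress_for_collection(collection, blob):
    """Decompress one data element; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=COLLECTION_DICTS[collection])
    return decomp.decompress(blob) + decomp.flush()


element = b"Springfield, Illinois"
blob = compress_for_collection("places", element)
restored = decompress_for_collection("places", blob)
```

A preset dictionary typically helps most with short elements, since each element is compressed independently and would otherwise have no earlier data to refer back to.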
Specification