Systems and methods for data conversion and comparison
First Claim
1. A database system comprising:
at least one processor; and
at least one non-transient storage medium containing instructions that, when executed by the at least one processor, cause the at least one processor to:
translate input data in a first format into a canonical format, wherein translation includes operations to:
map a plurality of individual data elements of the input data to a plurality of respective canonical data types associated with determined data types of the individual data elements;
encode the plurality of individual data elements into a byte stream comprising at least:
a canonical type byte based on the mapping for each individual data element; and
at least one data value for data of each individual data element where present; and
wherein the encoding includes generation of a hybrid encoding for floating point numbers, wherein the hybrid encoding further comprises a decimal continuation marker for encoding decimal numbers; and
execute data comparison operations against the byte stream in response to at least some requests for database operations received from at least some database clients.
Abstract
According to one embodiment, a translation component is configured to operate on document-encoded data to translate the document-encoded data into a canonical format comprising a plurality of canonical types that fold together into a byte stream. The translation component is configured to accept any storage format of data (e.g., column store, row store, LSM tree, etc., and/or data from any storage engine, e.g., WIREDTIGER, MMAP, AR tree, Radix tree, etc.) and translate that data into a byte stream to enable efficient comparison. When searches are executed and the translated data is used for comparisons, there is necessarily a trade-off between the cost of translating the data and how much the translated data can be leveraged to increase comparison efficiency.
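The idea of folding typed values into a single comparable byte stream can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the type-byte values, the function names, and the number layout are hypothetical and are not taken from the patent. In particular, the sketch uses a plain order-preserving encoding of IEEE 754 doubles and does not attempt the claimed hybrid floating point encoding or its decimal continuation marker.

```python
import struct

# Hypothetical canonical type bytes (not from the patent), chosen so that
# plain byte order gives the cross-type order null < number < string.
TYPE_NULL = 0x0A
TYPE_NUMBER = 0x1E
TYPE_STRING = 0x3C


def encode_number(value: float) -> bytes:
    """Encode a double as 8 bytes whose unsigned byte order matches numeric order.

    Simplification: NaN is not handled, and large ints lose precision
    when converted to float by the caller.
    """
    value += 0.0  # normalizes -0.0 to +0.0 so both encode identically
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits |= 1 << 63  # non-negative: set the sign bit
    return struct.pack(">Q", bits)


def encode_element(value) -> bytes:
    """Map one data element to a canonical type byte plus its data value, if any."""
    if value is None:
        return bytes([TYPE_NULL])  # type byte only, no data value
    if isinstance(value, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(value, (int, float)):
        return bytes([TYPE_NUMBER]) + encode_number(float(value))
    if isinstance(value, str):
        # A 0x00 terminator keeps prefixes sorting first; assumes no embedded NUL.
        return bytes([TYPE_STRING]) + value.encode("utf-8") + b"\x00"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_record(values) -> bytes:
    """Concatenate the encoded elements into one byte stream."""
    return b"".join(encode_element(v) for v in values)


# Comparison now needs no type dispatch: the byte strings compare directly.
keys = sorted(encode_record(r) for r in [[2, "a"], [1, "b"], [None, "z"]])
```

The translation cost is paid once per record in `encode_record`; every later comparison is a plain byte comparison, which is the trade-off the abstract describes.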
16 Claims
1. A database system comprising:
at least one processor; and
at least one non-transient storage medium containing instructions that, when executed by the at least one processor, cause the at least one processor to:
translate input data in a first format into a canonical format, wherein translation includes operations to:
map a plurality of individual data elements of the input data to a plurality of respective canonical data types associated with determined data types of the individual data elements;
encode the plurality of individual data elements into a byte stream comprising at least:
a canonical type byte based on the mapping for each individual data element; and
at least one data value for data of each individual data element where present; and
wherein the encoding includes generation of a hybrid encoding for floating point numbers, wherein the hybrid encoding further comprises a decimal continuation marker for encoding decimal numbers; and
execute data comparison operations against the byte stream in response to at least some requests for database operations received from at least some database clients.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer implemented method for managing a distributed database, the method comprising:
translating, by at least one processor, input data in a first format into a canonical format, translation comprising:
analyzing, by the at least one processor, a plurality of individual data elements in the first format to determine data types associated with respective individual data elements;
mapping, by the at least one processor, the plurality of individual data elements of the input data to a plurality of respective canonical data types associated with the determined data types of the individual data elements;
encoding, by the at least one processor, the plurality of individual data elements into a byte stream comprising at least:
a canonical type byte based on the mapping for each individual data element, and
at least one data value for data of each individual data element where present; and
wherein the encoding includes generation of a hybrid encoding for floating point numbers, wherein the hybrid encoding further comprises a decimal continuation marker for encoding decimal numbers; and
executing data comparison operations against the byte stream in response to at least some requests for database operations received from at least some database clients.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
Specification