Implementation of semi-structured data as a first-class database element
First Claim
1. A method for storing semi-structured data comprising:
receiving semi-structured data elements from a data source;
performing statistical analysis on collections of the semi-structured data elements as they are added to the database;
identifying common data elements from within the semi-structured data;
assigning the common data elements from within the semi-structured data as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;
extracting the common data elements from the data source and storing the common data elements separately in columnar format;
storing the first class data in cache memory in pseudo columns and making metadata and statistics corresponding to the pseudo-columns of the first class data elements available to a computer based query generator;
re-identifying common data elements within the semi-structured data and assigning additional common data elements as first class data and saving the additional data elements in cache memory;
reconstructing semi-structured data back to an original form by combining the first class data elements and the lesser class data elements and the non-common data;
storing lesser class data in pseudo columns on disk storage; and
storing non-common semi-structured data elements in an overflow serialized column.
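The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the use of two numeric thresholds (`first_class_min`, `lesser_class_min`), and the plain dictionaries standing in for cache memory and disk storage are all assumptions made for illustration. The claim itself only says that the threshold of commonality is based on how many times a data element appears.

```python
import json
from collections import Counter

MISSING = object()  # marks rows where a key was absent (distinct from a JSON null)

def classify_keys(docs, first_class_min, lesser_class_min):
    """Count how often each top-level key appears and bucket it by threshold."""
    counts = Counter(key for doc in docs for key in doc)
    first = {k for k, n in counts.items() if n >= first_class_min}
    lesser = {k for k, n in counts.items()
              if lesser_class_min <= n < first_class_min}
    return first, lesser

def shred(docs, first, lesser):
    """Split documents into pseudo-columns plus a serialized overflow column."""
    cache_cols = {k: [MISSING] * len(docs) for k in first}   # stand-in for cache memory
    disk_cols = {k: [MISSING] * len(docs) for k in lesser}   # stand-in for disk storage
    overflow = []                                            # one serialized blob per row
    for row, doc in enumerate(docs):
        rest = {}
        for key, value in doc.items():
            if key in first:
                cache_cols[key][row] = value
            elif key in lesser:
                disk_cols[key][row] = value
            else:
                rest[key] = value                            # non-common element
        overflow.append(json.dumps(rest, sort_keys=True))
    return cache_cols, disk_cols, overflow

def reconstruct(row, cache_cols, disk_cols, overflow):
    """Recombine first class, lesser class, and overflow data for one row."""
    doc = json.loads(overflow[row])
    for cols in (cache_cols, disk_cols):
        for key, col in cols.items():
            if col[row] is not MISSING:
                doc[key] = col[row]
    return doc

docs = [
    {"id": 1, "user": "a", "lang": "en"},
    {"id": 2, "user": "b"},
    {"id": 3, "lang": "fr", "debug": True},
]
first, lesser = classify_keys(docs, first_class_min=3, lesser_class_min=2)
cache_cols, disk_cols, overflow = shred(docs, first, lesser)
rebuilt = [reconstruct(i, cache_cols, disk_cols, overflow) for i in range(len(docs))]
```

In this sketch the reconstructed dictionaries compare equal to the inputs, but the original key order is not preserved and only top-level keys are considered; a real system would need to handle nested elements and ordering if "original form" requires them.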
Abstract
A system, apparatus, and method for managing data storage and data access for semi-structured data systems.
73 Citations
11 Claims
1. A method for storing semi-structured data comprising:
receiving semi-structured data elements from a data source;
performing statistical analysis on collections of the semi-structured data elements as they are added to the database;
identifying common data elements from within the semi-structured data;
assigning the common data elements from within the semi-structured data as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;
extracting the common data elements from the data source and storing the common data elements separately in columnar format;
storing the first class data in cache memory in pseudo columns and making metadata and statistics corresponding to the pseudo-columns of the first class data elements available to a computer based query generator;
re-identifying common data elements within the semi-structured data and assigning additional common data elements as first class data and saving the additional data elements in cache memory;
reconstructing semi-structured data back to an original form by combining the first class data elements and the lesser class data elements and the non-common data;
storing lesser class data in pseudo columns on disk storage; and
storing non-common semi-structured data elements in an overflow serialized column.
Dependent claims: 2, 3, 4, 5
6. A system for aggregating semi-structured data comprising computer processors, cache memory, disk storage, and computer instructions, wherein the computer instructions cause the system to:
receive semi-structured data elements from a data source;
derive statistical analysis data corresponding to collections of the semi-structured data elements that is derived as the collections are added to the database;
identify common data elements from within the semi-structured data and assign common data elements from within the semi-structured data as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;
extract common data elements from the data source and store the common data elements separately in columnar format;
store the first class data in cache memory in pseudo columns and make metadata and statistics of the pseudo-columns of the first class data elements available to a computer based query generator;
re-identify common data elements within the semi-structured data and assign additional common data elements as first class data and save the additional data elements in cache memory;
store lesser class data in pseudo columns on disk storage; and
store non-common semi-structured data elements in an overflow serialized column;
wherein semi-structured data is reconstructed to an original form having recombined first class data elements and lesser class data.
Dependent claims: 7, 8
9. Non-transitory computer readable storage media storing instructions that, when executed by one or more processors, cause the one or more processors to:
receive semi-structured data elements from a data source;
derive statistical analysis data corresponding to collections of the semi-structured data elements that is derived as the collections are added to the database;
identify common data elements from within the semi-structured data;
assign common data elements as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;
extract common data elements from the data source and store the common data separately in columnar format;
store the first class data in cache memory in pseudo columns and make metadata and statistics corresponding to the pseudo columns of the first class data elements available to a computer based query generator;
re-identify common data elements within the semi-structured data and assign additional common data elements as first class data and save the additional data elements in cache memory;
store lesser class data in pseudo columns on disk storage; and
store non-common semi-structured data elements in an overflow serialized column;
wherein semi-structured data is reconstructed to an original form having recombined first class data elements and lesser class data.
Dependent claims: 10, 11
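The independent claims also recite deriving statistics as collections are added, re-identifying common elements, and making metadata available to a query generator. The following is a rough sketch of how that incremental step might look, under stated assumptions: the class name `CommonalityTracker`, the single `first_class_min` threshold, and the shape of the metadata dictionary are hypothetical and do not come from the patent text.

```python
from collections import Counter

class CommonalityTracker:
    """Keeps running key counts and promotes keys that cross a threshold."""

    def __init__(self, first_class_min):
        self.first_class_min = first_class_min  # assumed occurrence threshold
        self.counts = Counter()                 # occurrences of each key so far
        self.total_docs = 0
        self.first_class = set()

    def add(self, doc):
        """Update statistics for one document; return newly promoted keys."""
        self.total_docs += 1
        self.counts.update(doc.keys())
        promoted = {
            key for key in doc
            if self.counts[key] >= self.first_class_min
            and key not in self.first_class
        }
        self.first_class |= promoted
        return promoted

    def column_metadata(self):
        """Per-column statistics that a query generator could consult."""
        return {
            key: {
                "occurrences": self.counts[key],
                "fraction": self.counts[key] / self.total_docs,
            }
            for key in sorted(self.first_class)
        }

tracker = CommonalityTracker(first_class_min=2)
promotions = [tracker.add(d) for d in ({"a": 1}, {"a": 2, "b": 1}, {"b": 3}, {"a": 1})]
metadata = tracker.column_metadata()
```

The sketch only tracks which keys qualify. Moving values already written to the overflow column into a new pseudo column after a promotion is left out, and would be needed in any real store.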
Specification