Transparent discovery of semi-structured data schema
First Claim
1. A method for managing semi-structured data comprising:
- receiving semi-structured data elements from a data source that is connected over a computer network;
- performing statistical analysis on collections of the semi-structured data elements as they are added to the database via a computer processor, wherein separate collections comprising portions of the semi-structured data are stored in separate files having different subsets of the semi-structured data elements that have been extracted;
- identifying common data elements from within the semi-structured data;
- combining common data elements from the data source into separate pseudo-columns;
- storing non-common semi-structured data elements in an overflow serialized column in computer memory; and
- deriving metadata corresponding to the pseudo-columns of the common data elements from the statistical analysis.
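
The steps of the first claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `split_into_pseudo_columns`, the `min_frequency` threshold, the use of key frequency as the "statistical analysis", and JSON as the overflow serialization are all assumptions made for illustration.

```python
import json
from collections import Counter


def split_into_pseudo_columns(records, min_frequency=0.8):
    """Split dict records into pseudo-columns plus a serialized overflow column.

    Hypothetical helper: a key counts as "common" when it appears in at
    least `min_frequency` of the records (an assumed criterion).
    """
    # Statistical analysis (assumed form): count the records containing each key.
    key_counts = Counter(key for record in records for key in record)
    total = len(records)
    common = sorted(
        key for key, n in key_counts.items() if total and n / total >= min_frequency
    )

    # One pseudo-column per common key; missing values become None.
    pseudo_columns = {key: [] for key in common}
    overflow = []
    for record in records:
        for key in common:
            pseudo_columns[key].append(record.get(key))
        # Non-common elements are serialized together into the overflow column.
        rest = {k: v for k, v in record.items() if k not in pseudo_columns}
        overflow.append(json.dumps(rest, sort_keys=True) if rest else None)
    return pseudo_columns, overflow
```

With three records where `id` appears in all of them, `name` in two, and `tag` in one, a threshold of 0.6 would place `id` and `name` in pseudo-columns and leave `tag` in the overflow column.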
2 Assignments
0 Petitions
Abstract
A system, apparatus, and method for managing data storage and data access for semi-structured data systems.
93 Citations
18 Claims
1. A method for managing semi-structured data comprising:
- receiving semi-structured data elements from a data source that is connected over a computer network;
- performing statistical analysis on collections of the semi-structured data elements as they are added to the database via a computer processor, wherein separate collections comprising portions of the semi-structured data are stored in separate files having different subsets of the semi-structured data elements that have been extracted;
- identifying common data elements from within the semi-structured data;
- combining common data elements from the data source into separate pseudo-columns;
- storing non-common semi-structured data elements in an overflow serialized column in computer memory; and
- deriving metadata corresponding to the pseudo-columns of the common data elements from the statistical analysis.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A system for aggregating semi-structured data comprising:
- one or more processors;
- memory operably connected to the one or more processors; and
- the memory storing one or more modules programmed to;
  - receive semi-structured data elements from a data source;
  - perform statistical analysis on collections of the semi-structured data elements as they are added to the database, wherein separate collections comprising portions of the semi-structured data are stored in separate files having different subsets of the semi-structured data elements that have been extracted;
  - identify common data elements from within the semi-structured data and combine the common data elements from the data source into separate pseudo-columns;
  - store non-common semi-structured data elements in an overflow serialized column; and
  - derive metadata corresponding to the pseudo-columns of the common data elements from the statistical analysis.
Dependent claims: 11, 12, 13

14. An apparatus for aggregating semi-structured data comprising:
- one or more processors;
- memory operably connected to the one or more processors; and
- the memory storing;
  - a receiving module configured to receive semi-structured data elements from a data source;
  - a statistical module configured to perform statistical analysis on collections of the semi-structured data elements as they are added to the database, wherein separate collections comprising portions of the semi-structured data are stored in separate files having different subsets of the semi-structured data elements that have been extracted;
  - an aggregation means for identifying common data elements from within the semi-structured data and combining the common data elements from the data source into separate pseudo-columns;
  - the aggregation means further for serializing and storing non-common semi-structured data elements in an overflow serialized column; and
  - the aggregation means further for deriving metadata corresponding to the pseudo-columns of the common data elements from the statistical analysis.
Dependent claims: 15, 16, 17, 18
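
The claims recite deriving metadata for the pseudo-columns from the statistical analysis, with separate collections stored in separate files, but they do not say which metadata is kept. The Python sketch below shows one possible reading. The function names `derive_column_metadata` and `derive_file_metadata`, and the chosen fields (value count, null count, observed types, minimum, maximum), are assumptions made for illustration, not details taken from the patent.

```python
def derive_column_metadata(values):
    """Summarize one pseudo-column (hypothetical metadata fields)."""
    non_null = [v for v in values if v is not None]
    types = sorted({type(v).__name__ for v in non_null})
    metadata = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "types": types,
        "min": None,
        "max": None,
    }
    # Min/max are recorded only when every value shares one orderable type.
    if len(types) == 1 and types[0] in ("int", "float", "str"):
        metadata["min"] = min(non_null)
        metadata["max"] = max(non_null)
    return metadata


def derive_file_metadata(files):
    """Map each file name to per-pseudo-column metadata.

    `files` maps a file name to {pseudo_column_name: list_of_values}; each
    file may hold a different subset of pseudo-columns.
    """
    return {
        file_name: {
            column: derive_column_metadata(values)
            for column, values in columns.items()
        }
        for file_name, columns in files.items()
    }
```

Keeping the summary per file rather than per table mirrors the claim language about separate files holding different subsets of the extracted elements. How such metadata would actually be used is outside what the claims state.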
Specification