Implementation of semi-structured data as a first-class database element

US 10,108,686 B2
Filed: 10/20/2014
Issued: 10/23/2018
Est. Priority Date: 02/19/2014
Status: Active Grant


2 Assignments


0 Petitions


Abstract

A system, apparatus, and method for managing data storage and data access for semi-structured data systems.

73 Citations

11 Claims

1. A method for storing semi-structured data comprising:

receiving semi-structured data elements from a data source;

performing statistical analysis on collections of the semi-structured data elements as they are added to the database;

identifying common data elements from within the semi-structured data;

assigning the common data elements from within the semi-structured data as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;

extracting the common data elements from the data source and storing the common data elements separately in columnar format;

storing the first class data in cache memory in pseudo columns and making metadata and statistics corresponding to the pseudo-columns of the first class data elements available to a computer based query generator;

re-identifying common data elements within the semi-structured data and assigning additional common data elements as first class data and saving the additional data elements in cache memory;

reconstructing semi-structured data back to an original form by combining the first class data elements and the lesser class data elements and the non-common data;

storing lesser class data in pseudo columns on disk storage; and

storing non-common semi-structured data elements in an overflow serialized column.

2. The method of claim 1, further comprising, identifying first class data elements that have fallen below the threshold of commonality and assign a lesser class to the identified data elements and remove from cache memory.

3. The method of claim 1, wherein the threshold of commonality is further based on how often a data element is requested by a user.

4. The method of claim 1, further comprising maintaining aggregated metadata with updates that represent current pseudo-column structures and contents.

5. The method of claim 1, further comprising storing lesser class data elements in main memory.

6. A system for aggregating semi-structured data comprising computer processors, cache memory, disk storage, and computer instructions, wherein the computer instructions cause the system to:

receive semi-structured data elements from a data source;

derive statistical analysis data corresponding to collections of the semi-structured data elements that is derived as the collections are added to the database;

identify common data elements from within the semi-structured data and assign common data elements from within the semi-structured data as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;

extract common data elements from the data source and store the common data elements separately in columnar format;

store the first class data in cache memory in pseudo columns and make metadata and statistics of the pseudo-columns of the first class data elements available to a computer based query generator;

re-identify common data elements within the semi-structured data and assign additional common data elements as first class data and save the additional data elements in cache memory;

store lesser class data in pseudo columns on disk storage; and

store non-common semi-structured data elements in an overflow serialized column;

wherein semi-structured data is reconstructed to an original form having recombined first class data elements and lesser class data.

7. The system of claim 6, wherein the computer instructions further cause the system to identify first class data elements that have fallen below the threshold of commonality and assign a lesser class to the identified data elements and remove from cache memory.

8. The system of claim 6, wherein the threshold of commonality is further based on how often a data element is requested by a user.

9. Non-transitory computer readable storage media storing instructions that, when executed by one or more processors, cause the one or more processors to:

receive semi-structured data elements from a data source;

derive statistical analysis data corresponding to collections of the semi-structured data elements that is derived as the collections are added to the database;

identify common data elements from within the semi-structured data;

assign common data elements as first class data and as lesser class data dependent on a threshold of commonality, wherein the threshold of commonality is based on how many times the data element appears in the semi-structured data;

extract common data elements from the data source and store the common data separately in columnar format;

store the first class data in cache memory in pseudo columns and make metadata and statistics corresponding to the pseudo columns of the first class data elements available to a computer based query generator;

re-identify common data elements within the semi-structured data and assign additional common data elements as first class data and save the additional data elements in cache memory;

store lesser class data in pseudo columns on disk storage; and

store non-common semi-structured data elements in an overflow serialized column;

wherein semi-structured data is reconstructed to an original form having recombined first class data elements and lesser class data.

10. The non-transitory computer readable storage media of claim 9, wherein the instructions further cause the one or more processors to identify first class data elements that have fallen below the threshold of commonality and further assigns a lesser class to the identified data elements and remove from cache memory.

11. The non-transitory computer readable storage media of claim 9, wherein the instructions further cause the one or more processors to aggregate metadata substantially continually with updates that represent current pseudo-column structures and contents.
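
The storage flow recited in claims 1, 6 and 9 can be illustrated with a minimal Python sketch. It rests on assumptions the claims do not state: documents are flat JSON-like dicts, "commonality" is the fraction of documents in which a top-level key appears, and two hypothetical cutoffs (`FIRST_CLASS_MIN`, `LESSER_CLASS_MIN`) separate first-class, lesser-class and non-common keys. Cache-versus-disk placement is modeled only as two separate dicts, and every function name here is illustrative, not taken from the patent or from any Snowflake API.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any

# Hypothetical cutoffs: the claims only say classification depends on how often
# an element appears, so these fractions are illustrative assumptions.
FIRST_CLASS_MIN = 0.8   # key in >= 80% of documents -> first class (cache tier)
LESSER_CLASS_MIN = 0.3  # key in >= 30% of documents -> lesser class (disk tier)

MISSING = object()  # marks "key absent in this document", distinct from JSON null


def classify_keys(docs: list[dict[str, Any]]) -> tuple[set[str], set[str]]:
    """Count the documents containing each top-level key and split the common
    keys into first-class and lesser-class sets; all other keys are non-common."""
    counts = Counter(key for doc in docs for key in doc)
    n = len(docs)
    first = {k for k, c in counts.items() if c / n >= FIRST_CLASS_MIN}
    lesser = {k for k, c in counts.items() if LESSER_CLASS_MIN <= c / n < FIRST_CLASS_MIN}
    return first, lesser


def shred(docs, first, lesser):
    """Extract common keys into per-key pseudo-columns and serialize each
    document's remaining (non-common) keys into an overflow column."""
    cache_cols = {k: [d.get(k, MISSING) for d in docs] for k in sorted(first)}
    disk_cols = {k: [d.get(k, MISSING) for d in docs] for k in sorted(lesser)}
    common = first | lesser
    overflow = [
        json.dumps({k: v for k, v in d.items() if k not in common}, sort_keys=True)
        for d in docs
    ]
    return cache_cols, disk_cols, overflow


def reconstruct(i, cache_cols, disk_cols, overflow):
    """Rebuild document i by combining first-class, lesser-class and overflow data."""
    doc = {}
    for cols in (cache_cols, disk_cols):
        for k, col in cols.items():
            if col[i] is not MISSING:
                doc[k] = col[i]
    doc.update(json.loads(overflow[i]))
    return doc
```

Using a sentinel instead of `None` keeps an explicit JSON null distinct from an absent key, so a shredded document round-trips through `reconstruct` unchanged.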
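
Claims 2, 3, 7, 8 and 10 add re-identification: elements that fall below the threshold of commonality are demoted to lesser class and removed from cache, and the threshold may also reflect how often users request an element. The sketch below assumes a key counts as first class when it appears in at least `min_fraction` of ingested documents or has been requested at least `min_requests` times; this OR rule is chosen only for illustration, since the claims do not say how the two signals combine. `TierManager` and its methods are hypothetical.

```python
from __future__ import annotations

from collections import Counter


class TierManager:
    """Hypothetical helper that re-derives the first-class key set as data arrives."""

    def __init__(self, min_fraction: float, min_requests: int) -> None:
        self.min_fraction = min_fraction  # assumed appearance-frequency cutoff
        self.min_requests = min_requests  # assumed user-request cutoff
        self.appearances: Counter[str] = Counter()  # documents containing each key
        self.requests: Counter[str] = Counter()     # user requests per key
        self.docs_seen = 0
        self.first_class: set[str] = set()

    def ingest(self, doc: dict) -> None:
        """Update appearance statistics as each document is added."""
        self.docs_seen += 1
        self.appearances.update(doc.keys())

    def record_request(self, key: str) -> None:
        """Note that a user query referenced this key."""
        self.requests[key] += 1

    def reclassify(self) -> tuple[set[str], set[str]]:
        """Re-identify first-class keys and return (promoted, demoted).

        A caller would load promoted keys' pseudo-columns into cache and evict
        demoted ones, keeping them as lesser-class data on disk.
        """
        if self.docs_seen == 0:
            return set(), set()
        new_first = {
            k
            for k, count in self.appearances.items()
            if count / self.docs_seen >= self.min_fraction
            or self.requests[k] >= self.min_requests
        }
        promoted = new_first - self.first_class
        demoted = self.first_class - new_first
        self.first_class = new_first
        return promoted, demoted
```

Returning the promoted and demoted sets, rather than moving data inside the class, keeps the classification decision separate from whatever cache and disk layers a real system would use.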
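
Claims 4 and 11 recite maintaining aggregated metadata that tracks current pseudo-column structures and contents, and claim 1 makes pseudo-column metadata and statistics available to a query generator. One common way to do this, assumed here rather than taken from the patent, is a running row count, null count and min/max per pseudo-column that a planner can consult to skip data that cannot satisfy an equality predicate. `PseudoColumnStats` and `MetadataCatalog` are hypothetical names, and values within one pseudo-column are assumed to be mutually comparable (for example, all numbers or all strings).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PseudoColumnStats:
    """Hypothetical running summary for one pseudo-column."""

    row_count: int = 0   # rows in which this pseudo-column was present
    null_count: int = 0
    min_value: Optional[Any] = None
    max_value: Optional[Any] = None

    def update(self, value: Any) -> None:
        """Fold one newly inserted value into the summary."""
        self.row_count += 1
        if value is None:
            self.null_count += 1
            return
        if self.min_value is None or value < self.min_value:
            self.min_value = value
        if self.max_value is None or value > self.max_value:
            self.max_value = value

    def may_contain(self, value: Any) -> bool:
        """Range check a planner could use to skip this column for `col = value`."""
        if self.min_value is None:  # no non-null values seen yet
            return False
        return self.min_value <= value <= self.max_value


class MetadataCatalog:
    """Keeps stats current on every insert (the 'continual' aggregation)."""

    def __init__(self) -> None:
        self.columns: dict[str, PseudoColumnStats] = {}

    def on_insert(self, pseudo_columns: dict[str, Any]) -> None:
        for name, value in pseudo_columns.items():
            self.columns.setdefault(name, PseudoColumnStats()).update(value)
```

Updating the summary on each insert keeps the metadata current without rescanning stored data, at the cost of min/max becoming conservative if rows are later deleted.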


Current Assignee
Snowflake, Inc.
Original Assignee
Snowflake Computing Inc. (Snowflake, Inc.)
Inventors
Dageville, Benoit, Antonov, Vadim
Primary Examiner(s)
Beausoliel, Jr., Robert
Assistant Examiner(s)
Hoang, Hau H

Application Number
US14/518,913
Publication Number
US 20150234914A1
Time in Patent Office
1,464 Days
CPC Class Codes

A61F 5/566   Intra-oral devices

G06F 16/128   Details of file system snap...

G06F 16/148   File search processing

G06F 16/1827   Management specifically ada...

G06F 16/211   Schema design and management

G06F 16/221   Column-oriented storage; Ma...

G06F 16/2365   Ensuring data consistency a...

G06F 16/24532   of parallel queries

G06F 16/24545   Selectivity estimation or d...

G06F 16/24552   Database cache management

G06F 16/2456   Join operations

G06F 16/2471   Distributed queries

G06F 16/254   Extract, transform and load...

G06F 16/27   Replication, distribution o...

G06F 16/273   Asynchronous replication or...

G06F 16/283   Multi-dimensional databases...

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results

G06F 9/4881   Scheduling strategies for d...

G06F 9/5016   the resource being the memory

G06F 9/5044   considering hardware capabi...

G06F 9/5083   Techniques for rebalancing ...

G06F 9/5088   involving task migration

H04L 67/1095   Replication or mirroring of...

H04L 67/1097   for distributed storage of ...

H04L 67/568   Storing data temporarily at...
