Storing semi-structured data

US 9,754,048 B1
Filed: 10/06/2014
Issued: 09/05/2017
Est. Priority Date: 10/06/2014
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing semi-structured data. One of the methods includes maintaining a plurality of schemas; receiving a first semi-structured data item; determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas; and in response to determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas: generating a new schema, encoding the first semi-structured data item in the first data format to generate the first new encoded data item in accordance with the new schema, storing the first new encoded data item in the data item repository, and associating the first new encoded data item with the new schema.
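
The flow summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here (`Schema`, `Repository`, `find_schema`, `store`) is hypothetical and not taken from the patent; the sketch assumes that a "location" is a list index and that an item matches a schema when its key set equals the schema's key set.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Maps each location (assumed here to be a list index) to a key."""
    schema_id: int
    keys: tuple  # keys[i] is the key for the value stored at location i


@dataclass
class Repository:
    schemas: list = field(default_factory=list)
    # Each entry is (schema_id, encoded values); the keys are not repeated.
    items: list = field(default_factory=list)

    def find_schema(self, item: dict):
        """Return a schema whose keys equal the item's keys, or None."""
        wanted = tuple(sorted(item))
        for schema in self.schemas:
            if schema.keys == wanted:
                return schema
        return None

    def store(self, item: dict) -> int:
        """Encode and store an item, generating a new schema if none matches."""
        schema = self.find_schema(item)
        if schema is None:
            schema = Schema(schema_id=len(self.schemas), keys=tuple(sorted(item)))
            self.schemas.append(schema)
        encoded = [item[key] for key in schema.keys]
        self.items.append((schema.schema_id, encoded))
        return schema.schema_id


repo = Repository()
a = repo.store({"name": "Ada", "age": 36})
b = repo.store({"age": 41, "name": "Bob"})               # reuses the first schema
c = repo.store({"name": "Cy", "email": "cy@example.com"})  # forces a new schema
```

In this sketch the encoded item holds only values, so items that share a schema do not each repeat their keys.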

20 Claims

1. A method comprising:
- maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
  
  receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
  
  determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
  
  in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:

  - generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
  - encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
  - storing the first new encoded data item in the data item repository, and
  - associating the first new encoded data item with the new schema.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein determining that the first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas comprises determining that the keys from the first key/value pairs do not match the keys mapped to by any of the plurality of schemas.
  - 3. The method of claim 1, wherein the first schema of the plurality of schemas maps each of the keys from the first subset of the first key/value pairs to locations and identifies requirements for values of one or more of the keys from the first subset of the first key/value pairs, and wherein determining that the first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas comprises determining that the values from the first subset of the first key/value pairs do not satisfy the requirements identified in the first schema.
  - 4. The method of claim 1, further comprising:
    - receiving a second semi-structured data item, wherein the second semi-structured data item comprises one or more second key/value pairs;
      
      determining that the second semi-structured data item matches a second schema from the plurality of schemas; and
      
      in response to determining that the second semi-structured data item matches the second schema:

      - encoding the second semi-structured data item in the first data format to generate a second new encoded data item by storing values corresponding to the values from the second key/value pairs at respective locations in the second new encoded data item in accordance with the second schema,
      - storing the second new encoded data item in the data item repository, and
      - associating the second new encoded data item with the second schema.
  - 5. The method of claim 4, wherein determining that the second semi-structured data item matches the second schema from the plurality of schemas comprises determining that the keys mapped to locations by the second schema match the keys from the second key/value pairs.
  - 6. The method of claim 4, wherein the second schema identifies requirements for values of one or more of the keys mapped to locations by the second schema.
  - 7. The method of claim 6, wherein determining that the second semi-structured data item matches the second schema from the plurality of schemas comprises determining that the values from the second key/value pairs satisfy the requirements identified in the second schema.
  - 8. The method of claim 1, further comprising:
    - receiving a query for semi-structured data items, wherein the query specifies requirements for values for one or more keys;
      
      identifying schemas from the plurality of schemas that identify locations for values corresponding to each of the one or more keys;
      
      for each identified schema, searching the encoded data items associated with the schema to identify encoded data items that satisfy the query; and
      
      providing data identifying values from the encoded data items that satisfy the query in response to the query.
  - 9. The method of claim 8, wherein searching the encoded data items associated with the schema comprises:
    - searching, for each encoded data item associated with the schema, the locations in the encoded data item identified by the schema as storing values for the specified keys to identify whether the encoded data item stores values for the specified keys that satisfy the requirements specified in the query.
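
The partial-match case in claim 1 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: `Schema`, `best_partial_match`, and `store` are hypothetical names, matching is assumed to compare keys only (as in claim 2), and the new schema is assumed to place the locations it takes from the referenced schema first, followed by the locations it maps itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Schema:
    schema_id: int
    own_keys: tuple                  # keys this schema maps to locations itself
    base: Optional["Schema"] = None  # schema referenced for the remaining keys

    def keys(self) -> tuple:
        """All keys in location order: referenced schema's keys, then own keys."""
        inherited = self.base.keys() if self.base else ()
        return inherited + self.own_keys


def best_partial_match(item: dict, schemas: list) -> Optional[Schema]:
    """Pick the schema covering the most item keys, with all its keys present."""
    candidates = [s for s in schemas if set(s.keys()) <= set(item)]
    return max(candidates, key=lambda s: len(s.keys()), default=None)


def store(item: dict, schemas: list, repository: list) -> Schema:
    """Encode and store an item, deriving a new schema on a partial match."""
    for schema in schemas:
        if set(schema.keys()) == set(item):
            break  # full match: reuse the existing schema
    else:
        base = best_partial_match(item, schemas)
        covered = set(base.keys()) if base else set()
        own = tuple(sorted(k for k in item if k not in covered))
        schema = Schema(len(schemas), own, base)
        schemas.append(schema)
    repository.append((schema.schema_id, [item[k] for k in schema.keys()]))
    return schema


schemas, repository = [], []
s0 = store({"name": "Ada", "age": 36}, schemas, repository)
s1 = store({"name": "Bob", "age": 41, "email": "bob@example.com"}, schemas, repository)
```

Here `s1` maps only `email` itself and identifies `age` and `name` by reference to `s0`, so the values for those keys sit at the same locations as in items stored under `s0`.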

10. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
- maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
  
  receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
  
  determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
  
  in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:

  - generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
  - encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
  - storing the first new encoded data item in the data item repository, and
  - associating the first new encoded data item with the new schema.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The system of claim 10, wherein determining that the first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas comprises determining that the keys from the first key/value pairs do not match the keys mapped to by any of the plurality of schemas.
  - 12. The system of claim 10, wherein the first schema of the plurality of schemas maps each of the keys from the first subset of the first key/value pairs to locations and identifies requirements for values of one or more of the keys from the first subset of the first key/value pairs, and wherein determining that the first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas comprises determining that the values from the first subset of the first key/value pairs do not satisfy the requirements identified in the first schema.
  - 13. The system of claim 10, the operations further comprising:
    - receiving a second semi-structured data item, wherein the second semi-structured data item comprises one or more second key/value pairs;
      
      determining that the second semi-structured data item matches a second schema from the plurality of schemas; and
      
      in response to determining that the second semi-structured data item matches the second schema:

      - encoding the second semi-structured data item in the first data format to generate a second new encoded data item by storing values corresponding to the values from the second key/value pairs at respective locations in the second new encoded data item in accordance with the second schema,
      - storing the second new encoded data item in the data item repository, and
      - associating the second new encoded data item with the second schema.
  - 14. The system of claim 13, wherein determining that the second semi-structured data item matches the second schema from the plurality of schemas comprises determining that the keys mapped to locations by the second schema match the keys from the second key/value pairs.
  - 15. The system of claim 13, wherein the second schema identifies requirements for values of one or more of the keys mapped to locations by the second schema.
  - 16. The system of claim 15, wherein determining that the second semi-structured data item matches the second schema from the plurality of schemas comprises determining that the values from the second key/value pairs satisfy the requirements identified in the second schema.
  - 17. The system of claim 10, further comprising:
    - receiving a query for semi-structured data items, wherein the query specifies requirements for values for one or more keys;
      
      identifying schemas from the plurality of schemas that identify locations for values corresponding to each of the one or more keys;
      
      for each identified schema, searching the encoded data items associated with the schema to identify encoded data items that satisfy the query; and
      
      providing data identifying values from the encoded data items that satisfy the query in response to the query.
  - 18. The system of claim 17, wherein searching the encoded data items associated with the schema comprises:
    - searching, for each encoded data item associated with the schema, the locations in the encoded data item identified by the schema as storing values for the specified keys to identify whether the encoded data item stores values for the specified keys that satisfy the requirements specified in the query.
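
The query operations recited in claims 17 and 18 (and in method claims 8 and 9) can be sketched in Python as below. The function `run_query`, the dictionary of schemas, and the use of predicates to express "requirements for values" are all assumptions made for illustration; the claims do not prescribe any of them.

```python
def run_query(schemas: dict, repository: list, requirements: dict) -> list:
    """Return decoded items whose values satisfy every per-key predicate.

    schemas:      schema_id -> tuple of keys; keys[i] is the key at location i
    repository:   list of (schema_id, encoded values)
    requirements: key -> predicate that the stored value must satisfy
    """
    # Step 1: keep only schemas that identify a location for every queried key.
    locations = {}
    for schema_id, keys in schemas.items():
        if all(key in keys for key in requirements):
            locations[schema_id] = {key: keys.index(key) for key in requirements}

    # Step 2: for items under those schemas, inspect only the identified locations.
    results = []
    for schema_id, values in repository:
        where = locations.get(schema_id)
        if where is None:
            continue  # this item's schema cannot hold the queried keys
        if all(test(values[where[key]]) for key, test in requirements.items()):
            results.append(dict(zip(schemas[schema_id], values)))
    return results


schemas = {
    0: ("age", "name"),
    1: ("age", "name", "email"),
    2: ("sku", "price"),
}
repository = [
    (0, [36, "Ada"]),
    (1, [41, "Bob", "bob@example.com"]),
    (2, ["X1", 9.5]),
    (0, [17, "Cy"]),
]
adults = run_query(schemas, repository, {"age": lambda v: v >= 18})
```

Items stored under schema 2 are skipped without being decoded, because that schema identifies no location for `age`.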

19. A non-transitory computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
  
  receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
  
  determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
  
  in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:

  - generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
  - encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
  - storing the first new encoded data item in the data item repository, and
  - associating the first new encoded data item with the new schema.
- Dependent Claims (20)
  - 20. The non-transitory computer storage medium of claim 19, the operations further comprising:
    - receiving a second semi-structured data item, wherein the second semi-structured data item comprises one or more second key/value pairs;
      
      determining that the second semi-structured data item matches a second schema from the plurality of schemas; and
      
      in response to determining that the second semi-structured data item matches the second schema:

      - encoding the second semi-structured data item in the first data format to generate a second new encoded data item by storing values corresponding to the values from the second key/value pairs at respective locations in the second new encoded data item in accordance with the second schema,
      - storing the second new encoded data item in the data item repository, and
      - associating the second new encoded data item with the second schema.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Probst, Martin
Primary Examiner(s): Beausoliel, Jr., Robert
Assistant Examiner(s): Santos, Pedro J
Application Number: US14/507,690
Time in Patent Office: 1,065 Days
Field of Search: 7079991, 707E17005, 707E17044, 707E17006, 707756, 707999102, 707803, 707802, 707E17124, 707E17125, 707808, 707E17127
US Class Current
CPC Class Codes

G06F 16/213   with details for schema evo...

G06F 16/33   Querying

G06F 16/83   Querying

G06F 16/835   Query processing

G06F 16/86   Mapping to a database
