Storing semi-structured data
First Claim
1. A method comprising:
maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:
generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
storing the first new encoded data item in the data item repository, and
associating the first new encoded data item with the new schema.
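The partial-match case recited in the claim can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `Schema`, `find_base`, and `encode` are hypothetical, "matching" is taken to mean that a schema's keys all appear in the item, and locations are modeled as list positions. The new schema maps only the unmatched keys directly and identifies the remaining keys by reference to the existing schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Schema:
    # Hypothetical model: keys this schema maps directly to locations.
    own_keys: tuple[str, ...]
    # Optional schema that identifies the keys for the remaining locations.
    base: Optional[Schema] = None

    def keys(self) -> tuple[str, ...]:
        """Key for every location: own keys first, then the base schema's keys."""
        inherited = self.base.keys() if self.base else ()
        return self.own_keys + inherited


def find_base(item: dict, schemas: list[Schema]) -> Optional[Schema]:
    """Assumed matching rule: pick the largest schema whose keys all occur in item."""
    candidates = [s for s in schemas if s.keys() and set(s.keys()) <= set(item)]
    return max(candidates, key=lambda s: len(s.keys()), default=None)


def encode(item: dict, schemas: list[Schema]) -> tuple[Schema, list]:
    """Return (schema, values by location), creating a new schema when needed."""
    for s in schemas:
        if set(s.keys()) == set(item):
            return s, [item[k] for k in s.keys()]
    # No full match: map unmatched keys directly, refer to a base for the rest.
    base = find_base(item, schemas)
    covered = set(base.keys()) if base else set()
    new = Schema(tuple(sorted(k for k in item if k not in covered)), base)
    schemas.append(new)
    return new, [item[k] for k in new.keys()]
```

For an item with keys `id`, `name`, and `email` and an existing schema over `id` and `name`, this sketch produces a new schema whose only directly mapped key is `email` and whose `base` is the existing schema. The encoded item holds values only; the keys are recovered through the schema chain.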
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for storing semi-structured data. One of the methods includes maintaining a plurality of schemas; receiving a first semi-structured data item; determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas; and in response to determining that the first semi-structured data item does not match any of the schemas in the plurality of schemas: generating a new schema, encoding the first semi-structured data item in the first data format to generate the first new encoded data item in accordance with the new schema, storing the first new encoded data item in the data item repository, and associating the first new encoded data item with the new schema.
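The flow summarized in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the class name `Repository`, the rule that an item matches a schema when their key sets are equal, and the use of list positions as locations are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    # Each schema is an ordered tuple of keys; its index serves as the schema id.
    schemas: list = field(default_factory=list)
    # Each encoded item is (schema id, values stored by location).
    items: list = field(default_factory=list)

    def store(self, item: dict) -> int:
        """Encode item against a matching schema, generating one if none matches."""
        keys = tuple(sorted(item))  # assumed matching rule: identical key sets
        if keys not in self.schemas:
            self.schemas.append(keys)  # generate a new schema
        schema_id = self.schemas.index(keys)
        # The encoded item stores values only; keys live in the schema.
        self.items.append((schema_id, [item[k] for k in keys]))
        return schema_id

    def load(self, index: int) -> dict:
        """Rebuild the key/value pairs of a stored item from its schema."""
        schema_id, values = self.items[index]
        return dict(zip(self.schemas[schema_id], values))
```

In this sketch, two items with the same keys share one schema, so each key is stored once per schema rather than once per item.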
20 Claims
1. A method comprising:
maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:
generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
storing the first new encoded data item in the data item repository, and
associating the first new encoded data item with the new schema.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:
generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
storing the first new encoded data item in the data item repository, and
associating the first new encoded data item with the new schema.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
maintaining a plurality of schemas, wherein each schema is associated with one or more encoded data items stored in a first data format in a data item repository, wherein each encoded data item stores a respective value at each of one or more locations in the encoded data item, and wherein each schema maps each of the locations in the data items associated with the schema to a respective key to which the value stored at the location in the data items associated with the schema corresponds;
receiving a first semi-structured data item, wherein the first semi-structured data item is in a semi-structured data format, and wherein the first semi-structured data item comprises one or more first key/value pairs;
determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas; and
in response to determining that i) a first subset of the first key/value pairs of the first semi-structured data item do not match any of the schemas in the plurality of schemas and that ii) a second subset of the first key/value pairs of the first semi-structured data item match a first schema of the plurality of schemas:
generating a new schema that i) for a first subset of locations in a data item associated with the new schema, maps the locations to a respective key to which the value that is stored at the location corresponds, and that ii) for a second subset of locations in the data item associated with the new schema, identifies the respective key to which the value that is stored at the location corresponds by reference to the first schema,
encoding, in accordance with the new schema, the first semi-structured data item in the first data format to generate a first new encoded data item by i) storing values corresponding to values from the first subset of the first key/value pairs at respective locations in the first new encoded data item, and by ii) storing values corresponding to values from the second subset of the key/value pairs in corresponding locations in the second subset of locations that are identified by the first schema,
storing the first new encoded data item in the data item repository, and
associating the first new encoded data item with the new schema.
Dependent claims: 20
Specification