Object metadata query with distributed processing systems

  • US 10,318,491 B1
  • Filed: 03/31/2015
  • Issued: 06/11/2019
  • Est. Priority Date: 03/31/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • providing one or more computer processors configured to perform:

    providing access, from a distributed processing system, to a distributed object store configured for storing and retrieving object data and associated metadata, where the distributed object store uniquely identifies objects contained therein using an object key comprising one or more namespace identifiers and a unique object identifier (id) within the identified namespace, wherein the namespace identifiers comprise a tenant id uniquely identifying a tenant within the distributed object store, and a bucket id uniquely identifying a bucket that comprises a plurality of objects, the bucket defined by and belonging to the tenant, wherein the distributed object store is configured as part of a distributed key-value store, the distributed key-value store comprising:

    a set of data and object metadata for the plurality of objects;

    a primary index configured for storing a mapping of object ids to storage locations, for the plurality of objects; and

    one or more secondary indexes each configured for storing information relating to properties of the plurality of objects different than the object ids and different than the set of data and metadata for the plurality of objects, wherein the one or more secondary indexes are each defined as part of a respective specified bucket, wherein each of the one or more secondary indexes comprises:

    information based on the properties of the object metadata itself and information based on object access patterns for one or more applications that query the distributed object store;

    a secondary index table maintaining a mapping between properties of the object metadata and cached object metadata properties for each stored object; and

    a plurality of secondary index definitions, the secondary index definitions comprising information about one or more secondary indexes that have been created in the distributed key-value store, the information about the one or more secondary indexes comprising, for each respective secondary index, index name, indexed metadata keys, and cached metadata keys, the cached metadata keys corresponding to duplicates of one or more of the cached object metadata properties;

    wherein providing duplicates of one or more of the cached object metadata properties is configured to improve data request performance by reducing time to access information about object metadata properties; and

    wherein at least one of the one or more secondary indexes is configured to improve the efficiency of responding to data requests;

    receiving a data request for object metadata from the distributed processing system, the data request associated with a first bucket within the distributed object store, wherein the first bucket comprises at least a first respective secondary index, the data request comprising an object metadata query;

    identifying one or more objects within the first bucket that satisfy the object metadata query;

    wherein the object metadata query includes at least one query predicate involving an object metadata key, wherein identifying the one or more objects within the first bucket that satisfy the object metadata query comprises:

    parsing the object metadata query into a query parse tree;

    generating a plurality of candidate query plans, each of the candidate query plans being semantically equivalent to the object metadata query;

    selecting one of the candidate query plans; and

    identifying one or more objects that satisfy the query predicate by retrieving object ids from the first respective secondary index using a first bucket id associated with the first bucket and the object metadata key involved in the query predicate;

    for each object identified as satisfying the object metadata query:

    determining a location of corresponding object metadata stored within the distributed object store;

    retrieving the corresponding object metadata using the determined location; and

    generating a metadata record from the corresponding object metadata;

    combining the metadata records from the identified objects into a metadata collection having a format compatible with the distributed processing system; and

    returning the metadata collection to the distributed processing system in connection with the response to the data request.
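
The data flow recited in the claim can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the names `ObjectKey`, `SecondaryIndex`, `Bucket`, `create_index`, `put`, and `query` are hypothetical, a Python list stands in for the distributed key-value store, and the parse-tree and candidate-query-plan steps are omitted, so the query is reduced to a single predicate on one indexed metadata key.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectKey:
    """Namespace identifiers (tenant, bucket) plus an object id unique within them."""
    tenant_id: str
    bucket_id: str
    object_id: str


@dataclass
class SecondaryIndex:
    """One secondary index definition together with its index table (hypothetical layout)."""
    name: str
    indexed_key: str         # metadata key the index is built on
    cached_keys: tuple = ()  # metadata keys duplicated into the index
    # indexed value -> {object id -> cached metadata properties}
    table: dict = field(default_factory=dict)


class Bucket:
    """A bucket owned by one tenant, holding a primary index and its secondary indexes."""

    def __init__(self, tenant_id, bucket_id):
        self.tenant_id = tenant_id
        self.bucket_id = bucket_id
        self.storage = []            # stand-in for stored object metadata
        self.primary_index = {}      # object id -> storage location
        self.secondary_indexes = {}  # indexed metadata key -> SecondaryIndex

    def create_index(self, name, indexed_key, cached_keys=()):
        # Assumption: indexes are created before objects are written (no backfill).
        self.secondary_indexes[indexed_key] = SecondaryIndex(
            name, indexed_key, tuple(cached_keys)
        )

    def put(self, object_id, metadata):
        # Assumption: each object id is written once (no update or delete handling).
        self.primary_index[object_id] = len(self.storage)
        self.storage.append(dict(metadata))
        for index in self.secondary_indexes.values():
            if index.indexed_key in metadata:
                cached = {k: metadata[k] for k in index.cached_keys if k in metadata}
                index.table.setdefault(metadata[index.indexed_key], {})[object_id] = cached

    def query(self, metadata_key, predicate):
        """Return a metadata collection for objects whose indexed value satisfies predicate."""
        index = self.secondary_indexes[metadata_key]
        # Retrieve matching object ids from the secondary index.
        object_ids = sorted(
            oid
            for value, entries in index.table.items()
            if predicate(value)
            for oid in entries
        )
        collection = []
        for oid in object_ids:
            location = self.primary_index[oid]  # determine the metadata location
            metadata = self.storage[location]   # retrieve metadata at that location
            key = ObjectKey(self.tenant_id, self.bucket_id, oid)
            collection.append({"key": key, "metadata": dict(metadata)})
        return collection
```

In this sketch, calling `query("size", lambda v: v > 100)` on a bucket with an index on a hypothetical `size` metadata key returns one record per matching object, each carrying the full object key and a copy of the object's metadata. A real system would serialize the collection into whatever format the distributed processing system expects.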
