Method, apparatus, and computer-readable medium for optimized data subsetting

  • US 9,262,501 B2
  • Filed: 12/13/2012
  • Issued: 02/16/2016
  • Est. Priority Date: 12/13/2012
  • Status: Active Grant
First Claim

1. A method for data subsetting executed by one or more computing devices, the method comprising:

    receiving, by at least one of the one or more computing devices, a request for a subset of data from a plurality of tables, the request comprising subset criteria;

    determining, by at least one of the one or more computing devices, whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising:

    a plurality of entities, each entity representing a table in the plurality of tables;

    edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and

    relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship;

    when the entity graph contains a cycle:

    performing, by at least one of the one or more computing devices, cyclic subset processing using the entity graph;

    when the entity graph does not contain a cycle, performing the steps of:

    expanding, by at least one of the one or more computing devices, the entity graph to generate an expanded entity graph;

    performing, by at least one of the one or more computing devices, cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and

    performing, by at least one of the one or more computing devices, acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle;

    wherein acyclic subset processing comprises:

    for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table represented by the entity, wherein the entity-subset definition comprises one or more expressions;

    generating a plan space based on the entity-subset definitions, the plan space comprising one or more operators and one or more intermediate expressions required to compute one or more final subset tables corresponding to the requested subset of data;

    expanding the plan space by generating one or more additional operators and one or more additional intermediate expressions that are equivalent to the one or more operators and the one or more intermediate expressions;

    selecting a group of operators in the plan space that can be used to calculate the one or more final subset tables; and

    calculating the one or more final subset tables using the selected group of operators.
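
The branching recited in the claim (check the entity graph for a cycle, otherwise expand it and check again) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the names `Edge`, `EntityGraph`, `has_cycle`, and `choose_processing` are hypothetical, cycle detection uses an ordinary depth-first search, and the claim does not say how the graph is expanded, so `expand` is a caller-supplied placeholder that defaults to returning the graph unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: runs from a child entity to a parent entity."""
    child: str
    parent: str
    major: bool  # True for a major relationship, False for a minor one


@dataclass
class EntityGraph:
    """Hypothetical entity graph: one entity name per table, plus edge data."""
    entities: List[str]
    edges: List[Edge] = field(default_factory=list)

    def has_cycle(self) -> bool:
        # Build child -> parents adjacency lists.
        adjacency: Dict[str, List[str]] = {name: [] for name in self.entities}
        for edge in self.edges:
            adjacency[edge.child].append(edge.parent)

        # Three-colour depth-first search: reaching a node that is still
        # "in progress" means the current path loops back on itself.
        UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
        state = {name: UNSEEN for name in self.entities}

        def visit(node: str) -> bool:
            state[node] = IN_PROGRESS
            for parent in adjacency[node]:
                if state[parent] == IN_PROGRESS:
                    return True
                if state[parent] == UNSEEN and visit(parent):
                    return True
            state[node] = DONE
            return False

        return any(state[name] == UNSEEN and visit(name) for name in self.entities)


def choose_processing(
    graph: EntityGraph,
    expand: Callable[[EntityGraph], EntityGraph] = lambda g: g,
) -> str:
    """Return which processing path the claim's branching would select."""
    if graph.has_cycle():
        return "cyclic"
    # Assumption: expansion is a placeholder here; the claim only states
    # that an expanded entity graph is generated and checked again.
    expanded = expand(graph)
    return "cyclic" if expanded.has_cycle() else "acyclic"


# Example graphs with made-up table names.
acyclic_graph = EntityGraph(
    entities=["orders", "customers"],
    edges=[Edge(child="orders", parent="customers", major=True)],
)
cyclic_graph = EntityGraph(
    entities=["employees", "departments"],
    edges=[
        Edge(child="employees", parent="departments", major=True),
        Edge(child="departments", parent="employees", major=False),
    ],
)
```

With these example graphs, `choose_processing(acyclic_graph)` selects the acyclic path and `choose_processing(cyclic_graph)` selects the cyclic path. Passing an `expand` function that adds a back-reference to `acyclic_graph` exercises the third branch, in which only the expanded graph contains a cycle.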
