Method, apparatus, and computer-readable medium for optimized data subsetting
First Claim
Patent Images
1. A method for data subsetting executed by one or more computing devices, the method comprising:
- receiving, by at least one of the one or more computing devices, a request for a subset of data from a plurality of tables, the request comprising subset criteria;
determining, by at least one of the one or more computing devices, whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising;
a plurality of entities, each entity representing a table in the plurality of tables;
edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and
relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship;
when the entity graph contains a cycle;
performing, by at least one of the one or more computing devices, cyclic subset processing using the entity graph;
when the entity graph does not contain a cycle, performing the steps of;
expanding, by at least one of the one or more computing devices, the entity graph to generate an expanded entity graph;
performing, by at least one of the one or more computing devices, cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and
performing, by at least one of the one or more computing devices, acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle;
wherein acyclic subset processing comprises;
for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table represented by the entity, wherein the entity-subset definition comprises one or more expressions;
generating a plan space based on the entity-subset definitions, the plan space comprising one or more operators and one or more intermediate expressions required to compute one or more final subset tables corresponding to the requested subset of data;
expanding the plan space by generating one or more additional operators and one or more additional intermediate expressions that are equivalent to the one or more operators and the one or more intermediate expressions;
selecting a group of operators in the plan space that can be used to calculate the one or more final subset tables; and
calculating the one or more final subset tables using the selected group of operators.
9 Assignments
0 Petitions
Accused Products
Abstract
An apparatus, computer-readable medium, and computer-implemented method for data subsetting, including receiving a request for a subset of data from a plurality of tables, determining whether an entity graph corresponding the plurality of tables contains a cycle, and if so, performing cyclic subset processing, otherwise, expanding the entity graph and performing acyclic subset processing if the expanded entity graph does not have any cycles and cyclic subset processing if the expanded entity graph does have cycles.
8 Citations
12 Claims
-
1. A method for data subsetting executed by one or more computing devices, the method comprising:
-
receiving, by at least one of the one or more computing devices, a request for a subset of data from a plurality of tables, the request comprising subset criteria; determining, by at least one of the one or more computing devices, whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; performing, by at least one of the one or more computing devices, cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding, by at least one of the one or more computing devices, the entity graph to generate an expanded entity graph; performing, by at least one of the one or more computing devices, cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing, by at least one of the one or more computing devices, acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein acyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table represented by the entity, wherein the entity-subset definition comprises one or more expressions; generating a plan space based on the entity-subset definitions, the plan space comprising one or more operators and one or more intermediate expressions required to compute one or more final subset tables corresponding to the requested subset of data; expanding the plan space by generating one or more additional operators and one or more additional intermediate expressions that are equivalent to the one or more operators and the one or more intermediate expressions; selecting a group of operators in the plan space that can be used to calculate the one or more final subset tables; and calculating the one or more final subset tables using the selected group of operators. - View Dependent Claims (2, 4)
-
-
3. An apparatus for data subsetting, the apparatus comprising:
-
one or more processors; and one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to; receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, perform the steps of; expanding the entity graph to generate an expanded entity graph; performing subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein acyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table represented by the entity, wherein the entity-subset definition comprises one or more expressions; generating a plan space based on the entity-subset definitions, the plan space comprising one or more operators and one or more intermediate expressions required to compute one or more final subset tables corresponding to the requested subset of data; expanding the plan space by generating one or more additional operators and one or more additional intermediate expressions that are equivalent to the one or more operators and the one or more intermediate expressions; selecting a group of operators in the plan space that can be used to calculate the one or more final subset tables; and calculating the one or more final subset tables using the selected group of operators.
-
-
5. At least one non-transitory computer-readable medium storing computer-readable instructions that, when executed by one or more computing devices, cause at least one of the one or more computing devices to:
-
receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, perform the steps of; expanding the entity graph to generate an expanded entity graph; performing subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein acyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table represented by the entity, wherein the entity-subset definition comprises one or more expressions; generating a plan space based on the entity-subset definitions, the plan space comprising one or more operators and one or more intermediate expressions required to compute one or more final subset tables corresponding to the requested subset of data; expanding the plan space by generating one or more additional operators and one or more additional intermediate expressions that are equivalent to the one or more operators and the one or more intermediate expressions; selecting a group of operators in the plan space that can be used to calculate the one or more final subset tables; and calculating the one or more final subset tables using the selected group of operators. - View Dependent Claims (6)
-
-
7. A method for data subsetting executed by one or more computing devices, the method comprising:
-
receiving, by at least one of the one or more computing devices, a request for a subset of data from a plurality of tables, the request comprising subset criteria; determining, by at least one of the one or more computing devices, whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; performing, by at least one of the one or more computing devices, cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding, by at least one of the one or more computing devices, the entity graph to generate an expanded entity graph, wherein expanding the entity graph comprises; for every major relationship in the entity graph; creating a duplicate child entity; creating an edge from the parent entity to the duplicate child entity; removing every outgoing edge from the child entity other than an edge between the child entity and the parent entity in that relationship; and adding every outgoing edge removed from the child entity as an outgoing edge from the duplicate child entity; performing, by at least one of the one or more computing devices, cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing, by at least one of the one or more computing devices, acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle.
-
-
8. A method for data subsetting executed by one or more computing devices, the method comprising:
-
receiving, by at least one of the one or more computing devices, a request for a subset of data from a plurality of tables, the request comprising subset criteria; determining, by at least one of the one or more computing devices, whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; performing, by at least one of the one or more computing devices, cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding, by at least one of the one or more computing devices, the entity graph to generate an expanded entity graph; performing, by at least one of the one or more computing devices, cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing, by at least one of the one or more computing devices, acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein cyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table, wherein the entity-subset definition comprises one or more expressions; for every entity corresponding to a table that is to be subsetted, generating a staging table from the table, wherein the staging table comprises only the columns of data required to evaluate the one or more expressions in the entity-subset definition for that entity and a column of flag values representing whether a particular row is to be included in the final subset, wherein each flag value is initially set to true if the data values in that row fulfill the subset criteria and set to false otherwise; recursively evaluating every expression in each entity-subset definition and changing the flag values to true for each of the rows which satisfy the expression in each of the staging tables; and generating one or more final subset tables by compiling all of the rows from the plurality of tables that correspond to rows in the staging tables which have a flag value of true.
-
-
9. An apparatus for data subsetting, the apparatus comprising:
-
one or more processors; and one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to; receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding the entity graph to generate an expanded entity graph, wherein expanding the entity graph comprises; for every major relationship in the entity graph;
creating a duplicate child entity;
creating an edge from the parent entity to the duplicate child entity;
removing every outgoing edge from the child entity other than an edge between the child entity and the parent entity in that relationship; and
adding every outgoing edge removed from the child entity as an outgoing edge from the duplicate child entity;performing cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle.
-
-
10. An apparatus for data subsetting, the apparatus comprising:
-
one or more processors; and one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to; receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding the entity graph to generate an expanded entity graph; performing cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein cyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table, wherein the entity-subset definition comprises one or more expressions; for every entity corresponding to a table that is to be subsetted, generating a staging table from the table, wherein the staging table comprises only the columns of data required to evaluate the one or more expressions in the entity-subset definition for that entity and a column of flag values representing whether a particular row is to be included in the final subset, wherein each flag value is initially set to true if the data values in that row fulfill the subset criteria and set to false otherwise; recursively evaluating every expression in each entity-subset definition and changing the flag values to true for each of the rows which satisfy the expression in each of the staging tables; and generating one or more final subset tables by compiling all of the rows from the plurality of tables that correspond to rows in the staging tables which have a flag value of true.
-
-
11. At least one non-transitory computer-readable medium storing computer-readable instructions that, when executed by one or more computing devices, cause at least one of the one or more computing devices to:
-
receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding the entity graph to generate an expanded entity graph, wherein expanding the entity graph comprises; for every major relationship in the entity graph; creating a duplicate child entity; creating an edge from the parent entity to the duplicate child entity; removing every outgoing edge from the child entity other than an edge between the child entity and the parent entity in that relationship; and adding every outgoing edge removed from the child entity as an outgoing edge from the duplicate child entity; performing cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle.
-
-
12. At least one non-transitory computer-readable medium storing computer-readable instructions that, when executed by one or more computing devices, cause at least one of the one or more computing devices to:
-
receive a request for a subset of data from a plurality of tables, the request comprising subset criteria; determine whether an entity graph corresponding to the plurality of tables contains a cycle, the entity graph comprising; a plurality of entities, each entity representing a table in the plurality of tables; edge data corresponding to a plurality of edges between the entities, wherein the edge data identifies a parent entity and a child entity, and each edge runs from a child entity to a parent entity; and relationship data corresponding to a plurality of relationships between the entities, wherein the relationship data identifies whether the relationship between the parent entity and the child entity is a major relationship or a minor relationship; when the entity graph contains a cycle; perform cyclic subset processing using the entity graph; when the entity graph does not contain a cycle, performing the steps of; expanding the entity graph to generate an expanded entity graph; performing cyclic subset processing using the expanded entity graph when the expanded entity graph contains a cycle; and performing acyclic subset processing using the expanded entity graph when the expanded entity graph does not contain a cycle; wherein cyclic subset processing comprises; for every entity corresponding to a table that is to be subsetted, generating an entity-subset definition which defines a portion of the requested subset of data that is in the table, wherein the entity-subset definition comprises one or more expressions; for every entity corresponding to a table that is to be subsetted, generating a staging table from the table, wherein the staging table comprises only the columns of data required to evaluate the one or more expressions in the entity-subset definition for that entity and a column of flag values representing whether a particular row is to be included in the final subset, wherein each flag value is initially set to true if the data values in that row fulfill the subset criteria and set to false otherwise; recursively evaluating every expression in each entity-subset definition and changing the flag values to true for each of the rows which satisfy the expression in each of the staging tables; and generating one or more final subset tables by compiling all of the rows from the plurality of tables that correspond to rows in the staging tables which have a flag value of true.
-
Specification