Resolution of data flow errors using the lineage of detected error conditions
First Claim
1. A method of resolving error conditions in a data flow, comprising:
at a computer having a display, one or more processors, and memory storing one or more programs configured for execution by the one or more processors:
displaying a user interface that includes a flow diagram having a plurality of nodes, each node specifying a respective operation and having a respective intermediate data set;
receiving user specification of a validation rule for a first node of the plurality of nodes in the flow diagram, wherein the validation rule specifies a condition that applies to a first intermediate data set corresponding to the first node;
determining that the first intermediate data set violates the validation rule;
in response to determining that the first intermediate data set violates the validation rule:
identifying one or more errors corresponding to rows in the first intermediate data set;
displaying an error resolution user interface that provides information about the one or more errors, wherein the error resolution user interface includes a plurality of interlinked regions, including:
a natural language summary region providing a synopsis of the one or more errors, the synopsis including a number of errors identified, one or more error types, and a number of errors for each of the one or more error types;
an error profile region graphically depicting the one or more errors, including, for each respective error type, a respective visual mark that depicts a respective number of errors for the respective error type;
a data flow trace region providing lineage of the one or more errors according to the flow diagram, the lineage including:
(i) a visual representation of at least a subset of the plurality of nodes, (ii) a visual representation for each respective operation associated with each of the plurality of nodes, and (iii) a graphic depiction of errors, if any, at each represented node; and
a data region displaying data for a subset of columns from the first intermediate data set; and
determining a proposed solution for at least some of the one or more errors based, at least in part, on data values in the first intermediate data set, wherein the proposed solution includes deleting a row of data from the first intermediate data set corresponding to a respective error of the one or more errors.
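The detection and summary steps of claim 1 can be illustrated with a small Python sketch. It is not the patentee's implementation: the row model (one dict per row), the rule signature (a function that returns an error-type string or `None`), and the names `identify_errors`, `summarize`, `error_profile`, and `age_rule` are hypothetical, and the text bars only stand in for the graphical error profile region.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical row model: one dict per row of an intermediate data set.
Row = dict
# Hypothetical rule signature: return an error type for a violating row, else None.
ValidationRule = Callable[[Row], Optional[str]]


@dataclass
class RowError:
    row_index: int   # position of the offending row in the intermediate data set
    error_type: str


def identify_errors(rows: list[Row], rule: ValidationRule) -> list[RowError]:
    """Apply the validation rule to every row and collect one RowError per violation."""
    errors = []
    for i, row in enumerate(rows):
        error_type = rule(row)
        if error_type is not None:
            errors.append(RowError(i, error_type))
    return errors


def summarize(errors: list[RowError]) -> str:
    """Synopsis with the total error count, the error types, and the count per type."""
    if not errors:
        return "No errors found."
    counts = Counter(e.error_type for e in errors)
    per_type = ", ".join(f"{n} {t}" for t, n in counts.most_common())
    noun = "error" if len(errors) == 1 else "errors"
    return f"Found {len(errors)} {noun} of {len(counts)} type(s): {per_type}."


def error_profile(errors: list[RowError], width: int = 20) -> list[str]:
    """Text stand-in for the error profile region: one bar per type, scaled to the largest count."""
    counts = Counter(e.error_type for e in errors)
    if not counts:
        return []
    peak = max(counts.values())
    return [f"{t:<14}{'#' * max(1, round(width * n / peak))} {n}"
            for t, n in counts.most_common()]


# Example rule for a hypothetical "age" column.
def age_rule(row: Row) -> Optional[str]:
    age = row.get("age")
    if age is None:
        return "missing age"
    if age < 0:
        return "negative age"
    return None


rows = [{"age": 34}, {"age": None}, {"age": -2}, {"age": None}, {"age": 51}]
errors = identify_errors(rows, age_rule)
print(summarize(errors))  # Found 3 errors of 2 type(s): 2 missing age, 1 negative age.
for line in error_profile(errors):
    print(line)
```

Keeping the rule as a plain function means the same detection loop works for any condition a user might attach to a node; the synopsis and profile are both derived from the one error list, so they cannot disagree.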
1 Assignment
0 Petitions
Abstract
A method enables users to resolve errors in a data flow. The method displays a user interface (UI) that includes a flow diagram having a plurality of nodes, receives user specification of a validation rule for a first node of the plurality of nodes in the flow diagram, and determines that a first intermediate data set, corresponding to the first node, violates the validation rule. In response to determining that the first intermediate data set violates the validation rule, the method identifies errors corresponding to rows in the first intermediate data set and displays an error resolution UI that provides information about the errors. The error resolution UI includes a natural language summary region providing a synopsis of the errors, an error profile region graphically depicting the errors, a data flow trace region providing lineage of the errors in the flow, and a data region displaying data for a subset of columns from the first intermediate data set.
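The data flow trace region can be pictured as walking upstream from the node that failed validation and counting violations at each ancestor. The Python sketch below is one possible reading under stated assumptions: the flow is a hypothetical dict of `FlowNode` objects whose intermediate data sets are precomputed by hand, a node's lineage is taken to be the node plus all its ancestors, and the rule is re-applied at every ancestor (rows lacking the checked column simply pass).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]


@dataclass
class FlowNode:
    name: str
    operation: str                  # label shown for the node's operation
    rows: list[dict]                # the node's intermediate data set (precomputed here)
    inputs: list[str] = field(default_factory=list)  # names of upstream nodes


def lineage(flow: dict[str, FlowNode], target: str) -> list[str]:
    """Target node plus all of its ancestors, upstream nodes first (depth-first post-order)."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for parent in flow[name].inputs:
            visit(parent)
        ordered.append(name)

    visit(target)
    return ordered


def trace_errors(flow: dict[str, FlowNode], target: str, rule: Rule) -> list[tuple[str, str, int]]:
    """(node, operation, violation count) for every node in the target's lineage."""
    return [(name, flow[name].operation, sum(rule(r) is not None for r in flow[name].rows))
            for name in lineage(flow, target)]


def negative_amount(row: dict) -> Optional[str]:
    # Rows without an "amount" column (e.g. customer rows) cannot violate this rule.
    amount = row.get("amount")
    return "negative amount" if amount is not None and amount < 0 else None


flow = {
    "orders": FlowNode("orders", "Input", [
        {"id": 1, "amount": 10}, {"id": 2, "amount": -5}, {"id": 3, "amount": -1}]),
    "customers": FlowNode("customers", "Input", [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]),
    "filtered": FlowNode("filtered", "Filter id <= 2", [
        {"id": 1, "amount": 10}, {"id": 2, "amount": -5}], ["orders"]),
    "joined": FlowNode("joined", "Join on id", [
        {"id": 1, "amount": 10, "name": "A"}, {"id": 2, "amount": -5, "name": "B"}],
        ["filtered", "customers"]),
}

for name, op, n in trace_errors(flow, "joined", negative_amount):
    print(f"{name:<10} {op:<16} {'!' * n or 'ok'}")
```

Listing ancestors upstream-first lets the trace read in the same direction as the flow diagram, so a user can see where the bad rows first appear (here, already in the `orders` input) and which operations narrowed them down.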
61 Citations
20 Claims
17. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system having one or more processors, memory, and a display, the one or more programs comprising instructions for:
displaying a user interface that includes a flow diagram having a plurality of nodes, each node specifying a respective operation and having a respective intermediate data set;
receiving user specification of a validation rule for a first node of the plurality of nodes in the flow diagram, wherein the validation rule specifies a condition that applies to a first intermediate data set corresponding to the first node;
determining that the first intermediate data set violates the validation rule;
in response to determining that the first intermediate data set violates the validation rule:
identifying one or more errors corresponding to rows in the first intermediate data set;
displaying an error resolution user interface that provides information about the one or more errors, wherein the error resolution user interface includes a plurality of interlinked regions, including:
a natural language summary region providing a synopsis of the one or more errors, the synopsis including a number of errors identified, one or more error types, and a number of errors for each of the one or more error types;
an error profile region graphically depicting the one or more errors, including, for each respective error type, a respective visual mark that depicts a respective number of errors for the respective error type;
a data flow trace region providing lineage of the one or more errors according to the flow diagram, the lineage including:
(i) a visual representation of at least a subset of the plurality of nodes, (ii) a visual representation for each respective operation associated with each of the plurality of nodes, and (iii) a graphic depiction of errors, if any, at each represented node; and
a data region displaying data for a subset of columns from the first intermediate data set; and
determining a proposed solution for at least some of the one or more errors based, at least in part, on data values in the first intermediate data set, wherein the proposed solution includes deleting a row of data from the first intermediate data set corresponding to a respective error of the one or more errors.
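The claims require the regions of the error resolution interface to be "interlinked" without saying how. One plausible reading, sketched here in Python as an assumption rather than the patentee's design, is a shared selection model: selecting an error type in one region (for example, a bar in the error profile) notifies every other region, which re-renders from the same state. `ErrorResolutionState`, `data_region`, `summary_region`, and the column subset are all hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ErrorResolutionState:
    """Hypothetical shared model behind the linked regions of an error resolution UI."""
    rows: list[dict]                       # the first intermediate data set
    error_types: dict[int, str]            # row index -> error type for each violating row
    selected_type: Optional[str] = None    # error type currently selected, if any
    listeners: list[Callable[["ErrorResolutionState"], None]] = field(default_factory=list)

    def select(self, error_type: Optional[str]) -> None:
        """Record a selection made in any region and re-render every linked region."""
        self.selected_type = error_type
        for render in self.listeners:
            render(self)

    def visible_rows(self) -> list[int]:
        """All rows when nothing is selected, otherwise only rows with the selected error type."""
        if self.selected_type is None:
            return list(range(len(self.rows)))
        return [i for i, t in self.error_types.items() if t == self.selected_type]


rendered: dict[str, object] = {}


def data_region(state: ErrorResolutionState) -> None:
    columns = ["id", "age"]  # hypothetical subset of columns to display
    rendered["data"] = [{c: state.rows[i][c] for c in columns} for i in state.visible_rows()]


def summary_region(state: ErrorResolutionState) -> None:
    counts = Counter(state.error_types.values())
    if state.selected_type is None:
        rendered["summary"] = f"{sum(counts.values())} errors in total"
    else:
        rendered["summary"] = f"{counts[state.selected_type]} of type {state.selected_type}"


state = ErrorResolutionState(
    rows=[{"id": 10, "age": 34, "city": "Oslo"}, {"id": 11, "age": None, "city": "Lima"},
          {"id": 12, "age": -2, "city": "Pune"}, {"id": 13, "age": None, "city": "Kyiv"}],
    error_types={1: "missing age", 2: "negative age", 3: "missing age"},
)
state.listeners += [data_region, summary_region]
state.select("missing age")  # e.g. the user clicks the "missing age" bar in the error profile
```

Routing every interaction through one state object keeps the regions consistent by construction: no region filters data on its own, so the summary text and the data region always describe the same selection.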
18. A computer system for identifying errors in a data set, comprising:
one or more processors;
memory; and
one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for:
displaying a user interface that includes a flow diagram having a plurality of nodes, each node specifying a respective operation and having a respective intermediate data set;
receiving user specification of a validation rule for a first node of the plurality of nodes in the flow diagram, wherein the validation rule specifies a condition that applies to a first intermediate data set corresponding to the first node;
determining that the first intermediate data set violates the validation rule;
in response to determining that the first intermediate data set violates the validation rule:
identifying one or more errors corresponding to rows in the first intermediate data set;
displaying an error resolution user interface that provides information about the one or more errors, wherein the error resolution user interface includes a plurality of interlinked regions, including:
a natural language summary region providing a synopsis of the one or more errors, the synopsis including a number of errors identified, one or more error types, and a number of errors for each of the one or more error types;
an error profile region graphically depicting the one or more errors, including, for each respective error type, a respective visual mark that depicts a respective number of errors for the respective error type;
a data flow trace region providing lineage of the one or more errors according to the flow diagram, the lineage including:
(i) a visual representation of at least a subset of the plurality of nodes, (ii) a visual representation for each respective operation associated with each of the plurality of nodes, and (iii) a graphic depiction of errors, if any, at each represented node; and
a data region displaying data for a subset of columns from the first intermediate data set; and
determining a proposed solution for at least some of the one or more errors based, at least in part, on data values in the first intermediate data set, wherein the proposed solution includes deleting a row of data from the first intermediate data set corresponding to a respective error of the one or more errors.
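The final step common to claims 1, 17, and 18 (proposing the deletion of rows that correspond to errors, based on their data values) can be sketched in Python as a propose-then-apply pair. This is an illustrative assumption, not the claimed system: `Proposal`, `propose_deletions`, `apply_accepted`, and the `email_required` rule are hypothetical, and the only policy shown is "propose deleting every row the rule rejects," leaving the user to accept or reject each proposal.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

Rule = Callable[[dict], Optional[str]]


@dataclass(frozen=True)
class Proposal:
    action: str      # always "delete_row" in this sketch
    row_index: int   # row of the intermediate data set the proposal targets
    reason: str      # error type reported by the rule, shown to the user


def propose_deletions(rows: list[dict], rule: Rule) -> list[Proposal]:
    """Inspect each row's values and propose deleting every row the rule rejects."""
    proposals = []
    for i, row in enumerate(rows):
        reason = rule(row)
        if reason is not None:
            proposals.append(Proposal("delete_row", i, reason))
    return proposals


def apply_accepted(rows: list[dict], accepted: Iterable[Proposal]) -> list[dict]:
    """Return a new data set without the rows whose deletion was accepted; input is untouched."""
    drop = {p.row_index for p in accepted if p.action == "delete_row"}
    return [row for i, row in enumerate(rows) if i not in drop]


def email_required(row: dict) -> Optional[str]:
    # Hypothetical rule: an empty or missing email makes the row invalid.
    return "missing email" if not row.get("email") else None


rows = [{"id": 1, "email": "a@example.org"}, {"id": 2, "email": None},
        {"id": 3, "email": "c@example.org"}, {"id": 4, "email": ""}]
proposals = propose_deletions(rows, email_required)
cleaned = apply_accepted(rows, proposals)       # user accepts every proposal
partial = apply_accepted(rows, proposals[:1])   # user accepts only the first proposal
```

Separating the proposal from its application keeps deletion reversible until the user confirms it, and returning a new list rather than mutating `rows` leaves the original intermediate data set available for the other regions.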
Specification