Resolution of data flow errors using the lineage of detected error conditions

  • US 10,394,691 B1
  • Filed: 10/05/2017
  • Issued: 08/27/2019
  • Est. Priority Date: 10/05/2017
  • Status: Active Grant
First Claim

1. A method of resolving error conditions in a data flow, comprising:

  • at a computer having a display, one or more processors, and memory storing one or more programs configured for execution by the one or more processors:

    displaying a user interface that includes a flow diagram having a plurality of nodes, each node specifying a respective operation and having a respective intermediate data set;

    receiving user specification of a validation rule for a first node of the plurality of nodes in the flow diagram, wherein the validation rule specifies a condition that applies to a first intermediate data set corresponding to the first node;

    determining that the first intermediate data set violates the validation rule;

    in response to determining that the first intermediate data set violates the validation rule:

    identifying one or more errors corresponding to rows in the first intermediate data set;

    displaying an error resolution user interface that provides information about the one or more errors, wherein the error resolution user interface includes a plurality of interlinked regions, including:

    a natural language summary region providing a synopsis of the one or more errors, the synopsis including a number of errors identified, one or more error types, and a number of errors for each of the one or more error types;

    an error profile region graphically depicting the one or more errors, including, for each respective error type, a respective visual mark that depicts a respective number of errors for the respective error type;

    a data flow trace region providing lineage of the one or more errors according to the flow diagram, the lineage including:

    (i) a visual representation of at least a subset of the plurality of nodes, (ii) a visual representation for each respective operation associated with each of the plurality of nodes, and (iii) a graphic depiction of errors, if any, at each represented node; and

    a data region displaying data for a subset of columns from the first intermediate data set; and

    determining a proposed solution for at least some of the one or more errors based, at least in part, on data values in the first intermediate data set, wherein the proposed solution includes deleting a row of data from the first intermediate data set corresponding to a respective error of the one or more errors.
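
For illustration only, the validation, synopsis, and proposed-solution steps recited above might be sketched as follows. This is a minimal sketch and not the patented implementation: the names `RowError`, `validate`, `summarize`, `propose_deletions`, `apply_deletions`, and `price_rule`, along with the sample rows, are hypothetical. It also assumes that a validation rule can be modeled as a function returning an error type for each offending row.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # one row of an intermediate data set (hypothetical representation)


@dataclass
class RowError:
    row_index: int   # position of the offending row in the data set
    error_type: str  # label used to group errors in the synopsis


def validate(rows: list, rule: Callable[[Row], Optional[str]]) -> list:
    """Apply the rule to every row and collect one RowError per violation."""
    errors = []
    for index, row in enumerate(rows):
        error_type = rule(row)
        if error_type is not None:
            errors.append(RowError(index, error_type))
    return errors


def summarize(errors: list) -> str:
    """Build a synopsis: total error count plus a count per error type."""
    if not errors:
        return "No errors found"
    counts = Counter(e.error_type for e in errors)
    parts = ", ".join(f"{n} {t}" for t, n in sorted(counts.items()))
    return f"{len(errors)} errors found: {parts}"


def propose_deletions(errors: list) -> list:
    """Propose deleting each row that has at least one error."""
    return sorted({e.row_index for e in errors})


def apply_deletions(rows: list, to_delete: list) -> list:
    """Return a copy of the data set without the rows proposed for deletion."""
    drop = set(to_delete)
    return [row for i, row in enumerate(rows) if i not in drop]


def price_rule(row: Row) -> Optional[str]:
    """Hypothetical rule: the 'price' column must be present and non-negative."""
    price = row.get("price")
    if price is None:
        return "missing value"
    if price < 0:
        return "negative value"
    return None


rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},
    {"id": 3, "price": -4.0},
    {"id": 4, "price": None},
]
errors = validate(rows, price_rule)
summary = summarize(errors)
cleaned = apply_deletions(rows, propose_deletions(errors))
print(summary)
print(cleaned)
```

The sketch covers only the data-side logic. The flow diagram, the interlinked display regions, and the lineage trace across nodes are outside its scope.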
