Data profiling

  • US 9,323,802 B2
  • Filed: 10/20/2014
  • Issued: 04/26/2016
  • Est. Priority Date: 09/15/2003
  • Status: Active Grant
First Claim

1. A method for processing data including:

    reading data records from a data source;

    profiling the data records, including:

    sending the data records to a partitioning component, the partitioning component partitioning the data records into a plurality of partitions;

    generating, in each partition, a plurality of census elements for each data record, each census element including:

    a field of the data; and

    a corresponding value occurring within the field of the data record;

    in each partition, combining occurrences of census elements having the same value for the same field into an output census element including the field, the value, and a count of the number of combined census elements;

    partitioning the output census elements by the field and the value included in each output census element, wherein output census elements that have the same value for the same field are partitioned into the same partition; and

    adding counts of the number of occurrences of the same value for the same field for the partitioned output census elements to produce, for each field and corresponding value, a single census element that includes a total count of occurrences of that field and corresponding value in the data records;

    storing profile information; and

    processing the data records, including:

    accessing the stored profile information;

    reading the data records from the data source;

    processing the data records according to the profile information; and

    outputting a result of the processing.
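
The profiling steps of the claim (partition, generate census elements, combine locally, repartition by field and value, add counts) can be illustrated with a minimal single-process sketch. This is not the patented implementation: the function names, the round-robin split, the hash-based routing, the in-memory dictionary standing in for stored profile information, and the "rare value" processing step are all assumptions chosen for illustration.

```python
from collections import Counter


def partition_records(records, n_partitions):
    """Split records round-robin (assumed; the claim does not fix a scheme)."""
    parts = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        parts[i % n_partitions].append(record)
    return parts


def local_census(partition):
    """Generate one (field, value) census element per field of each record and
    combine equal elements within this partition into counts."""
    counts = Counter()
    for record in partition:
        for field, value in record.items():
            counts[(field, value)] += 1
    return counts


def repartition_by_key(local_counts, n_partitions):
    """Route output census elements so equal (field, value) keys share a partition."""
    parts = [[] for _ in range(n_partitions)]
    for counts in local_counts:
        for key, count in counts.items():
            parts[hash(key) % n_partitions].append((key, count))
    return parts


def build_profile(records, n_partitions=2):
    """Return {(field, value): total count} for the whole data set."""
    local_counts = [local_census(p) for p in partition_records(records, n_partitions)]
    profile = {}
    for part in repartition_by_key(local_counts, n_partitions):
        totals = Counter()
        for key, count in part:
            totals[key] += count  # add partial counts for the same field and value
        profile.update(totals)    # keys are disjoint across partitions
    return profile


def records_with_rare_value(records, profile, field, max_count=1):
    """Hypothetical processing step: keep records whose value in `field`
    occurs at most `max_count` times according to the profile."""
    return [r for r in records if profile[(field, r[field])] <= max_count]


records = [
    {"state": "CA", "zip": "94000"},
    {"state": "CA", "zip": "94001"},
    {"state": "NY", "zip": "10001"},
]
profile = build_profile(records)
rare = records_with_rare_value(records, profile, "state")
```

Combining counts inside each partition before repartitioning means that only one element per distinct field and value pair leaves a partition, which is what keeps the second exchange small when values repeat often. The total for each pair does not depend on the number of partitions chosen.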
