
Profiling data with source tracking

  • US 10,719,511 B2
  • Filed: 02/13/2017
  • Issued: 07/21/2020
  • Est. Priority Date: 10/22/2012
  • Status: Active Grant
First Claim

1. A method for profiling data stored in a data storage system, the method including:

  • accessing multiple collections of records stored in the data storage system over an interface coupled to the data storage system;

    determining quantitative information for each of the multiple collections of records, the quantitative information for each particular collection including, for at least one selected field of the records in the particular collection, a corresponding list of value count entries, with each value count entry including a value appearing in at least the selected field and a count of the number of records in which the value appears in at least the selected field; and

    processing the quantitative information of two or more of the collections to generate profiling summary information, the processing including:

    merging the value count entries of corresponding lists for at least one field from each of at least a first collection and a second collection of the two or more collections to generate a combined list of value count entries, and

    aggregating value count entries of the combined list of value count entries to generate a list of distinct field value entries, at least some of the distinct field value entries identifying a distinct value from at least one of the value count entries;

    wherein each value count entry in a list of value count entries corresponding to a particular collection further includes location information identifying respective locations of records within the particular collection of records in which the value appears in at least the selected field.
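
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not taken from the patent and is not a statement of how the claimed method is implemented. The function names `census` and `merge_and_aggregate`, the in-memory lists of dictionaries standing in for collections of records, and the use of record indices as location information are all assumptions made for illustration.

```python
def census(records, field):
    """Build value count entries for one collection (hypothetical helper).

    Returns a dict mapping each value of `field` to a tuple
    (count, locations), where locations are the indices of the records
    in which the value appears. Record indices are an assumed stand-in
    for the claim's "location information".
    """
    entries = {}
    for index, record in enumerate(records):
        value = record[field]
        count, locations = entries.get(value, (0, []))
        entries[value] = (count + 1, locations + [index])
    return entries


def merge_and_aggregate(censuses):
    """Merge per-collection entries, then aggregate by distinct value.

    `censuses` maps a collection name to the output of `census`.
    """
    # Merge: one combined list of entries, each tagged with its source.
    combined = []
    for name, entries in censuses.items():
        for value, (count, locations) in entries.items():
            combined.append((value, count, name, locations))

    # Aggregate: one entry per distinct value, with the total count and
    # the per-collection record locations preserved.
    distinct = {}
    for value, count, name, locations in combined:
        entry = distinct.setdefault(value, {"count": 0, "locations": {}})
        entry["count"] += count
        entry["locations"][name] = locations
    return distinct


# Two small example collections (made-up data).
first = [{"city": "Paris"}, {"city": "Rome"}, {"city": "Paris"}]
second = [{"city": "Rome"}, {"city": "Oslo"}]

summary = merge_and_aggregate({
    "first": census(first, "city"),
    "second": census(second, "city"),
})
```

In this sketch, keeping the source name and record indices alongside each count is what lets a distinct value in the summary be traced back to the records that contributed it.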
