Intelligent data curation

  • US 10,552,739 B1
  • Filed: 07/05/2019
  • Issued: 02/04/2020
  • Est. Priority Date: 10/15/2018
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

    receive an indication of availability of a data set, wherein the indication of availability of the data set comprises an indication of at least one of a contextual aspect of the data set from among a pre-selected set of contextual aspects, a structural feature of the data set from among a pre-selected set of structural features, or a data feature of a data value of the data set from among a pre-selected set of data features;

    coordinate a performance of a distribution of data set portions of the data set among a set of processor cores;

    provide a set of feature routines to each processor core of the set of processor cores to enable each processor core of the set of processor cores to execute instructions of each feature routine to detect a structural feature of the corresponding data set portion or a data feature of data values of the corresponding data set portion;

    receive indications of detected structural features and detected data features of the set of data set portions from the set of processor cores;

    generate, for the data set, metadata indicative of the detected structural features and the detected data features;

    generate, for the data set, context data indicative of the set of contextual aspects of the data set;

    provide the metadata and context data to each processor core of the set of processor cores;

    distribute a set of suggestion models among the set of processor cores to provide each processor core of the set of processor cores with a different suggestion model from among the set of suggestion models, and to enable the set of processor cores to employ the set of suggestion models to derive a suggested subset of data preparation operations of a set of data preparation operations to be suggested to be performed on the data set, wherein:

    each suggestion model comprises a pre-selected type of model previously trained to determine whether to suggest that a corresponding data preparation operation of the set of data preparation operations be performed on the data set based on the metadata and context data;

    receive indications of the suggested subset from the set of processor cores;

    transmit an indication of the suggested subset to a viewing device to enable a presentation of the suggested subset;

    receive, from the viewing device, an indication of a selected subset of the set of data preparation operations selected to be performed;

    compare the selected subset to the suggested subset to determine whether there is a difference between the suggested and selected subsets; and

    in response to a determination that there is a difference between the suggested and selected subsets, re-train at least one suggestion model of the set of suggestion models based at least on a combination of the metadata, the context data and the selected subset.
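
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (RuleModel, curate, apply_feedback, the two feature routines, and the operation names) is hypothetical. A thread pool stands in for the set of processor cores, the context data is reduced to a single string, and "re-training" is reduced to recording the user's choice for a given combination of metadata and context.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical feature routines: each inspects one data set portion (a list of rows).
def detect_missing_values(portion):
    return {"has_missing"} if any(v is None for row in portion for v in row) else set()

def detect_text_values(portion):
    return {"has_text"} if any(isinstance(v, str) for row in portion for v in row) else set()

FEATURE_ROUTINES = [detect_missing_values, detect_text_values]

def run_feature_routines(portion):
    """Run every feature routine on one portion and merge what was detected."""
    detected = set()
    for routine in FEATURE_ROUTINES:
        detected |= routine(portion)
    return detected

class RuleModel:
    """Toy suggestion model for one data preparation operation (an assumption)."""
    def __init__(self, operation, trigger):
        self.operation = operation
        self.trigger = trigger
        self.overrides = {}  # (metadata, context) -> decision learned from feedback

    def suggest(self, metadata, context):
        key = (frozenset(metadata), context)
        if key in self.overrides:
            return self.overrides[key]
        return self.trigger in metadata

    def retrain(self, metadata, context, selected):
        # Stand-in for re-training: remember the user's choice for this input.
        self.overrides[(frozenset(metadata), context)] = self.operation in selected

def curate(rows, context, models, workers=2):
    """Detect features per portion, build metadata, and collect suggestions."""
    portions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker runs all feature routines on its own portion.
        metadata = frozenset().union(*pool.map(run_feature_routines, portions))
        # Each model is evaluated as a separate task on the shared metadata and context.
        flags = list(pool.map(lambda m: m.suggest(metadata, context), models))
    suggested = {m.operation for m, flag in zip(models, flags) if flag}
    return metadata, suggested

def apply_feedback(models, metadata, context, suggested, selected):
    """Re-train the models whose suggestion differs from the user's selection."""
    if suggested == selected:
        return False
    for m in models:
        if (m.operation in suggested) != (m.operation in selected):
            m.retrain(metadata, context, selected)
    return True

if __name__ == "__main__":
    rows = [(1, "a"), (None, "b"), (3, "c"), (4, "d")]
    models = [RuleModel("impute_missing", "has_missing"),
              RuleModel("encode_text", "has_text")]
    metadata, suggested = curate(rows, "sales", models)
    print(sorted(suggested))
    # The user keeps only one operation, so the differing model is updated.
    apply_feedback(models, metadata, "sales", suggested, {"encode_text"})
    print(sorted(curate(rows, "sales", models)[1]))
```

In this sketch only the models that disagreed with the user's selection are updated, which is one way to satisfy "at least one suggestion model"; a real system would presumably update trained model parameters rather than a lookup table.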
