System and method for data quality analysis between untrusted parties

  • US 9,413,760 B2
  • Filed: 09/05/2014
  • Issued: 08/09/2016
  • Est. Priority Date: 09/05/2014
  • Status: Active Grant
First Claim

1. A system for data quality analysis, comprising:

  • memory storing a dataset comprising attributes each associated with one or more elements;

    a client comprising:

    a first vector generating module to generate an interest vector of separately encrypted values identifying elements of interest for at least one attribute;

    a request module to send an encrypted request to a server, wherein the encrypted request comprises the interest vector; and

    a first determination module to send an acquisition determination to the server, wherein the acquisition determination is based on a data quality value; and

    the server, comprising:

    a receipt module to receive the encrypted request from the client regarding data quality for the at least one attribute;

    a second vector generating module to generate a condensed data vector representing the elements for the at least one attribute, wherein the condensed data vector is the same length as the interest vector;

    a condensed data vector module to determine the condensed data vector as one of a counting hashmap when the data quality comprises data completeness and a histogram when the data quality comprises data validity, comprising at least one of:

    a hashmap module to determine the condensed data vector as the counting hashmap, comprising:

    a calculation module to calculate a hash value for each of the elements for the at least one attribute;

    an occupancy determination module to determine a number of times each hash value occurs in the dataset as an occurrence value; and

    a placement module to place the occurrence values in an element of the vector indexed by the hash values; and

    a histogram module to determine the condensed data vector as the histogram, comprising:

    a determination module to set a maximum and minimum value for the elements of the at least one attribute;

    a graph module to generate the histogram based on the set maximum and minimum values for the elements along an x-axis and frequency occurrences of the elements along the y-axis; and

    a placement module to place the frequency of occurrences along the condensed data vector;

    an aggregator module to determine an aggregate of the elements of interest by determining for each of the elements in the condensed data vector, an encrypted product of that element and a corresponding element of the interest vector and by calculating the aggregate as an encrypted value by determining a total product of all the encrypted products, wherein the aggregate is used to assign the data quality value to the elements of the at least one attribute in the dataset; and

    a provider module to provide the dataset based on the acquisition determination.
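
The exchange recited in claim 1 can be illustrated with a minimal sketch. The claim does not name a cryptosystem; the sketch assumes an additively homomorphic scheme (Paillier, with toy parameters that are not secure), in which raising a ciphertext to a plaintext count yields the "encrypted product" and multiplying the results yields the encrypted aggregate. The use of SHA-256 for hashing, the 0/1 encoding of the interest vector, and all function names (`bucket`, `counting_hashmap`, `histogram`, `interest_vector`, `aggregate`) are hypothetical choices made for illustration and are not taken from the patent.

```python
import hashlib
import random
from math import gcd

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 1789, 1861
N, N2 = P * Q, (P * Q) ** 2
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse of PHI mod N (Python 3.8+)


def encrypt(m):
    """Paillier encryption with generator g = N + 1 (assumed scheme)."""
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext from a Paillier ciphertext."""
    u = pow(c, PHI, N2)
    return ((u - 1) // N) * MU % N


def bucket(element, length):
    """Hash an element to a vector index in [0, length)."""
    digest = hashlib.sha256(str(element).encode()).digest()
    return int.from_bytes(digest, "big") % length


def counting_hashmap(elements, length):
    """Server side: count occurrences per hash value (data completeness)."""
    vec = [0] * length
    for e in elements:
        vec[bucket(e, length)] += 1
    return vec


def histogram(values, lo, hi, length):
    """Server side: frequencies in equal-width bins over [lo, hi] (data validity)."""
    vec = [0] * length
    for v in values:
        if lo <= v <= hi:  # values outside the set range are ignored here
            i = min(int((v - lo) / (hi - lo) * length), length - 1)
            vec[i] += 1
    return vec


def interest_vector(indices, length):
    """Client side: separately encrypt 1 at positions of interest, 0 elsewhere."""
    return [encrypt(1 if i in indices else 0) for i in range(length)]


def aggregate(interest, condensed):
    """Server side: E(v_i)^c_i is an encryption of v_i * c_i; multiplying
    these ciphertexts gives an encryption of the sum over all positions."""
    total = 1
    for c, k in zip(interest, condensed):
        total = total * pow(c, k, N2) % N2
    return total


# Completeness: how many entries of a column hash to the bucket of ""?
states = ["NY", "", "CA", "", "TX", ""]
request = interest_vector({bucket("", 16)}, 16)
reply = aggregate(request, counting_hashmap(states, 16))
print(decrypt(reply))  # count in that bucket; hash collisions may inflate it

# Validity: how many ages fall in the two bins covering [0, 100)?
ages = [25, 31, 47, 52, 120, 199]
request = interest_vector({0, 1}, 4)
reply = aggregate(request, histogram(ages, 0, 200, 4))
print(decrypt(reply))  # 4
```

In this sketch the server never sees which positions the client marked, and the client learns only the decrypted aggregate, on which it could base the acquisition determination. The condensed data vector and the interest vector have the same length, as the claim requires.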
