System and method for data quality analysis between untrusted parties
First Claim
1. A system for data quality analysis, comprising:
memory storing a dataset comprising attributes each associated with one or more elements;
a client comprising:
a first vector generating module to generate an interest vector of separately encrypted values identifying elements of interest for at least one attribute;
a request module to send an encrypted request to a server, wherein the encrypted request comprises the interest vector; and
a first determination module to send an acquisition determination to the server, wherein the acquisition determination is based on a data quality value; and
the server, comprising:
a receipt module to receive the encrypted request from the client regarding data quality for the at least one attribute;
a second vector generating module to generate a condensed data vector representing the elements for the at least one attribute, wherein the condensed data vector is the same length as the interest vector;
a condensed data vector module to determine the condensed data vector as one of a counting hashmap when the data quality comprises data completeness and a histogram when the data quality comprises data validity, comprising at least one of:
a hashmap module to determine the condensed data vector as the counting hashmap, comprising:
a calculation module to calculate a hash value for each of the elements for the at least one attribute;
an occupancy determination module to determine a number of times each hash value occurs in the dataset as an occurrence value; and
placement module to place the occurrence values in an element of the vector indexed by the hash values; and
a histogram module to determine the condensed data vector as the histogram, comprising:
determination module to set a maximum and minimum value for the elements of the at least one attribute;
a graph module to generate the histogram based on the set maximum and minimum values for the elements along an x-axis and frequency occurrences of the elements along the y-axis; and
a placement module to place the frequency of occurrences along the condensed data vector;
an aggregator module to determine an aggregate of the elements of interest by determining for each of the elements in the condensed data vector, an encrypted product of that element and a corresponding element of the interest vector and by calculating the aggregate as an encrypted value by determining a total product of all the encrypted products, wherein the aggregate is used to assign the data quality value to the elements of the at least one attribute in the dataset; and
a provider module to provide the dataset based on the acquisition determination.
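The two condensed data vector forms recited in the claim can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the choice of SHA-256 reduced modulo the vector length as the hash, equal-width histogram bins, and the decision to skip out-of-range values are all assumptions made for illustration. The only constraint carried over from the claim is that the resulting vector has the same length as the interest vector.

```python
import hashlib
from typing import Iterable, List, Sequence


def bucket(element: str, length: int) -> int:
    """Map an element to a vector index (assumed hash: SHA-256 mod length)."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length


def counting_hashmap(elements: Iterable[str], length: int) -> List[int]:
    """Count how often each hash value occurs and store the count at that index."""
    vector = [0] * length
    for element in elements:
        vector[bucket(element, length)] += 1
    return vector


def histogram(values: Sequence[float], minimum: float, maximum: float,
              length: int) -> List[int]:
    """Place frequencies of occurrence into `length` equal-width bins."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    vector = [0] * length
    width = (maximum - minimum) / length
    for value in values:
        if value < minimum or value > maximum:
            continue  # assumption: values outside the set range are not counted
        # The maximum itself falls into the last bin.
        index = min(int((value - minimum) / width), length - 1)
        vector[index] += 1
    return vector
```

Distinct elements can collide in the same bucket of the counting hashmap, so the counts are approximate whenever the vector is shorter than the number of distinct elements.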
Abstract
A system and method for data quality analysis between untrusted parties is provided. A dataset having attributes each associated with one or more elements is maintained. An encrypted request is received from a client regarding data quality for one of the attributes. The encrypted request includes an interest vector of separately encrypted values identifying those elements of interest for the attribute. A condensed data vector representing the elements is generated for the attribute and is the same length as the interest vector. An aggregate of the elements of interest is determined by calculating for each element in the condensed data vector, an encrypted product of that element and a corresponding element of the interest vector and by determining a total product of all the encrypted products. A data quality value is assigned to the elements of the attribute in the dataset based on the aggregate.
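The aggregation step can be sketched in Python under one loudly stated assumption: that the separately encrypted values use an additively homomorphic scheme such as Paillier. The text shown here does not name a cryptosystem, so this is an illustration rather than the patented implementation. In Paillier, raising a ciphertext to a plaintext power multiplies the hidden value by that power, and multiplying ciphertexts adds the hidden values, so the "total product of all the encrypted products" decrypts to the sum of interest[i] * condensed[i]. The primes below are toy values and offer no security.

```python
import math
import secrets
from typing import List, Sequence, Tuple

PublicKey = Tuple[int, int]   # (n, g)
PrivateKey = Tuple[int, int]  # (lam, mu)


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """Toy Paillier key generation with g = n + 1 (p and q must be prime)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return (n, n + 1), (lam, mu)


def encrypt(pub: PublicKey, message: int) -> int:
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n2) * pow(r, n, n2)) % n2


def decrypt(pub: PublicKey, priv: PrivateKey, ciphertext: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def aggregate(pub: PublicKey, encrypted_interest: Sequence[int],
              condensed: Sequence[int]) -> int:
    """Server side: multiply together ciphertext_i ** condensed_i (mod n^2)."""
    if len(encrypted_interest) != len(condensed):
        raise ValueError("vectors must have the same length")
    n, _ = pub
    n2 = n * n
    total = 1  # trivial encryption of 0
    for ciphertext, count in zip(encrypted_interest, condensed):
        total = (total * pow(ciphertext, count, n2)) % n2
    return total


# Client encrypts each interest entry separately; server never sees plaintext.
pub, priv = keygen(1009, 1013)
interest = [1, 0, 1, 0]    # elements of interest (plaintext, client side)
condensed = [3, 5, 0, 2]   # condensed data vector (plaintext, server side)
encrypted_interest: List[int] = [encrypt(pub, bit) for bit in interest]
encrypted_total = aggregate(pub, encrypted_interest, condensed)
total = decrypt(pub, priv, encrypted_total)  # 1*3 + 0*5 + 1*0 + 0*2 = 3
```

In this sketch the server works only on ciphertexts of the interest vector, and the client learns only the aggregate, not the individual entries of the condensed data vector.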
16 Claims
1. A system for data quality analysis, as set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A method for data quality analysis, comprising:
maintaining, by a memory, a dataset comprising attributes each associated with one or more elements;
generating, by a client, an interest vector of separately encrypted values identifying elements of interest for at least one attribute;
sending, from the client to a server, an encrypted request comprising the interest vector;
receiving, by the server, the encrypted request from the client regarding data quality for the at least one attribute;
generating, by the server, a condensed data vector representing the elements for the at least one attribute, wherein the condensed data vector is the same length as the interest vector;
determining, by the server, the condensed data vector as one of a counting hashmap when the data quality comprises data completeness and a histogram when the data quality comprises data validity, comprising at least one of:
determining the condensed vector as the hashmap, comprising:
calculating a hash value for each of the elements for the at least one attribute;
determining a number of times each hash value occurs in the dataset as an occurrence value; and
placing the occurrence values in an element of the vector indexed by the hash values; and
determining the condensed data vector as the histogram, comprising:
setting a maximum and minimum value for the elements of the at least one attribute;
generating the histogram based on the set maximum and minimum values for the elements along an x-axis and frequency occurrences of the elements along the y-axis; and
placing the frequency of occurrences along the condensed data vector;
determining, by the server, an aggregate of the elements of interest, comprising:
determining for each of the elements in the condensed data vector, an encrypted product of that element and a corresponding element of the interest vector; and
calculating the aggregate as an encrypted value by determining a total product of all the encrypted products; and
assigning, by the server, a data quality value to the elements of the at least one attribute in the dataset based on the aggregate;
sending an acquisition determination to the server, wherein the acquisition determination is based on the data quality value; and
providing the dataset based on the acquisition determination.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
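The client-side steps of the method can be outlined in a short Python sketch. Everything specific in it is an assumption for illustration: the claims shown here do not say how elements of interest map to vector positions, how the data quality value is computed from the aggregate, or what rule drives the acquisition determination. The sketch assumes a 0/1 indicator over hash buckets (each entry would then be encrypted separately before sending), a completeness ratio as the quality value, and a simple threshold. The names `interest_indicator`, `completeness`, and `acquisition_determination` are hypothetical.

```python
import hashlib
from typing import Iterable, List


def bucket(element: str, length: int) -> int:
    """Assumed shared hash: both parties must map elements to the same index."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length


def interest_indicator(elements_of_interest: Iterable[str], length: int) -> List[int]:
    """Plaintext interest vector: 1 at each bucket the client cares about."""
    vector = [0] * length
    for element in elements_of_interest:
        vector[bucket(element, length)] = 1
    return vector


def completeness(aggregate: int, expected: int) -> float:
    """Assumed quality value: share of expected records found, capped at 1.0."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return min(aggregate / expected, 1.0)


def acquisition_determination(quality: float, threshold: float = 0.9) -> bool:
    """Assumed rule: acquire the dataset when quality meets the threshold."""
    return quality >= threshold
```

Because several elements may share a bucket, an aggregate computed over hash buckets can overstate how many elements of interest are present; a longer vector reduces that effect at the cost of a larger request.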
Specification