Efficient identification of entire row uniqueness in relational databases
Abstract
A method, system, and computer program product for efficiently comparing multiple columns of a row of a relational database to an incoming record. A computer creates a cryptographic sum for columns of a row of the relational database. The cryptographic sum is stored as a hidden column in the relational database. Logic may compare the cryptographic sum with an incoming cryptographic sum of entries in an incoming record. Logic may then determine if the incoming cryptographic sums differ from the corresponding cryptographic sums of rows of data of the relational database. When the two cryptographic sums are identical, the incoming entry is disregarded because an identical record already exists. When the cryptographic sum and the incoming cryptographic sum of an entry differ, that entry of the incoming record may be added to a target table of the relational database or used to update an existing record.
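As a rough illustration of the approach in the abstract, the Python sketch below computes a per-row checksum over a selected subset of columns and keeps it in an extra field of each row. It assumes that "summing the contents" means joining the selected column values with a delimiter and hashing the result with SHA-256; the column names, the `_checksum` field standing in for the hidden column, and the helper `row_checksum` are hypothetical. SHA-256 makes collisions astronomically unlikely but does not strictly guarantee a unique value.

```python
import hashlib

# Hypothetical delimiter choice; the claims do not specify how the
# "summed contents" are combined before the checksum is assigned.
SEPARATOR = "\x1f"  # ASCII unit separator, unlikely to appear in field values


def row_checksum(row: dict, columns: list[str]) -> str:
    """Return a SHA-256 hex digest over the selected columns of one row."""
    # Join the selected values in a fixed column order so identical
    # contents always produce the identical digest.
    combined = SEPARATOR.join(str(row[c]) for c in columns)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Assumed key columns, mirroring the claims' medical-record and SSN columns.
KEY_COLUMNS = ["medical_record", "ssn"]

target_table = [
    {"record_id": 1, "medical_record": "MR-1001", "ssn": "123-45-6789", "note": "a"},
]
# "_checksum" is a hypothetical stand-in for the hidden column.
for row in target_table:
    row["_checksum"] = row_checksum(row, KEY_COLUMNS)

incoming = {"record_id": 7, "medical_record": "MR-1001", "ssn": "123-45-6789", "note": "b"}
existing = {r["_checksum"] for r in target_table}
is_duplicate = row_checksum(incoming, KEY_COLUMNS) in existing
print(is_duplicate)  # True: the selected columns match even though "note" differs
```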
3 Claims
1. A method for efficiently identifying uniqueness of rows of a relational database, the method comprising:
a processor creating a cryptographic sum for each row of one or more rows of a target table of the relational database, wherein the cryptographic sum for a particular row of the one or more rows of the target table is calculated by summing the contents of a selected subset of columns from among all columns in that particular row of the target table and assigning a unique checksum value based on the summed contents of the selected subset of columns in that particular row;
receiving an incoming record;
selecting a next row of a plurality of rows of the incoming record;
the processor determining if the next row contains an incoming cryptographic sum, wherein the incoming cryptographic sum of the next row is calculated by summing contents of a selected subset of columns from among all columns in the next row of the incoming record and assigning a unique checksum value based on the summed contents of the selected subset of columns in the next row, wherein the selected subset of columns comprises a first column containing a medical record and a second column containing a social security number;
in response to determining that the next row contains the incoming cryptographic sum:
comparing the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table;
separating the cryptographic sum into a plurality of equally sized blocks; and
the processor appending the plurality of equally sized blocks of the cryptographic sum of the one or more rows of the target table to a hidden column of the target table;
in response to determining the incoming cryptographic sum of the next row is identical to at least one cryptographic sum of the one or more rows of the target table, the processor disregarding the next row when updating the target table;
in response to determining the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, determining if the next row contains an incoming record ID, wherein the incoming record ID is an identification value of the next row;
in response to determining that the next row contains the incoming record ID, identifying the incoming record ID for the next row;
comparing the incoming record ID of the next row with a record ID of the one or more rows of the target table; and
in response to determining the incoming record ID is identical to at least one record ID of at least one row of the one or more rows of the target table, the processor updating contents of the at least one row with contents of the next row;
in response to determining the incoming record ID is not identical to at least one record ID of at least one row of the one or more rows of the target table, and the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, the processor adding the next row as a new row within the target table via a logical instruction;
in response to determining the next row does not contain the incoming record ID, the processor adding the next row as a new row within the target table via a logical instruction; and
in response to determining that the next row does not contain the incoming cryptographic sum, the processor:
calculating the incoming cryptographic sum for the next row;
separating the incoming cryptographic sum into a plurality of equally sized blocks; and
storing the plurality of equally sized blocks of the incoming cryptographic sum for the next row in a hidden column of the next row; and
iteratively performing, until no additional rows remain in the plurality of rows of the incoming record, the functions of:
determining if the next row contains an incoming cryptographic sum, comparing the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table, and in response to determining the incoming cryptographic sum of the next row is identical to at least one cryptographic sum of the one or more rows of the target table, disregarding the next row when updating the target table.
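The decision flow recited in claim 1 (skip identical rows, update rows whose record ID already exists, otherwise add a new row) can be sketched in Python as follows. This is a hypothetical in-memory sketch, not the claimed implementation: rows are dictionaries, `_checksum` stands in for the hidden column, `record_id` for the record ID, and SHA-256 over the joined key columns is an assumed stand-in for the cryptographic sum. The block-splitting steps are omitted here.

```python
import hashlib
from collections import Counter

KEY_COLUMNS = ["medical_record", "ssn"]  # assumed selected subset of columns


def row_checksum(row: dict) -> str:
    """Assumed cryptographic sum: SHA-256 over the joined key columns."""
    combined = "\x1f".join(str(row[c]) for c in KEY_COLUMNS)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


def merge_incoming(target: list[dict], incoming: list[dict]) -> dict:
    """Apply the skip / update / insert decision to each incoming row."""
    checksums = Counter(r["_checksum"] for r in target)
    by_id = {r["record_id"]: r for r in target if r.get("record_id") is not None}
    counts = {"skipped": 0, "updated": 0, "inserted": 0}

    for row in incoming:
        row = dict(row)  # copy so the caller's data is not mutated
        if row.get("_checksum") is None:
            # The row arrived without a checksum: calculate it first.
            row["_checksum"] = row_checksum(row)

        if checksums[row["_checksum"]] > 0:
            counts["skipped"] += 1  # identical row already present
            continue

        record_id = row.get("record_id")
        if record_id is not None and record_id in by_id:
            existing = by_id[record_id]
            checksums[existing["_checksum"]] -= 1  # old contents are replaced
            existing.update(row)  # same ID, changed contents: update in place
            counts["updated"] += 1
        else:
            target.append(row)  # no matching ID: add as a new row
            if record_id is not None:
                by_id[record_id] = row
            counts["inserted"] += 1
        checksums[row["_checksum"]] += 1
    return counts


target = [{"record_id": 1, "medical_record": "MR-1", "ssn": "111"}]
for r in target:
    r["_checksum"] = row_checksum(r)

incoming = [
    {"record_id": 1, "medical_record": "MR-1", "ssn": "111"},  # exact duplicate
    {"record_id": 1, "medical_record": "MR-1", "ssn": "999"},  # changed contents
    {"record_id": 2, "medical_record": "MR-2", "ssn": "222"},  # new record ID
    {"medical_record": "MR-3", "ssn": "333"},                  # no record ID
]
result = merge_incoming(target, incoming)
print(result)  # {'skipped': 1, 'updated': 1, 'inserted': 2}
```

Instead of comparing each incoming checksum against every target row one by one, the sketch keeps a `Counter` of target checksums. This gives the same identical-or-not answer with a constant-time lookup per row, and decrementing the old checksum on update keeps later comparisons accurate.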
2. A computer comprising:
a processor;
a memory coupled to the processor; and
processing logic executing on the processor that:
creates a cryptographic sum for each row of one or more rows of a target table of the relational database, wherein the cryptographic sum for a particular row of the one or more rows of the target table is calculated by summing the contents of a selected subset of columns from among all columns in that particular row of the target table and assigning a unique checksum value based on the summed contents of the selected subset of columns in that particular row;
receives an incoming record;
selects a next row of a plurality of rows of the incoming record;
determines if the next row contains an incoming cryptographic sum, wherein the incoming cryptographic sum of the next row is calculated by summing contents of a selected subset of columns from among all columns in the next row of the incoming record and assigning a unique checksum value based on the summed contents of the selected subset of columns in the next row;
in response to determining that the next row contains the incoming cryptographic sum:
compares the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table;
separates the cryptographic sum into a plurality of equally sized blocks; and
appends the plurality of equally sized blocks of the cryptographic sum of the one or more rows of the target table to a hidden column of the target table;
in response to determining the incoming cryptographic sum of the next row is identical to at least one cryptographic sum of the one or more rows of the target table, disregards the next row when updating the target table;
in response to determining the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, determines if the next row contains an incoming record ID, wherein the incoming record ID is an identification value of the next row;
in response to determining that the next row contains the incoming record ID, identifies the incoming record ID for the next row;
compares the incoming record ID of the next row with a record ID of the one or more rows of the target table;
in response to determining the incoming record ID is identical to at least one record ID of at least one row of the one or more rows of the target table, updates contents of the at least one row with contents of the next row;
in response to determining the incoming record ID is not identical to at least one record ID of at least one row of the one or more rows of the target table, and the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, adds the next row as a new row of the target table via a logical instruction;
in response to determining the next row does not contain the incoming record ID, adds the next row as a new row within the target table via a logical instruction;
in response to determining that the next row does not contain the incoming cryptographic sum:
calculates the incoming cryptographic sum for the next row;
separates the incoming cryptographic sum into a plurality of equally sized blocks; and
stores the plurality of equally sized blocks of the incoming cryptographic sum for the next row in a hidden column of the next row; and
iteratively performs, until no additional rows remain in the plurality of rows of the incoming record, the functions of determining if the next row contains an incoming cryptographic sum, comparing the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table, and in response to determining the incoming cryptographic sum of the next row is identical to the cryptographic sum of the similarly identified row, disregarding the next row when updating the target table;
wherein the selected subset of columns comprises a first column containing a medical record and a second column containing a social security number.
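The claims also recite separating a cryptographic sum into a plurality of equally sized blocks and storing those blocks in a hidden column, without stating the block size or how the blocks are used afterwards. The sketch below assumes a SHA-256 hex digest (64 characters) split into four 16-character blocks and kept in a hypothetical `_checksum_blocks` field; the helpers `split_into_blocks` and `blocks_match` are illustrative, and comparing block by block with an early exit is only one plausible use of the blocks.

```python
import hashlib


def split_into_blocks(checksum: str, block_size: int) -> list[str]:
    """Split a checksum string into consecutive blocks of equal length."""
    if block_size <= 0 or len(checksum) % block_size != 0:
        raise ValueError("block_size must evenly divide the checksum length")
    return [checksum[i:i + block_size] for i in range(0, len(checksum), block_size)]


def blocks_match(a: list[str], b: list[str]) -> bool:
    """Compare block by block; all() stops at the first differing block."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


# Assumed checksum input: the two key column values joined by a delimiter.
digest = hashlib.sha256("MR-1001\x1f123-45-6789".encode("utf-8")).hexdigest()
blocks = split_into_blocks(digest, 16)  # assumed block size: 4 blocks of 16 chars

# "_checksum_blocks" is a hypothetical stand-in for the hidden column.
row = {"medical_record": "MR-1001", "ssn": "123-45-6789", "_checksum_blocks": blocks}

print(len(blocks), [len(b) for b in blocks])  # 4 [16, 16, 16, 16]
print("".join(blocks) == digest)              # True: blocks reassemble the digest
```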
3. A storage device having a plurality of instructions embodied therein, wherein the plurality of instructions, when executed by a processing device, allows a machine to:
create a cryptographic sum for each row of one or more rows of a target table of the relational database, wherein the cryptographic sum for a particular row of the one or more rows of the target table is calculated by summing the contents of a selected subset of columns from among all columns in that particular row of the target table and assigning a unique checksum value based on the summed contents of the selected subset of columns in that particular row;
receive an incoming record;
select a next row of a plurality of rows of the incoming record;
determine if the next row contains an incoming cryptographic sum, wherein the incoming cryptographic sum of the next row is calculated by summing contents of a selected subset of columns from among all columns in the next row of the incoming record and assigning a unique checksum value based on the summed contents of the selected subset of columns in the next row;
in response to determining that the next row contains the incoming cryptographic sum:
compare the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table;
separate the cryptographic sum into a plurality of equally sized blocks; and
append the plurality of equally sized blocks of the cryptographic sum of the one or more rows of the target table to a hidden column of the target table;
in response to determining the incoming cryptographic sum of the next row is identical to at least one cryptographic sum of the one or more rows of the target table, disregard the next row when updating the target table;
in response to determining the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, determine if the next row contains an incoming record ID, wherein the incoming record ID is an identification value of the next row;
in response to determining that the next row contains the incoming record ID, identify the incoming record ID for the next row;
compare the incoming record ID of the next row with a record ID of the one or more rows of the target table;
in response to determining the incoming record ID is identical to at least one record ID of at least one row of the one or more rows of the target table, update contents of the at least one row with contents of the next row;
in response to determining the incoming record ID is not identical to at least one record ID of at least one row of the one or more rows of the target table, and the incoming cryptographic sum of the next row is not identical to at least one cryptographic sum of the one or more rows of the target table, add the next row as a new row within the target table via a logical instruction; and
in response to determining the next row does not contain the incoming record ID, add the next row as a new row within the target table via a logical instruction;
in response to determining that the next row does not contain the incoming cryptographic sum:
calculate the incoming cryptographic sum for the next row;
separate the incoming cryptographic sum into a plurality of equally sized blocks; and
store the plurality of equally sized blocks of the incoming cryptographic sum for the next row in a hidden column of the next row; and
iteratively perform, until no additional rows remain in the plurality of rows of the incoming record, the functions of determining if the next row contains an incoming cryptographic sum, comparing the incoming cryptographic sum of the next row to the cryptographic sum of each row of the one or more rows of the target table, and in response to determining the incoming cryptographic sum of the next row is identical to the cryptographic sum of the similarly identified row, disregarding the next row when updating the target table;
wherein the selected subset of columns comprises a first column containing a medical record and a second column containing a social security number.
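For a database-backed variant, the same skip, update, and insert decisions can be issued as SQL statements, which is one possible reading of adding a row "via a logical instruction". The sketch below uses Python's built-in `sqlite3` module with an in-memory database. SQLite has no hidden columns for ordinary tables, so an indexed `row_checksum` column stands in for one; the table name `patients`, its columns, and the helpers `compute_checksum` and `apply_row` are hypothetical. Engines such as Oracle (`INVISIBLE` columns) and Db2 (`IMPLICITLY HIDDEN` columns) can keep such a column out of `SELECT *` results.

```python
import hashlib
import sqlite3

KEY_COLUMNS = ("medical_record", "ssn")  # assumed selected subset of columns


def compute_checksum(row: dict) -> str:
    """Assumed cryptographic sum: SHA-256 over the joined key columns."""
    combined = "\x1f".join(str(row[c]) for c in KEY_COLUMNS)
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# An ordinary indexed column stands in for the hidden checksum column.
conn.execute(
    "CREATE TABLE patients (record_id INTEGER PRIMARY KEY, medical_record TEXT,"
    " ssn TEXT, row_checksum TEXT)"
)
conn.execute("CREATE INDEX idx_patients_checksum ON patients (row_checksum)")


def apply_row(conn: sqlite3.Connection, row: dict) -> str:
    """Skip, update, or insert one incoming row; return the action taken."""
    checksum = row.get("row_checksum") or compute_checksum(row)
    if conn.execute(
        "SELECT 1 FROM patients WHERE row_checksum = ?", (checksum,)
    ).fetchone():
        return "skipped"  # identical contents already stored

    rid = row.get("record_id")
    if rid is not None and conn.execute(
        "SELECT 1 FROM patients WHERE record_id = ?", (rid,)
    ).fetchone():
        conn.execute(
            "UPDATE patients SET medical_record = ?, ssn = ?, row_checksum = ?"
            " WHERE record_id = ?",
            (row["medical_record"], row["ssn"], checksum, rid),
        )
        return "updated"

    # A NULL record_id lets SQLite assign the next INTEGER PRIMARY KEY value.
    conn.execute(
        "INSERT INTO patients (record_id, medical_record, ssn, row_checksum)"
        " VALUES (?, ?, ?, ?)",
        (rid, row["medical_record"], row["ssn"], checksum),
    )
    return "inserted"


incoming = [
    {"record_id": 1, "medical_record": "MR-1", "ssn": "111"},
    {"record_id": 1, "medical_record": "MR-1", "ssn": "111"},  # exact duplicate
    {"record_id": 1, "medical_record": "MR-1", "ssn": "999"},  # changed contents
    {"medical_record": "MR-2", "ssn": "222"},                  # no record ID
]
results = [apply_row(conn, r) for r in incoming]
conn.commit()
print(results)  # ['inserted', 'skipped', 'updated', 'inserted']
```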
Specification