System and method for organizing data
Abstract
A system and method for organizing raw data from one or more sources uses an improved mechanism for identifying duplicate data between fields (e.g., columns) in databases. The fields may be similar fields within a single database, or similar or identical fields within a pair of databases, and are organized as arrays or field vectors. The present invention sorts each of the field vectors and, if necessary, partitions them by common value. The number of comparisons required to identify the duplicate data between the field vectors is reduced by feeding back the difference between the compared values; this difference is used to adjust the indices into the field vectors for subsequent comparisons.
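To make the abstract's point about fewer comparisons concrete, here is a minimal Python sketch (not the patented implementation) that counts equality comparisons for an all-pairs check versus a sorted walk that feeds back the difference between the compared values. The function names `naive_duplicates` and `sorted_duplicates` are hypothetical, the sketch assumes integer values with no repeats inside either vector (so no partitioning step is needed), and the counts exclude the comparisons spent on sorting.

```python
from typing import List, Sequence, Tuple


def naive_duplicates(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], int]:
    """Compare every value of a with every value of b: len(a) * len(b) comparisons."""
    found: List[int] = []
    comparisons = 0
    for x in a:
        for y in b:
            comparisons += 1
            if x == y:
                found.append(x)
    return found, comparisons


def sorted_duplicates(a: Sequence[int], b: Sequence[int]) -> Tuple[List[int], int]:
    """Sort both vectors, then walk them once, letting the difference pick the index to advance."""
    a, b = sorted(a), sorted(b)  # sorting cost is not counted below
    found: List[int] = []
    comparisons = 0
    i = j = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        diff = a[i] - b[j]  # fed back to choose the next index adjustment
        if diff == 0:
            found.append(a[i])
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is smaller, so advance the index into a
        else:
            j += 1  # b[j] is smaller, so advance the index into b
    return found, comparisons
```

For `a = [7, 3, 9, 1, 5]` and `b = [4, 9, 2, 7, 8]`, both functions find `[7, 9]`, but the all-pairs check makes 25 comparisons while the sorted walk makes 8.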
20 Claims
1. A method for identifying duplicate data between a first field vector and a second field vector comprising:
sorting the first field vector in a particular order;
sorting the second field vector in said particular order;
comparing a first value at a first index in the first field vector with a second value at a second index in the second field vector;
if said first value is not equal to said second value, adjusting either said first index or said second index based on a difference between said first value and said second value; and
if said first value is equal to said second value, determining said first and second values as duplicate data.
(Dependent claims: 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14)
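Claim 1 can be illustrated with a short Python sketch. It is one reading of the claim, not the patentee's code: `find_duplicates` and its `descending` flag are hypothetical, the "particular order" is modeled as either increasing or decreasing, the "difference" is the signed result of subtracting the second value from the first, and advancing both indices after a match is an assumption, since the claim does not say what happens next.

```python
from typing import List, Sequence


def find_duplicates(first: Sequence[int], second: Sequence[int],
                    descending: bool = False) -> List[int]:
    # Sort both field vectors in the same particular order.
    first = sorted(first, reverse=descending)
    second = sorted(second, reverse=descending)
    # In decreasing order the lagging value is the larger one, so flip the sign.
    direction = -1 if descending else 1
    duplicates: List[int] = []
    i = j = 0
    while i < len(first) and j < len(second):
        # Compare the value at the first index with the value at the second index.
        difference = (first[i] - second[j]) * direction
        if difference == 0:
            # Equal values are duplicate data; advancing both indices is an assumption.
            duplicates.append(first[i])
            i += 1
            j += 1
        elif difference < 0:
            i += 1  # first value comes earlier in the order: adjust the first index
        else:
            j += 1  # second value comes earlier in the order: adjust the second index
    return duplicates
```

With repeated values, each occurrence in one vector pairs with at most one occurrence in the other, so the result behaves like a multiset intersection.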
6. A method for identifying duplicate data between a first field vector and a second field vector comprising:
sorting the first field vector in a particular order;
sorting the second field vector in said particular order;
comparing a first value at a first index in the first field vector with a second value at a second index in the second field vector;
if said first value is not equal to said second value, adjusting one of said first index and said second index based on a difference between said first value and said second value; and
if said first value is equal to said second value, determining said first and second values as duplicate data, wherein said sorting the first field vector in a particular order comprises sorting the first field vector in an increasing order, and wherein said sorting the second field vector in said particular order comprises sorting the second field vector in said increasing order, and wherein said adjusting one of said first index and said second index comprises:
adjusting said first index if said first value is less than said second value, and adjusting said second index if said second value is less than said first value.
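Claim 6 narrows claim 1 to increasing order and fixes the rule: adjust the index holding the smaller value. The hypothetical `trace_increasing` function below is a sketch that records every comparison so the rule can be followed step by step; advancing both indices on a match is again an assumption rather than something the claim states.

```python
from typing import List, Sequence, Tuple


def trace_increasing(first: Sequence[int],
                     second: Sequence[int]) -> List[Tuple[int, int, int, str]]:
    """Return (first_index, second_index, difference, action) for every comparison."""
    first, second = sorted(first), sorted(second)  # both in increasing order
    steps: List[Tuple[int, int, int, str]] = []
    i = j = 0
    while i < len(first) and j < len(second):
        difference = first[i] - second[j]
        if difference < 0:
            action = "advance first"   # first value is less than second value
        elif difference > 0:
            action = "advance second"  # second value is less than first value
        else:
            action = "duplicate"       # equal values are duplicate data
        steps.append((i, j, difference, action))
        if difference <= 0:
            i += 1  # first index moves when it was smaller or on a match
        if difference >= 0:
            j += 1  # second index moves when it was smaller or on a match
    return steps
```

For `[1, 4, 6]` and `[2, 4, 9]` the trace holds four steps: `(0, 0, -1, 'advance first')`, `(1, 0, 2, 'advance second')`, `(1, 1, 0, 'duplicate')`, `(2, 2, -3, 'advance first')`.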
15. A method for identifying duplicate data between a first field vector and a second field vector, the first field vector and the second field vector sorted in a particular order, the method comprising:
partitioning said first field vector into sets of common values;
partitioning said second field vector into sets of common values;
comparing a first value in a first position in the first field vector with a second value at a second position in the second field vector;
if said first value is not equal to said second value, adjusting either said first position or said second position based on a difference between said first value and said second value; and
if said first value is equal to said second value, determining said first and second values as duplicate data.
(Dependent claims: 16, 17, 18)
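Claim 15 first partitions each sorted vector into sets of common values. A minimal Python sketch, assuming increasing order and representing each set as a hypothetical `(value, count)` pair built with `itertools.groupby`, shows how the comparisons then run once per distinct value rather than once per element. The function names and the output format `(value, count_in_first, count_in_second)` are illustrative choices, not something the claim specifies.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def partition(vector: Sequence[int]) -> List[Tuple[int, int]]:
    """Split an already-sorted vector into sets of common values, as (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(vector)]


def duplicates_by_partition(first: Sequence[int],
                            second: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (value, count_in_first, count_in_second) for each value found in both.

    Assumes both vectors are already sorted in increasing order.
    """
    p1, p2 = partition(first), partition(second)
    result: List[Tuple[int, int, int]] = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        difference = p1[i][0] - p2[j][0]  # compare one representative per set
        if difference == 0:
            result.append((p1[i][0], p1[i][1], p2[j][1]))
            i += 1
            j += 1
        elif difference < 0:
            i += 1  # first set's value is smaller: move to the next set in first
        else:
            j += 1  # second set's value is smaller: move to the next set in second
    return result
```

For `[1, 2, 2, 2, 5, 5]` and `[2, 2, 3, 5, 5, 5]`, the partitions are `[(1, 1), (2, 3), (5, 2)]` and `[(2, 2), (3, 1), (5, 3)]`, and the result is `[(2, 3, 2), (5, 2, 3)]`.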
19. A method for sorting data comprising:
receiving a value to be sorted;
determining a first position in a vector where said value is to be included;
retrieving a vector value from said vector at said first position;
feeding back said vector value to determine a difference between said value and said vector value; and
determining a new position in said vector based at least in part on said difference.
(Dependent claims: 20)
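Claim 19 applies the same feedback idea to sorting. The claim does not say how the new position is derived from the difference, so the Python sketch below makes an assumption: it uses only the sign of the difference to halve the remaining search range, which amounts to a binary-search insertion sort. `insert_with_feedback` and `feedback_sort` are hypothetical names. An interpolation-style variant could also use the magnitude of the difference to estimate how far to move.

```python
from typing import List, Sequence


def insert_with_feedback(vector: List[int], value: int) -> int:
    """Insert value into an increasing vector in place; return the index used."""
    lo, hi = 0, len(vector)  # the insertion point lies somewhere in [lo, hi]
    while lo < hi:
        position = (lo + hi) // 2          # first (then each new) probe position
        vector_value = vector[position]    # retrieve the vector value at that position
        difference = value - vector_value  # fed back to decide where to look next
        if difference < 0:
            hi = position                  # value sorts before vector_value: look left
        else:
            lo = position + 1              # value sorts at or after vector_value: look right
    vector.insert(lo, value)
    return lo


def feedback_sort(values: Sequence[int]) -> List[int]:
    """Build a sorted list by inserting each received value at its fed-back position."""
    result: List[int] = []
    for value in values:
        insert_with_feedback(result, value)
    return result
```

Ties go to the right of equal values, so equal values keep their arrival order, the same placement rule as `bisect.insort_right` in the Python standard library.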
Specification