System and method for organizing data

US 20030037051A1
Filed: 08/16/2002
Published: 02/20/2003
Est. Priority Date: 07/20/1999
Status: Abandoned Application

Abstract

A system and method for organizing raw data from one or more sources uses an improved mechanism for identifying duplicate data between fields (e.g., columns) in databases. The fields may be similar fields within a single database, or similar or identical fields within a pair of databases, and are organized as arrays or field vectors. The present invention sorts each of the field vectors and, if necessary, partitions them by common value. The number of comparisons required to identify the duplicate data between the field vectors is reduced by feeding back a difference between the compared values. This difference is used to adjust indices into the field vectors for subsequent comparisons.
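
To illustrate the claimed reduction in comparisons, the following minimal Python sketch counts comparisons for an exhaustive pairwise check and for a sorted walk in which the sign of the difference decides which index moves. The function names `count_naive` and `count_with_feedback` are hypothetical, integer values are assumed, and the sketch is one possible reading of the abstract rather than the applicant's implementation.

```python
def count_naive(a, b):
    """Compare every value in a with every value in b."""
    comparisons = 0
    duplicates = set()
    for x in a:
        for y in b:
            comparisons += 1
            if x == y:
                duplicates.add(x)
    return duplicates, comparisons


def count_with_feedback(a, b):
    """Sort both vectors, then let the difference pick which index advances."""
    a, b = sorted(a), sorted(b)
    i = j = comparisons = 0
    duplicates = set()
    while i < len(a) and j < len(b):
        comparisons += 1
        diff = a[i] - b[j]  # difference fed back into the index adjustment
        if diff == 0:
            duplicates.add(a[i])
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # value in a is smaller, so move forward in a
        else:
            j += 1  # value in b is smaller, so move forward in b
    return duplicates, comparisons
```

For the vectors `[7, 3, 9, 1]` and `[4, 9, 3, 8]`, the exhaustive version makes 16 comparisons and the feedback version makes 6, and both report 3 and 9 as duplicates. Because every comparison in the sorted walk advances at least one index, it never needs more than `len(a) + len(b) - 1` comparisons, not counting the cost of sorting.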

20 Claims

1. A method for identifying duplicate data between a first field vector and a second field vector comprising:
- sorting the first field vector in a particular order;
  
  sorting the second field vector in said particular order;
  
  comparing a first value at a first index in the first field vector with a second value at a second index in the second field vector;
  
  if said first value is not equal to said second value, adjusting either said first index or said second index based on a difference between said first value and said second value; and
  
  if said first value is equal to said second value, determining said first and second values as duplicate data.
- Dependent Claims (2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1, wherein said sorting the first field vector in a particular order comprises sorting the first field vector in an increasing order, and wherein said sorting the second field vector in said particular order comprises sorting the second field vector in said increasing order.
  - 3. The method of claim 1, wherein said sorting the first field vector in a particular order comprises sorting the first field vector in a decreasing order, and wherein said sorting the second field vector in said particular order comprises sorting the second field vector in said decreasing order.
  - 4. The method of claim 2, wherein said adjusting either said first index or said second index comprises adjusting said first index if said first value is less than said second value.
  - 5. The method of claim 2, wherein said adjusting either said first index or said second index comprises adjusting said second index if said second value is less than said first value.
  - 7. The method of claim 2, wherein said adjusting either said first index or said second index comprises incrementing either said first index or said second index based on whether said first value is greater than said second value.
  - 8. The method of claim 3, wherein said adjusting either said first index or said second index comprises decrementing either said first index or said second index based on whether said first value is greater than said second value.
  - 9. The method of claim 1, wherein said first value is a numeric value, and wherein said second value is a numeric value.
  - 10. The method of claim 9, wherein said first value is a numeric value that represents an alphanumeric value, and wherein said second value is a numeric value that represents an alphanumeric value.
  - 11. The method of claim 1, further comprising:
    - partitioning said first field vector into at least one set of common values; and
      
      partitioning said second field vector into at least one set of common values.
  - 12. The method of claim 11, wherein said adjusting either said first index or said second index comprises adjusting either said first index or said second index to a next partitioned set in a respective one of said first field or said second field vector.
  - 13. The method of claim 2, wherein said adjusting either said first index or said second index comprises:
    - adjusting said first index if said first value is less than said second value; and
      
      adjusting said second index if said second value is less than said first value.
  - 14. The method of claim 3, wherein said adjusting either said first index or said second index comprises:
    - adjusting said first index if said first value is greater than said second value; and
      
      adjusting said second index if said second value is greater than said first value.
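
The index adjustments recited in claims 1, 13, and 14 can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's code: the function name `find_duplicates` and the `decreasing` flag are hypothetical, values are assumed to be numeric, and indices are always incremented (the decrementing variant of claim 8 is not modeled).

```python
def find_duplicates(first, second, decreasing=False):
    """Return (first_index, second_index) pairs where the sorted vectors match.

    Indices refer to positions in the sorted copies, not in the inputs.
    """
    a = sorted(first, reverse=decreasing)
    b = sorted(second, reverse=decreasing)
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if diff == 0:
            # Equal values are reported as duplicate data.
            matches.append((i, j))
            i += 1
            j += 1
        elif (diff < 0) != decreasing:
            # Increasing order: first value is smaller (claim 13).
            # Decreasing order: first value is greater (claim 14).
            i += 1
        else:
            j += 1
    return matches
```

Calling `find_duplicates([5, 1, 3, 8], [3, 8, 2])` sorts the inputs to `[1, 3, 5, 8]` and `[2, 3, 8]` and returns `[(1, 1), (3, 2)]`, the positions of 3 and 8. With `decreasing=True` the same inputs yield `[(0, 0), (2, 1)]`.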

6. A method for identifying duplicate data between a first field vector and a second field vector comprising:
- sorting the first field vector in a particular order;
  
  sorting the second field vector in said particular order;
  
  comparing a first value at a first index in the first field vector with a second value at a second index in the second field vector;
  
  if said first value is not equal to said second value, adjusting one of said first index and said second index based on a difference between said first value and said second value; and
  
  if said first value is equal to said second value, determining said first and second values as duplicate data, wherein said sorting the first field vector in a particular order comprises sorting the first field vector in an increasing order, and wherein said sorting the second field vector in said particular order comprises sorting the second field vector in said increasing order, and wherein said adjusting one of said first index and said second index comprises:
  
  adjusting said first index if said first value is less than said second value, and adjusting said second index if said second value is less than said first value.

15. A method for identifying duplicate data between a first field vector and a second field vector, the first field vector and the second field vector sorted in a particular order, the method comprising:
- partitioning said first field vector into sets of common values;
  
  partitioning said second field vector into sets of common values;
  
  comparing a first value in a first position in the first field vector with a second value at a second position in the second field vector;
  
  if said first value is not equal to said second value, adjusting either said first position or said second position based on a difference between said first value and said second value; and
  
  if said first value is equal to said second value, determining said first and second values as duplicate data.
- Dependent Claims (16, 17, 18)
- - 16. The method of claim 15, wherein said adjusting either said first position or said second position comprises adjusting either said first position or said second position to a next partitioned set of a respective one of said first field vector or said second field vector.
  - 17. The method of claim 16, wherein the first and second field vectors are sorted in increasing numeric order and wherein said adjusting either said first position or said second position comprises:
    - adjusting said first position to a next partitioned set in said first field vector if said first value is less than said second value; and
      
      adjusting said second position to a next partitioned set in said second field vector if said second value is less than said first value.
  - 18. The method of claim 16, wherein the first and second field vectors are sorted in decreasing numeric order and wherein said adjusting either said first position or said second position comprises:
    - adjusting said first position to a next partitioned set in said first field vector if said first value is greater than said second value; and
      
      adjusting said second position to a next partitioned set in said second field vector if said second value is greater than said first value.
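
The partition-based walk of claims 15 through 17 can be sketched in Python as below. The names `partition` and `find_duplicate_sets`, the `(value, start, count)` tuple layout, and the use of `itertools.groupby` are all assumptions made for illustration; the claims do not say how the partitioned sets are represented. Both inputs are assumed to be already sorted in increasing numeric order.

```python
from itertools import groupby


def partition(sorted_vector):
    """Split a sorted vector into (value, start_index, count) sets of common values."""
    runs = []
    start = 0
    for value, group in groupby(sorted_vector):
        count = sum(1 for _ in group)
        runs.append((value, start, count))
        start += count
    return runs


def find_duplicate_sets(first, second):
    """Compare two increasing-order vectors one partitioned set at a time.

    Returns (value, count_in_first, count_in_second) for each shared value.
    """
    p, q = partition(first), partition(second)
    i = j = 0
    matches = []
    while i < len(p) and j < len(q):
        diff = p[i][0] - q[j][0]
        if diff == 0:
            matches.append((p[i][0], p[i][2], q[j][2]))
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # skip to the next partitioned set in the first vector
        else:
            j += 1  # skip to the next partitioned set in the second vector
    return matches
```

For `first = [1, 1, 2, 5, 5, 5]` and `second = [2, 2, 5, 7]`, the walk returns `[(2, 1, 2), (5, 3, 1)]`. Stepping by set rather than by element means a run of repeated values costs one comparison instead of one per repetition.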

19. A method for sorting data comprising:
- receiving a value to be sorted;
  
  determining a first position in a vector where said value is to be included;
  
  retrieving a vector value from said vector at said first position;
  
  feeding back said vector value to determine a difference between said value and said vector value; and
  
  determining a new position in said vector based at least in part on said difference.
- Dependent Claims (20)
- - 20. The method of claim 19, wherein said determining a new position comprises determining a new position in said vector based at least in part on said first position.
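
Claims 19 and 20 do not state how the difference is turned into a new position. As one possible reading, the Python sketch below assumes a bisection: the sign of the difference between the incoming value and the retrieved vector value selects which half of the remaining range to probe next, so each new position depends on both the difference and the previous position. The names `find_insert_position` and `insert_sorted` are hypothetical, and an increasing-order vector of numeric values is assumed.

```python
def find_insert_position(vector, value):
    """Locate where value belongs in an increasing-order vector."""
    low, high = 0, len(vector)
    while low < high:
        position = (low + high) // 2      # current candidate position
        diff = value - vector[position]   # feed back the retrieved vector value
        if diff == 0:
            return position
        if diff < 0:
            high = position               # new position lies before this one
        else:
            low = position + 1            # new position lies after this one
    return low


def insert_sorted(vector, value):
    """Insert value into vector in place, keeping increasing order."""
    vector.insert(find_insert_position(vector, value), value)
    return vector
```

Inserting 5, 2, 9, 2, and 7 in turn into an empty list with `insert_sorted` leaves the list as `[2, 2, 5, 7, 9]`. A variant could also use the magnitude of the difference, as in interpolation search, but that too would be an assumption beyond what the claims recite.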

Current Assignee: Becket & Lee LLC
Original Assignee: Becket & Lee LLC
Inventors: Gruenwald, Bjorn J.
Application Number: US10/219,658
Publication Number: US 20030037051A1
US Class Current: 707/7
CPC Class Codes

G06F 16/2365   Ensuring data consistency a...

G06F 16/258   Data format conversion from...

G06F 16/30   of unstructured textual dat...

G06F 16/33   Querying

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99937   Sorting

Y10S 707/99942   Manipulating data structure...

Y10S 707/99952   Coherency, e.g. same view t...
