Fast database matching

US 9,846,739 B2
Filed: 11/14/2011
Issued: 12/19/2017
Est. Priority Date: 10/23/2006
Status: Active Grant

Abstract

A method of improving the speed with which a sample data record can be matched against records in a database comprises defining a list of possible key values (430), testing those key values against the sample and, for each record in the database, counting the number of key values that match both the record and the sample at reference positions selected by a mask. A list of possible matches is then selected on the basis of that count, for more detailed matching or analysis. Such a method provides very fast matching at the expense of some additional effort when registering a new record within the database.
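
One way to picture the counting scheme described in the abstract is as an inverted index keyed by reference position and key value. The sketch below is an illustration under stated assumptions, not the patented implementation: the function names `build_index` and `candidate_matches`, the use of byte strings as records, the choice of reference positions, and the `min_hits` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def build_index(enrolled, reference_positions):
    """Enrollment step: map (position, key value) to the ids of records holding that value there."""
    index = defaultdict(set)
    for record_id, record in enrolled.items():
        for pos in reference_positions:
            index[(pos, record[pos])].add(record_id)
    return index

def candidate_matches(index, sample, reference_positions, min_hits):
    """Matching step: count, per enrolled record, the key values it shares with the sample."""
    hits = Counter()
    for pos in reference_positions:
        # .get avoids creating empty entries in the defaultdict during lookup.
        for record_id in index.get((pos, sample[pos]), ()):
            hits[record_id] += 1
    # Keep records whose count reaches the (assumed) threshold, highest count first.
    return [(rid, n) for rid, n in hits.most_common() if n >= min_hits]

# Hypothetical enrolled records; only positions 0, 2 and 4 act as reference positions.
enrolled = {
    "A": b"\x10\x22\x35\x47\x59\x6b",
    "B": b"\x10\x99\x35\x00\x58\x01",
    "C": b"\xff\xee\xdd\xcc\xbb\xaa",
}
reference_positions = [0, 2, 4]
index = build_index(enrolled, reference_positions)

sample = b"\x10\x00\x35\x00\x59\x00"
candidates = candidate_matches(index, sample, reference_positions, min_hits=2)
```

The extra work happens in `build_index`, at enrollment time. Matching then touches only the index entries selected by the sample's own values, and the short candidate list can be passed on to a slower, more detailed comparison.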

18 Claims

1. A method of identifying a possible match between a sample data record and any in a plurality of enrolled data records in a data base, each enrolled data record comprising a first plurality of data positions, the method comprising:
- a) prior to initiating a process for matching a sample data record separate from the data base with one of the enrolled data records in the data base, without first associating the sample data record with one of the enrolled data records, designating a second plurality of reference positions among the first plurality of data positions in a first enrolled data record, some of which reference positions in said first enrolled data record are separated by other data positions in said first enrolled data record, each reference position corresponding to a location in said first enrolled data record at which a key value, useful as a characteristic feature for identifying said first enrolled data record, is positioned, there being a first key value at a first enrolled data record reference position and a second key value at a second enrolled data record reference position, the totality of said key values providing an identification for distinguishing said first enrolled data record from others in the plurality of enrolled data records;
  
  b) providing, for at least said first enrolled data record, an enrollment mask comprising a series of enrollment mask data positions, each corresponding to one in the first plurality of data positions in said first enrolled data record, the enrollment mask including at least first and second enrollment mask reference positions corresponding to first and second enrolled data record reference positions, wherein the first key value is associated with said first enrollment mask reference position and the second key value is associated with said second enrollment mask reference position to match a sample record with said first enrolled data record;
  
  c) for the sample data record, defining a sample mask comprising sample mask data positions, each corresponding to a data position in said first enrolled data record, including first and second sample mask reference positions corresponding to said first and second enrollment mask reference positions and corresponding to the first and second reference positions in the enrolled data record reference positions;
  
  d) associating said first key value with said first sample mask reference position and associating said second key value with said second mask reference position to identify in the sample record presence of at least said first and second key values at positions corresponding to reference positions in said first enrolled data record that are associated with said first and second key values; and
  
  e) applying the sample mask to the sample data record to determine whether the first and second key values are at positions in the sample data record corresponding to the first and second sample mask reference positions to identify a possible match between the sample data record and said first enrolled data record.
  - 2. The method of claim 1 further including, when applying the sample mask to the sample data record, determining the number of occurrences in which key values of the first enrolled data record are identified and determining whether the possible match exists between the sample data record and said first enrolled data record based on whether the number of occurrences meets a threshold value.
  - 3. The method of claim 2 wherein the number of occurrences is determined by processing an enrolled mask with the sample mask.
  - 4. A method according to claim 2, wherein determining the number of occurrences of key values includes:
    - building a histogram of the numbers of hits for a plurality of data record identifiers; and
      
      identifying data records as possible matches from the histogram.
  - 5. The method according to claim 2 wherein processes for applying the sample mask, determining numbers of occurrences in which key values are identified and determining whether a possible match exists are performed for the plurality of enrolled data records among a plurality of processors operating in parallel, each processor forwarding a match result to a consolidator, said consolidator identifying stored data records as possible matches in dependence upon said match results.
  - 6. The method of claim 2 in which said first enrolled data record is stored for possible further processing.
  - 7. The method of claim 6 including the additional step of further analysing the relationship between the sample data record and each possible matching data record.
  - 8. The method of claim 1 wherein the second plurality of reference positions among the first plurality of data positions is based on the first plurality of enrolled data records.
  - 9. The method of claim 1 wherein record identifiers associated with the key values are stored within a database table for reference positions in said first enrolled data record.
  - 10. The method of claim 9 wherein said record identifiers define key values that could occur in said plurality of reference positions.
  - 11. The method of claim 9 in which said key values are implicit and are not separate from the record identifiers.
  - 12. The method of claim 9 in which the record identifiers are provided as an ordered list of key values.
  - 13. The method of claim 9 in which one or more pointers are maintained linking at least some of said key values to said record identifiers.
  - 14. The method of claim 13 wherein the pointers are held in a lookup table.
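
The mask, threshold and consolidator steps of claims 1, 2 and 5 can be sketched roughly as follows. This is a minimal illustration, not the claimed method itself. It assumes that a mask can be modelled as a mapping from reference position to key value, that the sample mask carries the same positions and key values as the enrollment mask, and that a thread pool stands in for the "plurality of processors". All function names, the string records and the threshold value are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def make_mask(record, reference_positions):
    """Enrollment mask: reference position -> key value found there in the enrolled record."""
    return {pos: record[pos] for pos in reference_positions}

def count_occurrences(mask, sample):
    """Apply a mask to the sample: count reference positions holding the expected key value."""
    return sum(1 for pos, key in mask.items()
               if pos < len(sample) and sample[pos] == key)

def match_shard(shard, sample, threshold):
    """One worker: test every mask in its shard and keep those meeting the threshold."""
    result = {}
    for record_id, mask in shard.items():
        hits = count_occurrences(mask, sample)
        if hits >= threshold:
            result[record_id] = hits
    return result

def consolidate(results):
    """Consolidator: merge per-worker results into one list, highest count first."""
    merged = {}
    for partial in results:
        merged.update(partial)
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

def find_candidates(masks, sample, threshold, workers=2):
    """Split the enrolled masks across workers and consolidate their match results."""
    ids = sorted(masks)
    shards = [{rid: masks[rid] for rid in ids[i::workers]} for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda shard: match_shard(shard, sample, threshold), shards)
        return consolidate(results)

# Hypothetical data: reference positions 0, 3, 5 and 8 are separated by other positions.
enrolled = {"A": "GATTACAGG", "B": "GCTTGCAGA", "C": "TTTTTTTTT"}
reference_positions = [0, 3, 5, 8]
masks = {rid: make_mask(rec, reference_positions) for rid, rec in enrolled.items()}

sample = "GAATACCGG"
candidates = find_candidates(masks, sample, threshold=3)
```

Because each shard is matched independently, the consolidator only has to merge small result dictionaries. In CPython, threads will not speed up this CPU-bound loop; separate processes or machines would be needed for real parallelism.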

15. A system for identifying possible matches between a sample data record and a plurality of enrolled data records, the system comprising:
- a processor; and
  
  memory storing instructions which, when executed by the processor, cause the processor to perform the steps of:
  
  a) prior to initiating a process for matching a sample data record separate from the data base with one of the enrolled data records in the data base, without first associating the sample data record with one of the enrolled data records, designating a second plurality of reference positions among the first plurality of data positions in a first enrolled data record, some of which reference positions in said first enrolled data record are separated by other data positions in said first enrolled data record, each reference position corresponding to a location in said first enrolled data record at which a key value, useful as a characteristic feature for identifying said first enrolled data record, is positioned, there being a first key value at a first enrolled data record reference position and a second key value at a second enrolled data record reference position, the totality of said key values providing an identification for distinguishing said first enrolled data record from others in the plurality of enrolled data records;
  
  b) providing, for at least said first enrolled data record, an enrollment mask comprising a series of enrollment mask data positions, each corresponding to one in the first plurality of data positions in said first enrolled data record, the enrollment mask including at least first and second enrollment mask reference positions corresponding to first and second enrolled data record reference positions, wherein the first key value is associated with said first enrollment mask reference position and the second key value is associated with said second enrollment mask reference position to match a sample record with said first enrolled data record;
  
  c) for the sample data record, defining a sample mask comprising sample mask data positions, each corresponding to a data position in said first enrolled data record, including first and second sample mask reference positions corresponding to said first and second enrollment mask reference positions and corresponding to the first and second reference positions in the enrolled data record reference positions;
  
  d) associating said first key value with said first sample mask reference position and associating said second key value with said second mask reference position to identify in the sample record presence of at least said first and second key values at positions corresponding to reference positions in said first enrolled data record that are associated with said first and second key values; and
  
  e) applying the sample mask to the sample data record to determine whether the first and second key values are at positions in the sample data record corresponding to the first and second sample mask reference positions to identify a possible match between the sample data record and said first enrolled data record.
  - 16. The system of claim 15, wherein the memory stores additional instructions which, when executed by the processor, cause the processor to perform the steps of:
    - determining a number of occurrences of the key values based on two or more reference positions in said first enrolled data record that are separated by other positions in said first enrolled data record;
      
      based on the number of occurrences of key values, determining whether said first enrolled data record is a possible match with the sample data record;
      
      storing a list of record identifiers; and
      
      scaling said number of occurrences.
  - 17. The system of claim 16 wherein the list of record identifiers is stored within a database table for a reference position in at least one in the plurality of enrolled data records.
  - 18. The system of claim 15 wherein the plurality of reference positions is calculated from the plurality of enrolled data records.
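
Claim 16 recites scaling the number of occurrences without saying how. As one possible reading, offered only as an assumption, the sketch below divides each hit count by the number of reference positions designated for that record, so that records with different numbers of reference positions can be compared against a single threshold. The names `scaled_score`, `possible_matches` and `min_score` are hypothetical.

```python
def scaled_score(hits, reference_count):
    """Scale a raw hit count to the range 0..1 (assumed normalisation)."""
    if reference_count <= 0:
        raise ValueError("a record needs at least one reference position")
    return hits / reference_count

def possible_matches(hit_counts, reference_counts, min_score):
    """Return ids of records whose scaled hit count reaches min_score."""
    scored = {
        record_id: scaled_score(hit_counts.get(record_id, 0), count)
        for record_id, count in reference_counts.items()
    }
    return sorted(rid for rid, score in scored.items() if score >= min_score)

# Hypothetical counts: A and B both scored 6 hits, but B has twice as many reference positions.
hit_counts = {"A": 6, "B": 6}
reference_counts = {"A": 8, "B": 16, "C": 8}
matches = possible_matches(hit_counts, reference_counts, min_score=0.5)
```

With these figures, record A scores 0.75 and record B scores 0.375, so only A passes the 0.5 threshold even though the raw counts are equal.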

Current Assignee: FotoNation Limited (Adeia, Inc.)
Original Assignee: FotoNation Limited (Adeia, Inc.)
Inventors: Monro, Donald Martin
Primary Examiner(s): Jalil, Neveen Abel
Assistant Examiner(s): CONYERS, DAWAUNE A
Application Number: US13/295,560
Publication Number: US 20120136872A1
Time in Patent Office: 2,227 Days
Field of Search: 707780
CPC Class Codes: G06F 16/334 Query execution G06F16/335 ...
