
Identification of anomalous data records

  • US 7,668,843 B2
  • Filed: 12/14/2005
  • Issued: 02/23/2010
  • Est. Priority Date: 12/22/2004
  • Status: Active Grant
First Claim

1. A method for execution by one or more digital processors, the method for detecting whether a current record in a dataset of records is an anomalous record, comprising:

  • defining a feature of the records in the dataset;

    calculating, by the one or more digital processors, a plurality of pairwise distances between a value of the feature in the current record and values of the feature in at least some of the records in the dataset, where each pairwise distance is either:

    (A) small for mismatches between the values when both of the values rarely occur in the dataset records or where the distance is large for mismatches between the values when both of the values commonly occur in the dataset records;

    or (B) large for mismatches between the values when both of the values rarely occur in the dataset records or where the distance is small when both of the values commonly occur in the dataset records; and

    in response to the plurality of distances d, producing a score for the current record;

    indicating that the current record is anomalous if the score meets a predetermined criterion;

    wherein the distances are responsive to the frequency Freq(vi) of the value vi of the feature in the current record and to the frequency Freq(vj) of the value vj of the feature in another of the records in the dataset; and

    calculating each of the pairwise distances d from a value vi of the feature in the current record and a value vj of the feature in the other of the dataset records according to the relation
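The relation itself does not appear in this excerpt, and the sketch below does not attempt to reconstruct it. As an illustration only, the general flow recited above (frequency-responsive pairwise distances, a score derived from them, and a threshold test) might look like the following Python. The distance formula, the mean-distance score, the `threshold` value, and every function name are hypothetical choices made for this example and are not taken from the patent.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each feature value to its relative frequency in the dataset."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def pairwise_distance(vi, vj, freq, variant="B"):
    """Hypothetical frequency-weighted mismatch distance.

    ASSUMPTION: this is NOT the claimed relation, only a stand-in with
    the qualitative behaviour of options (A) and (B).
    """
    if vi == vj:
        return 0.0  # matching values are treated as distance zero
    # Product of frequencies: larger when both values are common,
    # smaller when both values are rare.
    commonness = freq[vi] * freq[vj]
    if variant == "A":
        return commonness        # (A): rare/rare small, common/common large
    return 1.0 - commonness      # (B): rare/rare large, common/common small


def anomaly_score(current_index, values, variant="B"):
    """Mean distance from the current record to every other record (assumed score)."""
    freq = relative_frequencies(values)
    vi = values[current_index]
    distances = [
        pairwise_distance(vi, vj, freq, variant)
        for k, vj in enumerate(values)
        if k != current_index
    ]
    if not distances:
        return 0.0  # a single-record dataset has nothing to compare against
    return sum(distances) / len(distances)


def is_anomalous(current_index, values, threshold=0.5, variant="B"):
    """Flag the record when its score meets the (assumed) threshold criterion."""
    return anomaly_score(current_index, values, variant) >= threshold


# Example dataset: one categorical feature, mostly "tcp" with two rare values.
protocols = ["tcp"] * 8 + ["udp", "icmp"]
udp_flagged = is_anomalous(8, protocols)   # record holding the rare value "udp"
tcp_flagged = is_anomalous(0, protocols)   # record holding the common value "tcp"
```

With this toy data and variant (B), the "udp" record scores about 0.93 and a "tcp" record about 0.20, so only the former meets the assumed 0.5 threshold. A different distance or score definition would change these numbers.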

  • 1 Assignment