
Systems and methods for identifying anomalous data in large structured data sets and querying the data sets

  • US 9,965,524 B2
  • Filed: 04/03/2014
  • Issued: 05/08/2018
  • Est. Priority Date: 04/03/2013
  • Status: Active Grant
First Claim

1. A system that identifies anomalous data in a record set by comparing frequencies of unique elements obtained from the record set and frequencies of the unique elements in a reference data set, the system including:

  • a computer including memory; and

    computer instructions causing the computer to implement:

    creating an expanded tuple set by automatically expanding an existing first tuple set of a first feature from the record set to include a second tuple set of a second feature from the record set, the existing first tuple set being expanded by (i) adding the second tuple set to the existing first tuple set and (ii) creating unique elements with elements from the first feature from the record set and the second feature from the record set, wherein the unique elements in the expanded tuple set enumerate permutations of unique values of the second feature from the record set that are combined with values of the first feature from the record set to form the expanded tuple set;

    identifying a count of how often each feature value combination of the unique elements is found in the expanded tuple set;

    limiting the unique elements in the expanded tuple set to inhabited feature value combinations by (i) applying a threshold count criterion of 2 or more to the identified counts of how often the feature value combinations of the unique elements are found in the expanded tuple set and (ii) not retaining unique elements in the expanded tuple set that do not satisfy the threshold count criterion;

    after expanding the existing first tuple set into the expanded tuple set and applying the threshold count criterion, comparing frequencies of the unique elements in the expanded tuple set to frequencies of the unique elements in the reference data set to identify anomalous frequencies of the unique elements in the expanded tuple set with respect to the frequencies of the unique elements in the reference data set; and

    spotting outliers from the expanded tuple set with respect to the reference data set based on the identified anomalous frequencies.
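The claim names the steps but does not say how frequencies are compared or what makes a frequency "anomalous." The Python sketch below walks through the claimed sequence under assumptions of our own. Records are dictionaries. Frequency means count divided by the number of records. A surviving combination is flagged when its frequency differs from the reference frequency by at least a hypothetical factor `ratio`, or when it is missing from the reference entirely. Only the threshold of 2 comes from the claim. The function names, the `ratio` parameter, and the example data are illustrative and are not taken from the patent.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, Hashable]
Combo = Tuple[Hashable, ...]


def first_tuple_set(records: Sequence[Record], feature: str) -> List[Combo]:
    """Existing first tuple set: one single-element tuple per record."""
    return [(rec[feature],) for rec in records]


def expand(tuples: Sequence[Combo], records: Sequence[Record], feature: str) -> List[Combo]:
    """Expand each record's tuple with the value of a further feature."""
    return [t + (rec[feature],) for t, rec in zip(tuples, records)]


def build_tuples(records: Sequence[Record], features: Sequence[str]) -> List[Combo]:
    """Start from the first feature and expand with each remaining feature in turn."""
    tuples = first_tuple_set(records, features[0])
    for feature in features[1:]:
        tuples = expand(tuples, records, feature)
    return tuples


def inhabited_counts(tuples: Sequence[Combo], min_count: int = 2) -> Counter:
    """Count each feature-value combination and keep only those seen >= min_count times."""
    counts = Counter(tuples)
    return Counter({combo: n for combo, n in counts.items() if n >= min_count})


def spot_outliers(
    records: Sequence[Record],
    reference: Sequence[Record],
    features: Sequence[str],
    min_count: int = 2,
    ratio: float = 2.0,  # hypothetical anomaly factor; the claim does not specify one
) -> Dict[Combo, Tuple[float, float]]:
    """Return {combination: (observed frequency, reference frequency)} for anomalous combinations."""
    counts = inhabited_counts(build_tuples(records, features), min_count)
    ref_counts = Counter(build_tuples(reference, features))
    total, ref_total = len(records), len(reference)
    flagged: Dict[Combo, Tuple[float, float]] = {}
    for combo, n in counts.items():
        freq = n / total
        ref_freq = ref_counts[combo] / ref_total if ref_total else 0.0
        # Assumption: a combination absent from the reference is always anomalous.
        if ref_freq == 0.0 or freq / ref_freq >= ratio or ref_freq / freq >= ratio:
            flagged[combo] = (freq, ref_freq)
    return flagged


# Illustrative data only: two features, "state" and "product".
records = (
    [{"state": "CA", "product": "A"}] * 4
    + [{"state": "CA", "product": "B"}] * 2
    + [{"state": "NY", "product": "A"}] * 1  # count 1: removed by the threshold of 2
    + [{"state": "NY", "product": "B"}] * 3
)
reference = (
    [{"state": "CA", "product": "A"}] * 4
    + [{"state": "NY", "product": "A"}] * 5
    + [{"state": "NY", "product": "B"}] * 1
)

outliers = spot_outliers(records, reference, ["state", "product"])
print(outliers)
```

With this data, ("CA", "B") is flagged because the reference never contains it. ("NY", "B") is flagged because it is about three times as frequent as in the reference. ("NY", "A") would also differ sharply from the reference, but it occurs only once, so the count threshold removes it before any comparison is made. That is the practical effect of limiting the comparison to "inhabited" combinations.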

  • 1 Assignment