Clustering data based on indications of financial malfeasance

  • US 9,230,280 B1
  • Filed: 05/15/2014
  • Issued: 01/05/2016
  • Est. Priority Date: 03/15/2013
  • Status: Active Grant
First Claim

1. A computer system to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising:

  • one or more computer readable storage devices configured to store:

    one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module and a workflow engine module;

    a plurality of clustering strategies;

    a plurality of transaction risk indicators;

    a plurality of communication risk indicators; and

    at least one scoring criterion;

    one or more cluster data sources configured to store:

    a plurality of transaction data items and properties associated with respective transaction data items, each of the properties including associated property values;

    a plurality of email data items;

    a plurality of person data items; and

    a plurality of recipient data items; and

    one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the one or more hardware computer processors to:

    designate, by the cluster engine module, one or more seeds by:

    accessing, from the one or more computer readable storage devices, the plurality of transaction risk indicators and at least one transaction data item of the plurality of transaction data items;

    comparing the plurality of transaction risk indicators to the at least one transaction data item and associated properties; and

    based at least on the comparison and in response to determining the at least one transaction data item is related to at least one transaction risk indicator, designating the at least one transaction data item as a first seed;

    determining a subset of email data items from the plurality of email data items that are identifiable as likely side conversations, wherein determining the subset of email data items comprises identifying an email data item that has at least one less of a particular participant than a previous email associated with the email data item;

    searching the subset of email data items to identify an initial email data item, distinct from the at least one transaction data item, based at least on a communication risk indicator of the plurality of communication risk indicators and a sender or recipient of the initial email data item corresponding to a person associated with the at least one transaction data item; and

    designating the initial email data item as a second seed;

    for each designated first and second seed:

    identify, by the cluster engine module, one or more first data items determined to be associated with the first seed based at least in part on a first clustering strategy of the plurality of clustering strategies, wherein the first clustering strategy queries the one or more cluster data sources to determine at least one of:

    a person data item from the plurality of person data items associated with the first seed of the at least one transaction data item, or an email data item of the plurality of email data items associated with the person data item;

    identify, by the cluster engine module, one or more second data items determined to be associated with the second seed based at least in part on a second clustering strategy of the plurality of clustering strategies, wherein the second clustering strategy queries the one or more cluster data sources to determine at least one of:

    a recipient data item from the plurality of recipient data items associated with the second seed of the initial email data item, a person data item from the plurality of person data items associated with at least one of the recipient data item or the sender of the initial email data item, or a transaction data item of the plurality of transaction data items associated with the person data item;

    generate, by the cluster engine module, a cluster based at least on the first and second seed, wherein generating the cluster comprises:

    adding the first and second seed to the cluster;

    adding the one or more first data items to the cluster;

    adding the one or more second data items to the cluster;

    storing the generated cluster in the one or more computer readable storage devices; and

    determine, by the cluster engine module, a score for the generated cluster, wherein determining the score for the generated cluster comprises:

    accessing, from the one or more computer readable storage devices, the at least one scoring criterion; and

    generating a cluster score for the generated cluster by assessing the generated cluster based at least on the accessed at least one scoring criterion; and

    cause presentation, by the workflow engine module, of at least one generated cluster and the determined score for the at least one generated cluster in a user interface of a client computing device.
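
The steps recited in claim 1 (seed designation, side-conversation detection, clustering, and scoring) can be illustrated with a small sketch. This is not the patented implementation. The class names, the amount-threshold risk indicator, the keyword-based communication indicator, and the item-count scoring criterion are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Email:  # hypothetical stand-in for an "email data item"
    email_id: str
    sender: str
    recipients: frozenset
    body: str
    previous_id: Optional[str] = None  # the email this one follows, if any

    @property
    def participants(self) -> frozenset:
        return self.recipients | {self.sender}


@dataclass(frozen=True)
class Transaction:  # hypothetical stand-in for a "transaction data item"
    tx_id: str
    person: str
    amount: float


def likely_side_conversations(emails):
    """Emails that dropped at least one participant of the email they follow."""
    by_id = {e.email_id: e for e in emails}
    side = []
    for e in emails:
        prev = by_id.get(e.previous_id)
        if prev is not None and prev.participants - e.participants:
            side.append(e)
    return side


def designate_seeds(transactions, emails, tx_indicators, comm_terms):
    """Return (first seed, second seed) pairs: a risky transaction plus a
    risky side-conversation email involving the transaction's person."""
    side = likely_side_conversations(emails)
    pairs = []
    for tx in transactions:
        if not any(indicator(tx) for indicator in tx_indicators):
            continue
        for e in side:
            risky = any(term in e.body.lower() for term in comm_terms)
            if risky and tx.person in e.participants:
                pairs.append((tx, e))
                break
    return pairs


def build_cluster(tx_seed, email_seed, transactions, emails):
    """Expand both seeds to related people, emails, and transactions."""
    people = {tx_seed.person} | set(email_seed.participants)
    return {
        "seeds": [tx_seed, email_seed],
        "people": people,
        "emails": [e for e in emails if e.participants & people],
        "transactions": [t for t in transactions if t.person in people],
    }


def score_cluster(cluster):
    """Assumed scoring criterion: number of linked emails and transactions."""
    return len(cluster["emails"]) + len(cluster["transactions"])


# Made-up example data.
emails = [
    Email("e1", "alice", frozenset({"bob", "carol"}), "Q3 invoice review"),
    Email("e2", "alice", frozenset({"bob"}), "Keep this off the books", "e1"),
    Email("e3", "carol", frozenset({"alice", "bob"}), "Thanks", "e1"),
]
transactions = [Transaction("t1", "bob", 25000.0), Transaction("t2", "dave", 100.0)]
tx_indicators = [lambda t: t.amount >= 10000]  # assumed transaction risk indicator
comm_terms = ["off the books"]                 # assumed communication risk indicator

seed_pairs = designate_seeds(transactions, emails, tx_indicators, comm_terms)
clusters = [build_cluster(tx, em, transactions, emails) for tx, em in seed_pairs]
scores = [score_cluster(c) for c in clusters]
```

In this made-up data, `e2` drops `carol` from the thread started by `e1`, so it is the only email treated as a likely side conversation. The final presentation step of the claim would then show each cluster alongside its score to the analyst.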
