Clustering data based on indications of financial malfeasance
First Claim
1. A computer system to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising:
one or more computer readable storage devices configured to store:
one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module and a workflow engine module;
a plurality of clustering strategies;
a plurality of transaction risk indicators;
a plurality of communication risk indicators; and
at least one scoring criterion;
one or more cluster data sources configured to store:
a plurality of transaction data items and properties associated with respective transaction data items, each of the properties including associated property values;
a plurality of email data items;
a plurality of person data items; and
a plurality of recipient data items; and
one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the one or more hardware computer processors to:
designate, by the cluster engine module, one or more seeds by:
accessing, from the one or more computer readable storage devices, the plurality of transaction risk indicators and at least one transaction data item of the plurality of transaction data items;
comparing the plurality of transaction risk indicators to the at least one transaction data item and associated properties; and
based at least on the comparison and in response to determining the at least one transaction data item is related to at least one transaction risk indicator, designating the at least one transaction data item as a first seed;
determining a subset of email data items from the plurality of email data items that are identifiable as likely side conversations, wherein determining the subset of email data items comprises identifying an email data item that has at least one less of a particular participant than a previous email associated with the email data item;
searching the subset of email data items to identify an initial email data item, distinct from the at least one transaction data item, based at least on a communication risk indicator of the plurality of communication risk indicators and a sender or recipient of the initial email data item corresponding to a person associated with the at least one transaction data item; and
designating the initial email data item as a second seed;
for each designated first and second seed:
identify, by the cluster engine module, one or more first data items determined to be associated with the first seed based at least in part on a first clustering strategy of the plurality of clustering strategies, wherein the first clustering strategy queries the one or more cluster data sources to determine at least one of:
a person data item from the plurality of person data items associated with the first seed of the at least one transaction data item, or an email data item of the plurality of email data items associated with the person data item;
identify, by the cluster engine module, one or more second data items determined to be associated with the second seed based at least in part on a second clustering strategy of the plurality of clustering strategies, wherein the second clustering strategy queries the one or more cluster data sources to determine at least one of:
a recipient data item from the plurality of recipient data items associated with the second seed of the initial email data item, a person data item from the plurality of person data items associated with at least one of the recipient data item or the sender of the initial email data item, or a transaction data item of the plurality of transaction data items associated with the person data item;
generate, by the cluster engine module, a cluster based at least on the first and second seed, wherein generating the cluster comprises:
adding the first and second seed to the cluster;
adding the one or more first data items to the cluster;
adding the one or more second data items to the cluster;
storing the generated cluster in the one or more computer readable storage devices; and
determine, by the cluster engine module, a score for the generated cluster, wherein determining the score for the generated cluster comprises:
accessing, from the one or more computer readable storage devices, the at least one scoring criterion; and
generating a cluster score for the generated cluster by assessing the generated cluster based at least on the accessed at least one scoring criterion; and
cause presentation, by the workflow engine module, of at least one generated cluster and the determined score for the at least one generated cluster in a user interface of a client computing device.
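The side-conversation step recited in the claim (an email with at least one fewer participant than the previous email associated with it) can be illustrated with a minimal Python sketch. This is not taken from the patent: the `Email` record, its `previous_id` link, and the function name `likely_side_conversations` are all hypothetical, and the sketch assumes each email carries a reference to the message it replies to or forwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Email:
    # Hypothetical record layout; the claim does not prescribe one.
    email_id: str
    sender: str
    recipients: frozenset[str]
    previous_id: Optional[str] = None  # assumed link to the prior email in the thread

    @property
    def participants(self) -> frozenset[str]:
        # Sender plus all recipients.
        return self.recipients | {self.sender}


def likely_side_conversations(emails: list[Email]) -> list[Email]:
    """Return emails that dropped at least one participant of their previous email."""
    by_id = {e.email_id: e for e in emails}
    subset = []
    for email in emails:
        previous = by_id.get(email.previous_id) if email.previous_id else None
        if previous is None:
            continue  # no earlier email to compare against
        dropped = previous.participants - email.participants
        if dropped:
            subset.append(email)
    return subset
```

In this sketch a reply from bob to alice alone, sent in response to a thread that also included carol, would be selected, while a reply that keeps all three participants would not.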
8 Assignments
0 Petitions
Abstract
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed to assist in detection of financial malfeasance. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data (such as trades, emails or chat messages) and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster, and the clusters may be displayed and ranked based on their scores. Various embodiments may enable an analyst to review clusters of trades, emails and/or chat messages that are the most likely to reveal financial malfeasance.
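The flow summarized in the abstract (start from a seed, add related data according to clustering strategies, score each cluster, rank by score) might be sketched as follows. This is an illustrative Python outline under stated assumptions, not the disclosed implementation: a strategy is modeled as any callable that returns items related to a seed, the scorer is any callable over a cluster, and the names `build_cluster` and `rank_clusters` are hypothetical. Each strategy is applied once to the seed; the abstract does not say whether expansion is repeated.

```python
from typing import Callable, Hashable, Iterable

# Assumed shapes: a strategy maps a seed to related items; a scorer maps a cluster to a number.
Strategy = Callable[[Hashable], Iterable[Hashable]]
Scorer = Callable[[set], float]


def build_cluster(seed: Hashable, strategies: list[Strategy]) -> set:
    """Add the seed to a new cluster, then add whatever each strategy returns for it."""
    cluster = {seed}
    for strategy in strategies:
        cluster.update(strategy(seed))
    return cluster


def rank_clusters(
    seeds: Iterable[Hashable], strategies: list[Strategy], scorer: Scorer
) -> list[tuple[float, set]]:
    """Build one cluster per seed and return (score, cluster) pairs, highest score first."""
    scored = [(scorer(cluster), cluster)
              for cluster in (build_cluster(seed, strategies) for seed in seeds)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

An analyst-facing view would then list the clusters in the returned order, so that the highest-scoring clusters are reviewed first.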
249 Citations
20 Claims
10. A computer system to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising:
one or more computer readable storage devices configured to store:
one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module and a workflow engine module;
a plurality of clustering strategies;
a plurality of transaction risk indicators; and
a plurality of communication risk indicators;
one or more cluster data sources configured to store:
a plurality of electronic trade data items, each electronic trade data item associated with a trade of a financial instrument, properties, and property values, each electronic trade data item comprising a trader property associating a trader identifier of a trader executing the trade; and
a plurality of electronic communication data items, each electronic communication data item associated with an electronic communication and one or more trader identifiers that sent or received the electronic communication; and
one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the one or more hardware computer processors to:
designate, by the cluster engine module, one or more seeds by:
accessing, from the one or more computer readable storage devices, the plurality of transaction risk indicators and at least one electronic trade data item of the plurality of electronic trade data items;
comparing the plurality of transaction risk indicators to the at least one electronic trade data item and associated properties;
based at least on the comparison and in response to determining the at least one electronic trade data item is related to at least one transaction risk indicator, designating the at least one electronic trade data item as a first seed;
determining a subset of electronic communication data items from the plurality of electronic communication data items that are identifiable as likely side conversations, wherein determining the subset of electronic communication data items comprises identifying an electronic communication data item that has at least one less of a particular participant than a previous electronic communication associated with the electronic communication data item;
searching the subset of electronic communication data items to identify an initial electronic communication data item, distinct from the at least one electronic trade data item, based at least on a communication risk indicator of the plurality of communication risk indicators and a sender or recipient of the initial electronic communication data item corresponding to a trader associated with the at least one electronic trade data item; and
designating the initial electronic communication data item as a second seed;
for each designated first and second seed:
identify, by the cluster engine module, one or more first data items determined to be associated with the first seed based at least in part on a first clustering strategy of the plurality of clustering strategies, wherein the first clustering strategy queries the one or more cluster data sources to determine at least one of:
a first trader identifier associated with the first seed of the at least one electronic trade data item, or an electronic communication data item of the plurality of electronic communication data items associated with a first trader corresponding to the first trader identifier;
identify, by the cluster engine module, one or more second data items determined to be associated with the second seed based at least in part on a second clustering strategy of the plurality of clustering strategies, wherein the second clustering strategy queries the one or more cluster data sources to determine at least one of:
a sender or recipient associated with the second seed of the initial electronic communication data item, an initial participant corresponding to at least one of the sender or recipient of the initial electronic communication data item, or an electronic trade data item of the plurality of electronic trade data items associated with the initial participant;
generate, by the cluster engine module, a cluster based at least on the first and second seed, wherein generating the cluster comprises:
adding the first and second seed to the cluster;
adding the one or more first data items to the cluster;
adding the one or more second data items to the cluster;
storing the generated cluster in the one or more computer readable storage devices; and
determine, by the cluster engine module, a score for the generated cluster, based at least on one or more scoring criterions; and
cause presentation, by the workflow engine module, of at least one generated cluster and the determined score for the at least one generated cluster in a user interface of a client computing device.
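As a rough illustration of the first-seed designation and the first clustering strategy in this claim, the Python sketch below treats a transaction risk indicator as a predicate over a trade and links communications to the seed through the trader identifier. The record types, the function names, and the example indicator are hypothetical and are not drawn from the patent; real risk indicators and data sources would be considerably richer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trade:
    # Hypothetical layout: a trade with a trader property and named property values.
    trade_id: str
    trader_id: str
    properties: dict[str, float]


@dataclass
class Communication:
    comm_id: str
    trader_ids: set[str]  # traders who sent or received the communication


# Assumption: a transaction risk indicator is modeled as a predicate over a trade.
RiskIndicator = Callable[[Trade], bool]


def designate_first_seeds(trades: list[Trade], indicators: list[RiskIndicator]) -> list[Trade]:
    """Designate as first seeds the trades that relate to at least one risk indicator."""
    return [t for t in trades if any(indicator(t) for indicator in indicators)]


def first_clustering_strategy(seed: Trade, communications: list[Communication]) -> list[Communication]:
    """Find communications sent or received by the trader who executed the seed trade."""
    return [c for c in communications if seed.trader_id in c.trader_ids]
```

A purely illustrative indicator might be `lambda t: t.properties.get("notional", 0) > 1_000_000`; the threshold and the `notional` property name are invented for the example.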
16. A computer-implemented method to assist a human analyst in analyzing large amounts of electronic communications for malfeasance, comprising:
by one or more computer processors configured to execute software modules comprising computer executable instructions:
designating one or more seeds by:
accessing a plurality of transaction risk indicators and at least one electronic trade data item from a plurality of electronic trade data items, each electronic trade data item associated with a trade of a financial instrument, properties, and property values, each electronic trade data item comprising a trader property associating a trader identifier of a trader executing the trade;
comparing the plurality of transaction risk indicators to the at least one electronic trade data item and associated properties;
based at least on the comparison and in response to determining the at least one electronic trade data item is related to at least one transaction risk indicator, designating the at least one electronic trade data item as a first seed;
accessing, from one or more computer readable storage devices, a plurality of electronic communication data items, each electronic communication data item associated with an electronic communication and one or more trader identifiers that sent or received the electronic communication;
determining a subset of electronic communication data items from the plurality of electronic communication data items that are identifiable as likely side conversations, wherein determining the subset of electronic communication data items comprises identifying an electronic communication data item that has at least one less of a particular participant than a previous electronic communication associated with the electronic communication data item;
searching the subset of electronic communication data items to identify an initial electronic communication data item, distinct from the at least one electronic trade data item, based at least on a communication risk indicator of a plurality of communication risk indicators and a sender or recipient of the initial electronic communication data item corresponding to a trader associated with the at least one electronic trade data item; and
designating the initial electronic communication data item as a second seed;
for each designated first and second seed:
identifying one or more first data items determined to be associated with the first seed based at least in part on a first clustering strategy of a plurality of clustering strategies, wherein the first clustering strategy queries one or more cluster data sources to determine at least one of:
a first trader identifier associated with the first seed of the at least one electronic trade data item, or an electronic communication data item of the plurality of electronic communication data items associated with a first trader corresponding to the first trader identifier;
identifying one or more second data items determined to be associated with the second seed based at least in part on a second clustering strategy of the plurality of clustering strategies, wherein the second clustering strategy queries the one or more cluster data sources to determine at least one of:
a sender or recipient associated with the second seed of the initial electronic communication data item, an initial participant corresponding to at least one of the sender or recipient of the initial electronic communication data item, or an electronic trade data item of the plurality of electronic trade data items associated with the initial participant;
generating a cluster based at least on the first and second seed, wherein generating the cluster comprises:
adding the first and second seed to the cluster;
adding the one or more first data items to the cluster;
adding the one or more second data items to the cluster;
storing the generated cluster in the one or more computer readable storage devices; and
determining a score for the generated cluster, based at least on one or more scoring criterions; and
causing presentation of at least one generated cluster and the determined score for the at least one generated cluster in a user interface of a client computing device.
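The second-seed search in this method claim combines two conditions: the communication matches a communication risk indicator, and its sender or recipient is a trader associated with the first-seed trade. One way to picture that is the Python sketch below. It assumes, purely for illustration, that a communication risk indicator is a phrase to look for in the message body; the `Message` type, the function name `find_second_seed`, and the phrase matching are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical layout for an electronic communication data item.
    message_id: str
    participants: set[str]  # trader identifiers that sent or received the message
    body: str


def find_second_seed(
    side_conversations: list[Message],
    seed_trader_id: str,
    risk_phrases: list[str],
) -> Optional[Message]:
    """Return the first side-conversation message that involves the seed trade's
    trader and contains at least one risk phrase, or None if there is no match.

    Assumption: risk phrases are supplied in lower case.
    """
    for message in side_conversations:
        text = message.body.lower()
        involves_trader = seed_trader_id in message.participants
        matches_indicator = any(phrase in text for phrase in risk_phrases)
        if involves_trader and matches_indicator:
            return message
    return None
```

The input list is expected to be the subset already identified as likely side conversations, so the search never considers communications outside that subset.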
Specification