Data item clustering and analysis
First Claim
1. A computer system for efficiently analyzing large amounts of malfeasance-related data, the computer system comprising:
- one or more computer readable storage devices configured to store:
a plurality of computer executable instructions;
one or more software modules including the plurality of computer executable instructions, the one or more software modules including a cluster engine module, a user interface engine module, and a workflow engine module;
a plurality of data clustering strategies based on rules generated by the cluster engine module; and
a plurality of data cluster types, each data cluster type of the plurality of data cluster types associated with a data clustering strategy of the plurality of data clustering strategies;
one or more cluster data sources configured to store malfeasance-related data items, the malfeasance-related data items including at least one of:
electronic documents, accounts, profiles, alerts, records, or communications; and
one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the computer system to:
designate, by the cluster engine module, one or more seeds by:
accessing, from the one or more cluster data sources, the malfeasance-related data items; and
selecting one or more data items from the accessed malfeasance-related data items based at least on a property value of at least one electronic document, account, profile, alert, record, or communication of the one or more data items evidencing suspicious behavior or a likelihood of fraud or criminal activity, and designating each data item of the one or more data items as a seed;
for each data item seed of the one or more designated seeds:
select, by the cluster engine module, a particular data clustering strategy from the plurality of data clustering strategies;
identify, by the cluster engine module, one or more malfeasance-related data items based at least on the particular data clustering strategy, wherein the particular data clustering strategy queries the one or more cluster data sources to determine the one or more malfeasance-related data items associated with the data item seed; and
generate, by the cluster engine module, a data cluster based at least on the data item seed, wherein generating the data cluster comprises:
adding the data item seed to the data cluster;
adding the one or more malfeasance-related data items identified as being associated with the data item seed to the data cluster;
identifying an additional one or more data items associated with any data items of the data cluster;
adding the additional one or more data items to the data cluster; and
storing the generated data cluster in the one or more computer readable storage devices;
generate, by the user interface engine module, at least one human-readable conclusion associated with at least one generated data cluster, wherein generating the at least one human-readable conclusion comprises:
determining a particular data cluster type from the plurality of cluster types based at least on the particular data clustering strategy;
identifying one or more human-readable templates comprising pre-generated text, wherein identifying the human-readable templates is based at least on predefined associations between respective human-readable templates and data cluster types, and the particular data cluster type;
automatically analyzing the at least one generated data cluster to generate summary data according to rules, scoring algorithms, or other criteria; and
populating the identified one or more human-readable templates with data from the at least one generated data cluster or summary data of the at least one generated data cluster;
cause presentation, by the user interface engine module, of the at least one generated data cluster and the at least one human-readable conclusion including the populated pre-generated text data, in a user interface of a client computing device; and
generate, by the workflow engine and the user interface engine, an interactive workflow process to allow the user to perform at least one of:
select new seeds, operate on existing seeds, generate new data clusters, or regenerate existing clusters.
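The seed designation and cluster generation steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation: the in-memory `DATA_SOURCE`, the `risk_score` property, the 0.8 threshold, and the `linked_items_strategy` function are all hypothetical names chosen for illustration, and the expansion loop is assumed to repeat until no new associated items are found.

```python
from collections import deque

# Hypothetical in-memory stand-in for a cluster data source:
# item id -> properties plus ids of associated items.
DATA_SOURCE = {
    "acct-1": {"type": "account", "risk_score": 0.92, "links": ["card-7", "email-3"]},
    "card-7": {"type": "record", "risk_score": 0.10, "links": ["acct-2"]},
    "email-3": {"type": "communication", "risk_score": 0.05, "links": []},
    "acct-2": {"type": "account", "risk_score": 0.20, "links": ["card-7"]},
    "acct-9": {"type": "account", "risk_score": 0.15, "links": []},
}


def designate_seeds(source, threshold=0.8):
    """Select items whose property value suggests suspicious behavior (assumed rule)."""
    return [item_id for item_id, item in source.items()
            if item["risk_score"] >= threshold]


def linked_items_strategy(source, item_id):
    """Hypothetical clustering strategy: query the source for directly linked items."""
    return source[item_id]["links"]


def generate_cluster(source, seed, strategy):
    """Add the seed, then keep adding items associated with any item in the cluster."""
    cluster = [seed]          # insertion-ordered cluster members
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for related in strategy(source, current):
            if related in source and related not in seen:
                seen.add(related)
                cluster.append(related)
                queue.append(related)
    return cluster


# One cluster per designated seed, stored by seed id.
clusters = {
    seed: generate_cluster(DATA_SOURCE, seed, linked_items_strategy)
    for seed in designate_seeds(DATA_SOURCE)
}
```

In this sketch only `acct-1` exceeds the threshold, so it becomes the single seed, and `acct-9`, which has no association with any cluster member, is left out of the resulting cluster.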
Abstract
Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analyses (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.
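The generation of human-readable conclusions from templates might look roughly like the following sketch. The mapping tables, template wording, and summary fields (`item_count`, `account_count`, `max_risk`) are assumptions made for illustration and do not come from the disclosure.

```python
# Hypothetical association between clustering strategies and cluster types.
STRATEGY_TO_CLUSTER_TYPE = {"linked_items": "account_network"}

# Hypothetical predefined association between cluster types and text templates.
TEMPLATES_BY_CLUSTER_TYPE = {
    "account_network": [
        "This cluster contains {item_count} items, including {account_count} accounts.",
        "Highest risk score in cluster: {max_risk:.2f}.",
    ],
}


def summarize(cluster_items):
    """Apply simple rules to a cluster to produce summary data (assumed criteria)."""
    return {
        "item_count": len(cluster_items),
        "account_count": sum(1 for item in cluster_items if item["type"] == "account"),
        "max_risk": max(item["risk_score"] for item in cluster_items),
    }


def generate_conclusions(strategy_name, cluster_items):
    """Pick templates by cluster type and populate them with summary data."""
    cluster_type = STRATEGY_TO_CLUSTER_TYPE[strategy_name]
    templates = TEMPLATES_BY_CLUSTER_TYPE[cluster_type]
    summary = summarize(cluster_items)
    return [template.format(**summary) for template in templates]


example_cluster = [
    {"type": "account", "risk_score": 0.92},
    {"type": "record", "risk_score": 0.10},
    {"type": "account", "risk_score": 0.20},
]
conclusions = generate_conclusions("linked_items", example_cluster)
```

Keeping the templates keyed by cluster type means a new cluster type only needs a new table entry, which matches the idea of predefined associations between templates and cluster types.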
14 Claims
Specification