Internal malware data item clustering and analysis
First Claim
1. A computer system for protecting a computer network from malware by providing for the efficient analysis of large amounts of malware-related data, the computer system comprising:
one or more computer readable storage devices configured to store:
a plurality of computer executable instructions;
one or more software modules including computer executable instructions, the one or more software modules including a cluster engine module, a user interface engine module and a workflow engine module;
a plurality of data clustering strategies based on rules generated by the cluster engine module; and
a plurality of data cluster types, each data cluster type of the plurality of data cluster types associated with a data clustering strategy;
one or more cluster data sources configured to store:
a plurality of data items including at least:
file data items, each file data item associated with at least one suspected malware file; and
malware-related data items associated with captured communications between an internal network and an external network, the malware-related data items including at least one of:
external Internet Protocol addresses, external domains, external computing devices, internal Internet Protocol addresses, internal computing devices, users of particular computing devices, or organizational positions associated with users of particular computing devices; and
one or more hardware computer processors in communication with the one or more computer readable storage devices and the one or more cluster data sources, and configured to execute the one or more software modules in order to cause the computer system to:
designate, by the cluster engine module, one or more seeds by:
accessing, from the one or more cluster data sources, the file data items;
calculating, for each file data item of the file data items, at least one of a hash of the file data item or a hash of an executed file data item, wherein the executed file data item was generated by an execution of the file data item in a sandboxed environment; and
identifying one or more file data items based at least in part on comparing the at least one hash of the file data item or the executed file data item with a malware threat list of hashes, and designating each of the identified one or more file data items as a seed;
for each of the file data items designated as a seed:
select, by the cluster engine module, a particular data clustering strategy;
identify, by the cluster engine module, one or more malware-related data items determined to be associated with the designated file data item seed based at least on the particular data clustering strategy, wherein the particular data clustering strategy performs at least one of querying the one or more cluster data sources or scanning network traffic to determine at least one of:
external Internet Protocol addresses associated with the designated file data item seed, external domains associated with the designated file data item seed, external computing devices associated with the designated file data item seed, internal Internet Protocol addresses associated with the designated file data item seed, internal computing devices associated with the designated file data item seed, users of particular computing devices associated with the designated file data item seed, or organizational positions associated with the determined users of particular computing devices;
generate, by the cluster engine module, a data item cluster based at least on the designated file data item seed, wherein generating the data item cluster comprises:
adding the designated file data item seed to the data item cluster;
identifying one or more of the network indicators that are associated with the seed;
identifying one or more of the network-related data items associated with at least one of the identified one or more of the network indicators;
adding, to the data item cluster, the identified one or more malware-related data items;
identifying an additional one or more data items, including file data items and/or malware-related data items, associated with any data items of the data item cluster;
adding, to the data item cluster, the additional one or more data items; and
storing the one or more data item clusters, generating by the user interface engine module at least one human-readable conclusion associated with at least one generated data item cluster, wherein generating the at least one human-readable conclusion comprises:
determining a particular data cluster type from the plurality of cluster types based at least on the particular data clustering strategy;
identifying one or more human-readable templates comprising pre-generated text, wherein the human-readable templates are based at least on predefined associations between respective human-readable templates and data cluster types;
automatically analyzing the data item cluster to generate summary data according to rules, scoring algorithms, or other criteria; and
populating the identified one or more human-readable templates with data from the at least one generated data item cluster or summary data of the at least one generated data item cluster;
cause presentation, by the user interface engine module, of the at least one generated data item cluster and the at least one human-readable conclusion including the populated pre-generated text data, in a user interface of a client computing device; and
generate, by the workflow engine and the user interface engine, an interactive workflow process to allow the user to perform at least one of:
select new seeds, operate on existing seeds, generate new data clusters, or regenerate existing clusters.
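The seed designation and cluster generation steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not name a hash algorithm, so SHA-256 is an assumption here, as are the function names `designate_seeds` and `build_cluster`, the in-memory `links` association map standing in for the cluster data sources, and the choice to expand the cluster breadth-first until no new associated items are found.

```python
import hashlib
from collections import deque


def designate_seeds(files, threat_hashes):
    """Return names of files whose digest appears in the threat list.

    SHA-256 is an assumption; the claim only says "a hash".
    """
    seeds = []
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in threat_hashes:
            seeds.append(name)
    return seeds


def build_cluster(seed, links):
    """Start from the seed and repeatedly add associated data items.

    `links` is a hypothetical association map (item -> related items).
    """
    cluster = {seed}
    queue = deque([seed])
    while queue:
        item = queue.popleft()
        for neighbor in links.get(item, ()):
            if neighbor not in cluster:
                cluster.add(neighbor)
                queue.append(neighbor)
    return cluster


# Illustrative data only: file names, addresses, and users are made up.
files = {"invoice.exe": b"malicious payload", "notes.txt": b"hello"}
threat_hashes = {hashlib.sha256(b"malicious payload").hexdigest()}
links = {
    "invoice.exe": ["203.0.113.7", "host-17"],  # network indicators for the seed
    "203.0.113.7": ["evil.example"],            # external domain behind the IP
    "host-17": ["alice"],                       # user of the internal device
}

seeds = designate_seeds(files, threat_hashes)
clusters = {seed: build_cluster(seed, links) for seed in seeds}
```

In this sketch only `invoice.exe` becomes a seed, and its cluster grows to include the external address, the external domain, the internal host, and that host's user.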
Abstract
Embodiments of the present disclosure relate to a data analysis system that may automatically generate memory-efficient clustered data structures, automatically analyze those clustered data structures, and provide results of the automated analysis in an optimized way to an analyst. The automated analysis of the clustered data structures (also referred to herein as data clusters) may include an automated application of various criteria or rules so as to generate a compact, human-readable analysis of the data clusters. The human-readable analysis (also referred to herein as “summaries” or “conclusions”) of the data clusters may be organized into an interactive user interface so as to enable an analyst to quickly navigate among information associated with various data clusters and efficiently evaluate those data clusters in the context of, for example, a fraud investigation. Embodiments of the present disclosure also relate to automated scoring of the clustered data structures.
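As a rough illustration of how a human-readable conclusion might be produced from a cluster, the sketch below maps a cluster type to a pre-written template and fills it with summary data. The template wording, the `internal_malware` type name, the dictionary layout of the cluster, and the simple counting rules in `summarize` are all assumptions made for this example; the disclosure itself only describes templates, cluster types, and summary data in general terms.

```python
from string import Template

# Hypothetical association between data cluster types and pre-generated text.
TEMPLATES = {
    "internal_malware": Template(
        "File $seed matched the threat list and is linked to "
        "$host_count internal host(s) and $domain_count external domain(s)."
    ),
}


def summarize(cluster):
    """Derive summary data from a cluster using simple counting rules."""
    return {
        "seed": cluster["seed"],
        "host_count": len(cluster["internal_hosts"]),
        "domain_count": len(cluster["external_domains"]),
    }


def conclude(cluster):
    """Pick the template for the cluster's type and populate it."""
    template = TEMPLATES[cluster["type"]]
    return template.substitute(summarize(cluster))


# Illustrative cluster; all values are made up.
cluster = {
    "type": "internal_malware",
    "seed": "invoice.exe",
    "internal_hosts": ["host-17", "host-42"],
    "external_domains": ["evil.example"],
}

conclusion = conclude(cluster)
print(conclusion)
```

Keeping the templates in a lookup keyed by cluster type means a new cluster type only needs a new template entry, not new rendering code.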
357 Citations
19 Claims
Specification