Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures
First Claim
1. A computer-implemented method comprising:
generating, based on a plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string and removing captured communications with destinations on an approved list of destinations, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity;
determining, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
identifying a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
designating the new user-agent string as a seed; and
generating a data item cluster based on the seed, wherein generating the data item cluster comprises:
adding the seed to the data item cluster; and
adding to the data item cluster one or more user-agent-related data items determined to be associated with the seed, wherein the one or more user-agent-related data items comprises information associated with a computing device.
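The filtering and seed-designation steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the `CapturedCommunication` record, its field names, the half-open time windows, and the function names are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class CapturedCommunication:
    # Hypothetical record layout; the claim does not prescribe fields.
    timestamp: datetime
    source_host: str
    destination: str
    user_agent: Optional[str]


def filter_communications(
    comms: Iterable[CapturedCommunication],
    approved_destinations: Set[str],
) -> List[CapturedCommunication]:
    """Keep communications that carry a user-agent string and whose
    destination is not on the approved list."""
    return [
        c for c in comms
        if c.user_agent and c.destination not in approved_destinations
    ]


def find_seed_user_agents(
    comms: Iterable[CapturedCommunication],
    approved_destinations: Set[str],
    reference_start: datetime,
    test_start: datetime,
    test_end: datetime,
) -> Set[str]:
    """Return user-agent strings seen in the test period but not in the
    reference period. Assumes the reference period is
    [reference_start, test_start) and the test period is
    [test_start, test_end)."""
    filtered = filter_communications(comms, approved_destinations)
    reference_agents = {
        c.user_agent for c in filtered
        if reference_start <= c.timestamp < test_start
    }
    test_agents = {
        c.user_agent for c in filtered
        if test_start <= c.timestamp < test_end
    }
    # Each new user-agent string is designated as a seed.
    return test_agents - reference_agents


# Illustrative data only; hosts, destinations, and agents are made up.
comms = [
    CapturedCommunication(datetime(2024, 1, 1, 9), "host-a", "example.org", "Mozilla/5.0"),
    CapturedCommunication(datetime(2024, 1, 8, 9), "host-a", "example.org", "Mozilla/5.0"),
    CapturedCommunication(datetime(2024, 1, 8, 10), "host-b", "unknown.example", "xyzbot/0.1"),
    CapturedCommunication(datetime(2024, 1, 8, 11), "host-c", "updates.example.com", "Updater/2.0"),
    CapturedCommunication(datetime(2024, 1, 8, 12), "host-c", "unknown.example", None),
]
seeds = find_seed_user_agents(
    comms,
    approved_destinations={"updates.example.com"},
    reference_start=datetime(2024, 1, 1),
    test_start=datetime(2024, 1, 8),
    test_end=datetime(2024, 1, 9),
)
print(seeds)
```

In this example the communication to the approved destination and the one without a user-agent string are dropped before comparison, so only the agent that first appears in the test period remains as a seed.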
Abstract
In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
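The seed-to-cluster-to-ranking flow summarized in the abstract might be sketched as follows. This is only one possible reading: the `Cluster` class, the callable-based clustering strategy, the scoring functions, and the weighted-sum metascore are assumptions made for this example, since the abstract does not say how scores are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Cluster:
    # Hypothetical container for a seed and its related data items.
    seed: str
    items: List[str] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)
    metascore: float = 0.0


# A clustering strategy maps a seed to related data items (assumed shape).
ClusteringStrategy = Callable[[str], Iterable[str]]
ScoringFunction = Callable[[Cluster], float]


def generate_cluster(seed: str, strategy: ClusteringStrategy) -> Cluster:
    """Add the seed to a new cluster, then add related items found by the strategy."""
    cluster = Cluster(seed=seed)
    cluster.items.append(seed)
    for item in strategy(seed):
        if item not in cluster.items:
            cluster.items.append(item)
    return cluster


def score_cluster(
    cluster: Cluster,
    scorers: Dict[str, ScoringFunction],
    weights: Dict[str, float],
) -> None:
    """Compute per-attribute cluster scores, then a metascore.
    The weighted sum is an assumption, not taken from the source."""
    cluster.scores = {name: fn(cluster) for name, fn in scorers.items()}
    cluster.metascore = sum(
        weights.get(name, 1.0) * value for name, value in cluster.scores.items()
    )


def rank_clusters(clusters: Iterable[Cluster]) -> List[Cluster]:
    """Order clusters from highest to lowest metascore."""
    return sorted(clusters, key=lambda c: c.metascore, reverse=True)


# Illustrative data only.
related = {"seed-a": ["host-1", "host-2"], "seed-b": ["host-3"]}
strategy = lambda seed: related.get(seed, [])
scorers = {"size": lambda c: float(len(c.items))}
weights = {"size": 1.0}

clusters = [generate_cluster(s, strategy) for s in ("seed-b", "seed-a")]
for c in clusters:
    score_cluster(c, scorers, weights)
ranked = rank_clusters(clusters)
print([(c.seed, c.metascore) for c in ranked])
```

Keeping the strategy and the scorers as plain callables reflects the abstract's point that seeds, clustering rules, and scores can vary by task (for example, tax fraud versus malware detection) while the surrounding flow stays the same.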
699 Citations
19 Claims
1. A computer-implemented method comprising:
generating, based on a plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string and removing captured communications with destinations on an approved list of destinations, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity;
determining, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
identifying a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
designating the new user-agent string as a seed; and
generating a data item cluster based on the seed, wherein generating the data item cluster comprises:
adding the seed to the data item cluster; and
adding to the data item cluster one or more user-agent-related data items determined to be associated with the seed, wherein the one or more user-agent-related data items comprises information associated with a computing device.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a computer system, configure the computer system to perform operations comprising:
generating, based on a plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string and removing captured communications with respective user-agent strings on an approved list of user-agent strings, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity;
determining, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
identifying a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
designating the new user-agent string as a seed; and
generating a data item cluster based on the seed, wherein generating the data item cluster comprises:
adding the seed to the data item cluster; and
adding to the data item cluster one or more user-agent-related data items determined to be associated with the seed, wherein the one or more user-agent-related data items comprises information associated with a computing device.
Dependent claims: 9, 10, 11, 12
13. A computer system comprising:
one or more computer readable storage devices configured to store:
a plurality of captured communications between an internal network and an external network; and
a plurality of user-agent-related data items, wherein a data item of the plurality of user-agent-related data items comprises information associated with a computing device;
one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute computer executable instructions in order to cause the one or more hardware computer processors to:
generate, based on the plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string;
determine, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
identify a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
designate the new user-agent string as a seed; and
generate a data item cluster based on the seed, wherein generating the data item cluster comprises:
adding the seed to the data item cluster; and
adding to the data item cluster one or more user-agent-related data items from the plurality of user-agent-related data items determined to be associated with the seed.
Dependent claims: 14, 15, 16, 17, 18, 19
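The final step of claim 13, adding stored user-agent-related data items associated with the seed, could look roughly like the sketch below. The claim does not define how association is determined; this example assumes an item is associated with the seed when its host sent a communication carrying the seed user-agent string. The `Communication` and `DeviceItem` types and their fields are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Communication(NamedTuple):
    # Hypothetical minimal view of a captured communication.
    source_host: str
    user_agent: str


class DeviceItem(NamedTuple):
    # Hypothetical user-agent-related data item describing a computing device.
    host: str
    owner: str
    operating_system: str


def build_user_agent_cluster(
    seed_user_agent: str,
    communications: Iterable[Communication],
    device_items: Dict[str, DeviceItem],
) -> List[object]:
    """Start the cluster with the seed, then add stored device items for
    every host that used the seed user-agent string (assumed association rule)."""
    cluster: List[object] = [seed_user_agent]
    hosts = sorted(
        {c.source_host for c in communications if c.user_agent == seed_user_agent}
    )
    for host in hosts:
        item = device_items.get(host)
        if item is not None:  # skip hosts with no stored data item
            cluster.append(item)
    return cluster


# Illustrative data only.
captured = [
    Communication("host-b", "xyzbot/0.1"),
    Communication("host-a", "Mozilla/5.0"),
    Communication("host-d", "xyzbot/0.1"),
]
stored_items = {
    "host-a": DeviceItem("host-a", "bob", "Linux"),
    "host-b": DeviceItem("host-b", "alice", "Windows 10"),
}
ua_cluster = build_user_agent_cluster("xyzbot/0.1", captured, stored_items)
print(ua_cluster)
```

Here host-d used the seed user-agent string but has no stored data item, so the cluster holds only the seed and the record for host-b.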
Specification