Systems and user interfaces for dynamic and interactive investigation based on automatic clustering of related data in various data structures

US 10,264,014 B2
Filed: 10/30/2015
Issued: 04/16/2019
Est. Priority Date: 03/15/2013
Status: Active Grant

8 Assignments

0 Petitions

Abstract

In various embodiments, systems, methods, and techniques are disclosed for generating a collection of clusters of related data from a seed. Seeds may be generated based on seed generation strategies or rules. Clusters may be generated by, for example, retrieving a seed, adding the seed to a first cluster, retrieving a clustering strategy or rules, and adding related data and/or data entities to the cluster based on the clustering strategy. Various cluster scores may be generated based on attributes of data in a given cluster. Further, cluster metascores may be generated based on various cluster scores associated with a cluster. Clusters may be ranked based on cluster metascores. Various embodiments may enable an analyst to discover various insights related to data clusters, and may be applicable to various tasks including, for example, tax fraud detection, beaconing malware detection, malware user-agent detection, and/or activity trend detection, among various others.
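
The flow the abstract describes (seed, clustering strategy, cluster scores, metascore, ranking) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Cluster` class, modeling the clustering strategy as a callable that returns related items for a seed, the pluggable scoring functions, and the weighted-sum metascore are one possible reading, not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical container for one cluster grown from a seed."""
    seed: str
    items: set = field(default_factory=set)
    scores: dict = field(default_factory=dict)
    metascore: float = 0.0


def build_cluster(seed, strategy):
    """Add the seed to a new cluster, then add whatever the strategy relates to it."""
    cluster = Cluster(seed=seed)
    cluster.items.add(seed)
    # Assumption: a clustering strategy is a callable returning related items.
    cluster.items.update(strategy(seed))
    return cluster


def rank_clusters(seeds, strategy, scorers, weights):
    """Build one cluster per seed, score it, and rank by metascore (highest first)."""
    clusters = []
    for seed in seeds:
        cluster = build_cluster(seed, strategy)
        # Each cluster score is computed from attributes of the cluster's data.
        cluster.scores = {name: float(fn(cluster)) for name, fn in scorers.items()}
        # Assumption: the metascore is a weighted sum of the cluster scores.
        cluster.metascore = sum(
            weights[name] * value for name, value in cluster.scores.items()
        )
        clusters.append(cluster)
    return sorted(clusters, key=lambda c: c.metascore, reverse=True)
```

A weighted sum is only one way to combine scores; the abstract says metascores are based on the cluster scores without fixing a formula.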

699 Citations

19 Claims

1. A computer-implemented method comprising:
- generating, based on a plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string and removing captured communications with destinations on an approved list of destinations, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity;
- determining, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
- identifying a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
- designating the new user-agent string as a seed; and
- generating a data item cluster based on the seed, wherein generating the data item cluster comprises:
  - adding the seed to the data item cluster; and
  - adding to the data item cluster one or more user-agent-related data items determined to be associated with the seed, wherein the one or more user-agent-related data items comprises information associated with a computing device.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The computer-implemented method of claim 1, wherein the one or more user-agent-related data items further include at least one of:
    - a user of a particular computing device, an internal Internet Protocol address, an external Internet Protocol address, an external domain, an internal computing device, an external computing device, or a host-based event.
  - 3. The computer-implemented method of claim 2, further comprising:
    - identifying the one or more user-agent-related data items based at least on a clustering strategy, wherein the clustering strategy queries one or more cluster data sources to determine at least one of:
      - originating host or destination computing devices associated with the seed, users of originating host computing devices, intrusion prevention system alerts associated with originating host computing devices, internal Internet Protocol addresses associated with originating host computing devices, external Internet Protocol addresses associated with destination computing devices, or external domains associated with the first captured communication.
  - 4. The computer-implemented method of claim 1, further comprising:
    - determining a score for the data item cluster; and
    - causing presentation of the data item cluster and the score in a user interface of a client computing device.
  - 5. The computer-implemented method of claim 1, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications with respective user-agent strings on an approved list of user-agent strings, wherein the approved list of user-agent strings indicate communications that are unlikely to be related to malware activity.
  - 6. The computer-implemented method of claim 1, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications associated with a particular external computer system, wherein the particular external computer system is unlikely to be related to malware activity.
  - 7. The computer-implemented method of claim 1, further comprising:
    - identifying a quantity of appearances of the new user-agent string in corresponding captured communications among the first set of captured communications; and
    - determining the quantity is below a predetermined threshold.
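
The filtering and seed-selection steps recited in claims 1 and 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Communication` record and its field names are hypothetical, the reference time period is taken to be everything before `test_start`, and the threshold value is supplied by the caller.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Communication:
    """Hypothetical record for one captured communication."""
    timestamp: datetime
    destination: str
    user_agent: Optional[str]


def filter_communications(comms, approved_destinations):
    """Keep communications that carry a user-agent string and whose
    destination is not on the approved list."""
    return [
        c for c in comms
        if c.user_agent and c.destination not in approved_destinations
    ]


def find_seed_user_agents(comms, approved_destinations, test_start, threshold):
    """Return user-agent strings seen in the test period but not in the
    reference period, and seen fewer than `threshold` times (claim 7)."""
    filtered = filter_communications(comms, approved_destinations)
    # Assumption: the reference period is everything before test_start.
    reference = {c.user_agent for c in filtered if c.timestamp < test_start}
    test_counts = Counter(
        c.user_agent for c in filtered if c.timestamp >= test_start
    )
    return sorted(
        ua for ua, count in test_counts.items()
        if ua not in reference and count < threshold
    )
```

Each returned string would then be designated a seed. The count threshold reflects the reasoning in claim 7 that a new user-agent string appearing only rarely is the one of interest.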

8. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a computer system, configure the computer system to perform operations comprising:
- generating, based on a plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string and removing captured communications with respective user-agent strings on an approved list of user-agent strings, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity;
- determining, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
- identifying a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
- designating the new user-agent string as a seed; and
- generating a data item cluster based on the seed, wherein generating the data item cluster comprises:
  - adding the seed to the data item cluster; and
  - adding to the data item cluster one or more user-agent-related data items determined to be associated with the seed, wherein the one or more user-agent-related data items comprises information associated with a computing device.
- Dependent claims (9, 10, 11, 12)
- - 9. The non-transitory computer-readable storage medium of claim 8, wherein the one or more user-agent-related data items further include at least one of:
    - a user of a particular computing device, an internal Internet Protocol address, an external Internet Protocol address, an external domain, an internal computing device, an external computing device, or a host-based event.
  - 10. The non-transitory computer-readable storage medium of claim 8, wherein the computer-executable instructions further configure the computer system to perform operations comprising:
    - determining a score for the data item cluster; and
    - causing presentation of the data item cluster and the score in a user interface of a client computing device.
  - 11. The non-transitory computer-readable storage medium of claim 8, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications with destinations on an approved list of destinations, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity.
  - 12. The non-transitory computer-readable storage medium of claim 8, wherein the computer-executable instructions further configure the computer system to perform operations comprising:
    - identifying a quantity of appearances of the new user-agent string in corresponding captured communications among the first set of captured communications; and
    - determining the quantity is below a predetermined threshold.

13. A computer system comprising:
- one or more computer readable storage devices configured to store:
  - a plurality of captured communications between an internal network and an external network; and
  - a plurality of user-agent-related data items, wherein a data item of the plurality of user-agent-related data items comprises information associated with a computing device;
- one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute computer executable instructions in order to cause the one or more hardware computer processors to:
  - generate, based on the plurality of captured communications, a filtered collection of captured communications by selecting captured communications that include a user-agent string;
  - determine, based on the filtered collection of captured communications, a first set of captured communications associated with a test time period and a second set of captured communications associated with a reference time period;
  - identify a first captured communication in the first set that is not included among the second set of captured communications, wherein the first captured communication indicates a new user-agent string not previously associated with the reference time period; and
  - designate the new user-agent string as a seed; and
  - generate a data item cluster based on the seed, wherein generating the data item cluster comprises:
    - adding the seed to the data item cluster; and
    - adding to the data item cluster one or more user-agent-related data items from the plurality of user-agent-related data items determined to be associated with the seed.
- Dependent claims (14, 15, 16, 17, 18, 19)
- - 14. The computer system of claim 13, wherein the plurality of user-agent-related data items include at least one of:
    - a user of a particular computing device, an internal Internet Protocol address, an external Internet Protocol address, an external domain, an internal computing device, an external computing device, or a host-based event.
  - 15. The computer system of claim 13, wherein the one or more hardware computer processors are further configured to cause presentation of the data item cluster in a user interface of a client computing device.
  - 16. The computer system of claim 13, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications with destinations on an approved list of destinations, wherein the approved list of destinations indicate destinations that are unlikely to be related to malware activity.
  - 17. The computer system of claim 13, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications with respective user-agent strings on an approved list of user-agent strings, wherein the approved list of user-agent strings indicate communications that are unlikely to be related to malware activity.
  - 18. The computer system of claim 13, wherein generating, based on the plurality of captured communications, the filtered collection of captured communications further includes removing captured communications associated with a particular external computer system, wherein the particular external computer system is unlikely to be related to malware activity.
  - 19. The computer system of claim 13, wherein the one or more hardware computer processors are further configured to:
    - identify a quantity of appearances of the new user-agent string in corresponding captured communications among the first set of captured communications; and
    - determine the quantity is below a predetermined threshold.
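
The cluster-generation step shared by the independent claims (add the seed, then add associated user-agent-related data items such as those listed in claims 2, 9, and 14) might look like the following Python sketch. The dictionary keys `user_agent`, `source_ip`, and `destination`, the `host_users` lookup table, and the tagged-tuple representation of data items are all hypothetical choices made for this example; the claims do not prescribe a data model.

```python
from dataclasses import dataclass, field


@dataclass
class DataItemCluster:
    """Hypothetical cluster of (kind, value) data items grown from a seed."""
    seed: str
    items: set = field(default_factory=set)


def generate_cluster(seed_user_agent, communications, host_users):
    """Add the seed, then add data items associated with it.

    communications: iterable of dicts with the assumed keys
        "user_agent", "source_ip", and "destination".
    host_users: assumed mapping from internal IP address to user names.
    """
    cluster = DataItemCluster(seed=seed_user_agent)
    # Step 1: the seed itself joins the cluster.
    cluster.items.add(("user_agent", seed_user_agent))
    # Step 2: pull in items tied to communications that carried the seed.
    for comm in communications:
        if comm["user_agent"] != seed_user_agent:
            continue
        cluster.items.add(("internal_ip", comm["source_ip"]))
        cluster.items.add(("external_domain", comm["destination"]))
        for user in host_users.get(comm["source_ip"], ()):
            cluster.items.add(("user", user))
    return cluster
```

In this sketch a single pass over the communications stands in for the queries against cluster data sources that claim 3 describes; a fuller version would query each source separately.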


Current Assignee: Palantir Technologies Incorporated
Original Assignee: Palantir Technologies Incorporated
Inventors: Stowe, Geoff; Singh, Harkirat; Bach, Stefan; Sprague, Matthew; Kross, Michael; Borochoff, Adam; Menon, Parvathy; Harris, Michael
Primary Examiner(s): Hamilton, Lalita M
Application Number: US14/928,512
Publication Number: US 20190052648A1
Time in Patent Office: 1,264 Days
Field of Search: 705/35
US Class Current
CPC Class Codes

G06F 16/23   Updating

G06F 16/244   Grouping and aggregation

G06F 16/24578   using ranking

G06F 16/2465   Query processing support fo...

G06F 16/26   Visual data mining; Browsin...

G06F 16/283   Multi-dimensional databases...

G06F 16/285   Clustering or classification

G06F 16/287   Visualization; Browsing

G06F 16/288   Entity relationship models

G06F 16/335   Filtering based on addition...

G06F 16/35   Clustering; Classification

G06F 16/355   Class or cluster creation o...

G06F 16/9535   Search customisation based ...

G06Q 10/10   Office automation; Time man...

G06Q 20/382   insuring higher security of...

G06Q 20/4016   involving fraud or risk lev...

G06Q 30/0185   Product, service or busines...

G06Q 40/00   Finance; Insurance; Tax str...

G06Q 40/02   Banking, e.g. interest calc...

G06Q 40/03   Credit; Loans; Processing t...

G06Q 40/10   Tax strategies

G06Q 40/123   Tax preparation or submission

H04L 63/145   the attack involving the pr...
