Generating data clusters

US 10,216,801 B2
Filed: 08/05/2015
Issued: 02/26/2019
Est. Priority Date: 03/15/2013
Status: Active Grant

Abstract

Techniques are disclosed for prioritizing a plurality of clusters. Prioritizing clusters may generally include identifying a scoring strategy for prioritizing the plurality of clusters. Each cluster is generated from a seed and stores a collection of data retrieved using the seed. For each cluster, elements of the collection of data stored by the cluster are evaluated according to the scoring strategy and a score is assigned to the cluster based on the evaluation. The clusters may be ranked according to the respective scores assigned to the plurality of clusters. The collection of data stored by each cluster may include financial data evaluated by the scoring strategy for a risk of fraud. The score assigned to each cluster may correspond to an amount at risk.
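The prioritization flow in the abstract (evaluate each cluster under a scoring strategy, assign a score, rank by score) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Cluster` class, the `amount_at_risk` strategy, and the `balance` field are hypothetical names chosen for the example, and summing balances is only one assumed way a score could correspond to an amount at risk.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cluster:
    seed: str
    # Collection of data retrieved using the seed (hypothetical record shape).
    records: list[dict] = field(default_factory=list)
    score: float = 0.0


def amount_at_risk(cluster: Cluster) -> float:
    """Hypothetical scoring strategy: total balance across the cluster's records."""
    return sum(r.get("balance", 0.0) for r in cluster.records)


def rank_clusters(
    clusters: list[Cluster], strategy: Callable[[Cluster], float]
) -> list[Cluster]:
    """Assign each cluster a score under `strategy`, then order highest first."""
    for c in clusters:
        c.score = strategy(c)
    return sorted(clusters, key=lambda c: c.score, reverse=True)


clusters = [
    Cluster("acct-1", [{"balance": 100.0}, {"balance": 250.0}]),
    Cluster("acct-2", [{"balance": 900.0}]),
    Cluster("acct-3", []),
]
ranked = rank_clusters(clusters, amount_at_risk)
```

Passing the strategy as a function keeps the ranking step independent of how any particular score is computed.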

20 Claims

1. A computer-implemented method comprising:
- by one or more hardware computer processors configured with specific computer executable instructions:
  
  accessing one or more electronic data stores, the one or more electronic data stores storing a plurality of data entities and respective data entity attributes;
  
  applying a clustering strategy to generate a data entity cluster by at least:
  
  designating a seed data entity, from the plurality of data entities, as the data entity cluster;
  
  accessing, based on the clustering strategy, one or more search protocols;
  
  performing first growth of the data entity cluster by executing at least a first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity;
  
  adding the one or more data entities to the data entity cluster;
  
  performing second growth of the data entity cluster by executing at least a second of the one or more search protocols on the one or more electronic data stores to identify one or more additional data entities related to the one or more added data entities, the second search protocol different than the first search protocol; and
  
  adding the one or more additional data entities to the data entity cluster; and
  
  storing the data entity cluster in at least one of the one or more electronic data stores.
- - 2. The computer-implemented method of claim 1, wherein executing at least the first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity further comprises:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      identifying at least one data entity attribute associated with the seed data entity; and
      
      evaluating the plurality of data entities to determine the one or more data entities sharing the at least one data entity attribute with the seed data entity.
  - 3. The computer-implemented method of claim 2, wherein executing at least the first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity further comprises:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      applying a filter to the at least one data entity attribute associated with the seed data entity, the filter selected based on the clustering strategy.
  - 4. The computer-implemented method of claim 1 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      comparing data entities associated with the data entity cluster to data entities associated with a second data entity cluster; and
      
      in response to determining that at least one data entity associated with the data entity cluster shares an attribute with and/or is related to at least one data entity associated with the second data entity cluster, merging the data entity cluster and the second data entity cluster.
  - 5. The computer-implemented method of claim 1, wherein the first search protocol searches for data entities in a first electronic data store and the second search protocol searches for data entities in a second electronic data store.
  - 6. The computer-implemented method of claim 1, wherein the data entity cluster is iteratively generated by further:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      executing at least a third of the one or more search protocols on the one or more electronic data stores to identify yet one or more additional data entities related to the one or more additional data entities; and
      
      adding the yet one or more additional data entities to the data entity cluster.
  - 7. The computer-implemented method of claim 1 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      causing a ranking score to be assigned to the data entity cluster; and
      
      ordering a listing of the data entity cluster and other data entity clusters relative to a one another.
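The growth steps recited in claims 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the in-memory `store` dictionary, the `shared_attribute_protocol` factory, and the `grow_cluster` and `should_merge` functions are hypothetical names. Each search protocol here matches entities that share one attribute value with the most recently added entities, which is only one possible reading of "related". Storing the finished cluster back to a data store is omitted.

```python
from typing import Callable

# Assumed shape: entity id mapped to that entity's attributes.
Store = dict[str, dict[str, str]]
# A search protocol takes the store and a set of entity ids, returns related ids.
SearchProtocol = Callable[[Store, set[str]], set[str]]


def shared_attribute_protocol(attribute: str) -> SearchProtocol:
    """Build a protocol that finds entities sharing `attribute` with the given ones."""
    def protocol(store: Store, frontier: set[str]) -> set[str]:
        values = {store[e][attribute] for e in frontier if attribute in store[e]}
        return {
            eid for eid, attrs in store.items()
            if attrs.get(attribute) in values and eid not in frontier
        }
    return protocol


def grow_cluster(store: Store, seed: str, protocols: list[SearchProtocol]) -> set[str]:
    """Designate the seed as the cluster, then grow it once per search protocol."""
    cluster = {seed}
    frontier = {seed}  # entities added by the previous growth step
    for protocol in protocols:
        found = protocol(store, frontier) - cluster
        cluster |= found
        frontier = found
    return cluster


def should_merge(store: Store, a: set[str], b: set[str]) -> bool:
    """True if any entity in `a` shares an attribute value with any entity in `b`."""
    pairs_a = {(k, v) for e in a for k, v in store[e].items()}
    pairs_b = {(k, v) for e in b for k, v in store[e].items()}
    return bool(pairs_a & pairs_b)


store = {
    "cust-1": {"phone": "555-0100", "address": "1 Main St"},
    "cust-2": {"phone": "555-0100", "address": "9 Oak Ave"},
    "cust-3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "cust-4": {"phone": "555-0142", "address": "7 Elm Rd"},
}
protocols = [shared_attribute_protocol("phone"), shared_attribute_protocol("address")]
cluster = grow_cluster(store, "cust-1", protocols)
```

In this example the first growth step reaches `cust-2` through a shared phone number, and the second, different protocol reaches `cust-3` through the address it shares with `cust-2`. A third protocol appended to the list would correspond to the further iteration in claim 6.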

8. A computer-implemented method of accessing one or more electronic data sources, the method comprising:
- by one or more hardware computer processors configured with specific computer executable instructions:
  
  accessing one or more electronic data stores, the one or more electronic data stores storing:
  
  a plurality of data entities and respective data entity attributes, and a plurality of data entity clusters; and
  
  causing access of a data entity cluster of the plurality of data entity clusters, wherein the data entity cluster is related to a clustering strategy, and wherein the data entity cluster has been iteratively generated by:
  
  designating a seed data entity, from the plurality of data entities, as the data entity cluster;
  
  accessing, based on the clustering strategy, one or more search protocols;
  
  performing first growth of the data entity cluster by executing at least a first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity;
  
  adding the one or more data entities to the data entity cluster;
  
  performing second growth of the data entity cluster by executing at least a second of the one or more search protocols on the one or more electronic data stores to identify one or more additional data entities related to the one or more added data entities, the second search protocol different than the first search protocol; and
  
  adding the one or more additional data entities to the data entity cluster.
- - 9. The computer-implemented method of claim 8, wherein executing at least the first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity further comprises:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      identifying at least one data entity attribute associated with the seed data entity; and
      
      evaluating the plurality of data entities to determine the one or more data entities sharing the at least one data entity attribute with the seed data entity.
  - 10. The computer-implemented method of claim 9, wherein executing at least the first of the one or more search protocols on the one or more electronic data stores to identify one or more data entities related to the seed data entity further comprises:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      applying a filter to the at least one data entity attribute associated with the seed data entity, the filter selected based on the clustering strategy.
  - 11. The computer-implemented method of claim 8 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      accessing, from the one or more electronic data stores, a scoring strategy for prioritizing the plurality of data entity clusters relative to one another;
      
      for each particular data entity cluster of the plurality of data entity clusters:
      
      evaluating, based on the scoring strategy, the particular data entity cluster; and
      
      assigning, based on the evaluation, a score to the particular data entity cluster; and
      
      ranking the plurality of data entity clusters according to the respective assigned scores.
  - 12. The computer-implemented method of claim 11, wherein the score assigned to each data entity cluster corresponds to an amount at risk.
  - 13. The computer-implemented method of claim 11, wherein assigning a score to the particular data entity cluster comprises:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      determining a plurality of base scores for the particular data entity cluster;
      
      determining, based on the plurality of base scores, an overall score for the particular data entity cluster; and
      
      assigning the overall score to the particular data entity cluster.
  - 14. The computer-implemented method of claim 11 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      generating a user interface configured to be rendered on a computing device; and
      
      updating the user interface to include the listing of two or more of the plurality of data entity clusters according to the ranking.
  - 15. The computer-implemented method of claim 8, wherein the clustering strategy is associated with an investigation process.
  - 16. The computer-implemented method of claim 8 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      generating a user interface configured to be rendered on a computing device.
  - 17. The computer-implemented method of claim 16 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      receiving, via the user interface, a selection of at least one of:
      
      the seed data entity selected from the plurality of data entities, or a seed generation strategy by which the seed data entity is selected from the plurality of data entities.
  - 18. The computer-implemented method of claim 16 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      receiving, via the user interface, a selection of the clustering strategy.
  - 19. The computer-implemented method of claim 16 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      updating the user interface to include an indication of the data entity cluster; and
      
      receiving, via the user interface, a selection of the data entity cluster.
  - 20. The computer-implemented method of claim 8 further comprising:
    - by the one or more hardware computer processors configured with specific computer executable instructions:
      
      applying the clustering strategy to iteratively generate the data entity cluster.
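Claim 13 recites determining a plurality of base scores for a cluster and deriving an overall score from them. The Python sketch below shows one way that could look. The claims do not say how base scores are combined, so the weighted sum is an assumption, as are the `entity_count` and `total_balance` scorers, the weights, and the record shape.

```python
from typing import Callable

# Hypothetical shape: a cluster as a list of entity records.
Cluster = list[dict]
BaseScorer = Callable[[Cluster], float]


def entity_count(cluster: Cluster) -> float:
    """Base score: number of entities in the cluster."""
    return float(len(cluster))


def total_balance(cluster: Cluster) -> float:
    """Base score: sum of the (assumed) `balance` field across entities."""
    return sum(e.get("balance", 0.0) for e in cluster)


def overall_score(cluster: Cluster, scorers: dict[str, tuple[BaseScorer, float]]) -> float:
    """Combine base scores into one overall score (weighted sum, an assumption)."""
    return sum(weight * scorer(cluster) for scorer, weight in scorers.values())


scorers = {
    "size": (entity_count, 10.0),
    "balance": (total_balance, 1.0),
}
clusters = {
    "A": [{"balance": 50.0}, {"balance": 20.0}],
    "B": [{"balance": 200.0}],
    "C": [{}],
}
scores = {name: overall_score(c, scorers) for name, c in clusters.items()}
# Order cluster names by overall score, highest first, as in claim 11.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Keeping each base scorer as a separate function means a scoring strategy can be changed by editing the `scorers` table rather than the combining logic.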

Current Assignee: Palantir Technologies Incorporated
Original Assignee: Palantir Technologies Incorporated
Inventors: Sprague, Matthew; Kross, Michael; Borochoff, Adam; Menon, Parvathy; Harris, Michael
Primary Examiner(s): Hamilton, Lalita M
Application Number: US14/819,272
Publication Number: US 20160034470A1
Time in Patent Office: 1,301 Days
Field of Search: 705 35, 705 38

CPC Class Codes

G06F 16/23   Updating

G06F 16/244   Grouping and aggregation

G06F 16/24578   using ranking

G06F 16/2465   Query processing support fo...

G06F 16/26   Visual data mining; Browsin...

G06F 16/283   Multi-dimensional databases...

G06F 16/285   Clustering or classification

G06F 16/287   Visualization; Browsing

G06F 16/288   Entity relationship models

G06F 16/335   Filtering based on addition...

G06F 16/35   Clustering; Classification

G06F 16/355   Class or cluster creation o...

G06F 16/9535   Search customisation based ...

G06Q 10/10   Office automation; Time man...

G06Q 20/382   insuring higher security of...

G06Q 20/4016   involving fraud or risk lev...

G06Q 30/0185   Product, service or busines...

G06Q 40/00   Finance; Insurance; Tax str...

G06Q 40/02   Banking, e.g. interest calc...

G06Q 40/03   Credit; Loans; Processing t...

G06Q 40/10   Tax strategies

G06Q 40/123   Tax preparation or submission

H04L 63/145   the attack involving the pr...
