System, method and computer program product for collusion detection

US 8,533,825 B1
Filed: 02/04/2010
Issued: 09/10/2013
Est. Priority Date: 02/04/2010
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Embodiments disclosed herein provide a practical solution for click fraud detection. One embodiment of a method may comprise constructing representations of entities via a graph network framework. The representations, graphs or vector spaces, may capture information pertaining to clicks by botnets/click farms. To detect click fraud, each representation may be analyzed in the context of clustering, resulting in large data sets with respect to time, frequency, or gap between clicks. Highly accurate and highly scalable heuristics may be developed/applied to identify IP addresses that indicate potential collusion. One embodiment of a system having a computer program product implementing such a click fraud detection method may operate to receive a client file containing clicks gathered at the client side, construct representations of entities utilizing the graph framework described herein, perform clustering on the representations thus constructed, identify IP addresses of interest, and return a list containing same to the client.
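
As an illustration of the graph construction the abstract mentions, the following is a minimal sketch, not the patented implementation. It assumes each click record is a dict with hypothetical `ip` and `publisher` keys, and it assumes, purely for illustration, that two IP addresses are connected when they clicked on the same publisher. The text here does not specify the actual edge rule.

```python
from collections import defaultdict
from itertools import combinations


def build_ip_graph(clicks):
    """Return {(ip_a, ip_b): number of publishers both IPs clicked on}.

    Assumption: sharing a publisher is what links two IPs. This rule is
    hypothetical and chosen only to make the example concrete.
    """
    ips_by_publisher = defaultdict(set)
    for click in clicks:
        ips_by_publisher[click["publisher"]].add(click["ip"])

    edges = defaultdict(int)
    for ips in ips_by_publisher.values():
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(ips), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical click records; real data would carry more fields.
clicks = [
    {"ip": "10.0.0.1", "publisher": "pub-a"},
    {"ip": "10.0.0.2", "publisher": "pub-a"},
    {"ip": "10.0.0.1", "publisher": "pub-b"},
    {"ip": "10.0.0.2", "publisher": "pub-b"},
    {"ip": "10.0.0.3", "publisher": "pub-b"},
]
graph = build_ip_graph(clicks)
```

In this toy data the pair `10.0.0.1` and `10.0.0.2` shares two publishers, so its edge weight is higher than that of the other pairs.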

33 Citations

21 Claims

1. A computer-implemented method for modeling collusion detection, comprising:
- at a server computer in an enterprise computing environment:
  
  receiving historical click data from a client computer connected to the enterprise computing environment over a network connection, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
  
  extracting entities of interest of one or more types from the historical click data;
  
  formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
  
  wherein formulating potential collusion among the entities as a network problem comprises:
  
  constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
  
  partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
  
  forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
  
  wherein formulating potential collusion among the entities as a vector space problem comprises:
  
  constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
  
  grouping the vector space representations with similar anomalous patterns into clusters; and
  
  forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
  
  wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
  
  transforming the subgroups of nodes from the network problem into vector spaces; and
  
  performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
  
  identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method according to claim 1, further comprising solving the network problem utilizing a metric.
  - 3. The method according to claim 2, wherein the metric is utilized to determine a density of each subgroup, wherein the density of the subgroup is defined by the number of connections in the subgroup divided by the number of nodes in the subgroup, and wherein the density of the subgroup corresponds to a degree of collusion of the entities represented by the nodes in the subgroup.
  - 4. The method according to claim 2, wherein the metric is utilized to determine a total weight of the connections in each subgroup.
  - 5. The method according to claim 2, wherein the metric is utilized to determine a minimum cost associated with producing the subgroup.
  - 6. The method according to claim 2, wherein the metric is utilized to perform sparse cuts or minimum cuts on the network representations of the entities and their relationships.
  - 7. The method according to claim 1, further comprising solving the vector space problem utilizing a metric.
  - 8. The method according to claim 7, wherein the metric is utilized to minimize a maximum standard deviation, a variance, a radius, or a median of each of the clusters.
  - 9. The method according to claim 1, wherein the vectors represent the click patterns of the entities with respect to time, frequency, gaps between clicks, keywords, or a combination thereof.
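
The density metric of claim 3 (connections in the subgroup divided by nodes in the subgroup) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `subgroup_density`, the edge-list input format, and the toy graph are hypothetical and are not taken from the patent.

```python
def subgroup_density(nodes, edges):
    """Connections inside the subgroup divided by the number of its nodes."""
    members = set(nodes)
    if not members:
        return 0.0
    # Count only connections whose two endpoints both lie in the subgroup.
    internal = sum(1 for a, b in edges if a in members and b in members)
    return internal / len(members)


# Hypothetical undirected graph given as a list of node pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]

dense = subgroup_density({"a", "b", "c"}, edges)   # 3 connections, 3 nodes
sparse = subgroup_density({"c", "d", "e"}, edges)  # 2 connections, 3 nodes
```

Under claim 3, the denser subgroup `{a, b, c}` would correspond to a higher degree of collusion than `{c, d, e}`.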

10. A computer program product comprising one or more non-transitory computer-readable storage media storing computer instructions translatable by a processor in an enterprise computing environment to perform:
- receiving historical click data from a client computer connected to the enterprise computing environment over a network connection, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
  
  extracting entities of interest of one or more types from the historical click data;
  
  formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
  
  wherein formulating potential collusion among the entities as a network problem comprises:
  
  constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
  
  partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
  
  forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
  
  wherein formulating potential collusion among the entities as a vector space problem comprises:
  
  constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
  
  grouping the vector space representations with similar anomalous patterns into clusters; and
  
  forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
  
  wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
  
  transforming the subgroups of nodes from the network problem into vector spaces; and
  
  performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
  
  identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
- Dependent claims (11, 12, 13, 14, 15)
  - 11. The computer program product of claim 10, further comprising computer instructions translatable by the processor to solve the network problem utilizing a metric.
  - 12. The computer program product of claim 11, wherein the metric is utilized to:
    - determine a density of each subgroup, wherein the density of the subgroup is defined by the number of connections in the subgroup divided by the number of nodes in the subgroup, and wherein the density of the subgroup corresponds to a degree of collusion of the entities represented by the nodes in the subgroup;
      
      determine a total weight of the connections in each subgroup;
      
      determine a minimum cost associated with producing the subgroup;
      
      perform minimum cuts on the network representations of the entities and their relationship; or
      
      perform sparse cuts on the network representations of the entities and their relationships.
  - 13. The computer program product of claim 10, further comprising computer instructions translatable by the processor to solve the vector space problem utilizing a metric.
  - 14. The computer program product of claim 13, wherein the metric is utilized to minimize a maximum standard deviation, a variance, a radius, or a median of each of the clusters.
  - 15. The computer program product of claim 10, wherein the vectors represent the click patterns of the entities with respect to time, frequency, gaps between clicks, keywords, or a combination thereof.
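
Claims 14 and 15 refer to vectors of click patterns over time and to cluster metrics such as a radius. The sketch below shows one possible reading, under stated assumptions: a 24-bin hour-of-day histogram stands in for the "click pattern" vector, and "radius" is taken to mean the largest Euclidean distance from a member to the cluster centroid. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def hourly_vectors(clicks):
    """Map each entity to a 24-bin vector of click counts per hour of day."""
    vectors = defaultdict(lambda: [0] * 24)
    for entity, timestamp in clicks:
        vectors[entity][timestamp.hour] += 1
    return dict(vectors)


def cluster_radius(vectors):
    """Largest Euclidean distance from any member vector to the centroid."""
    dims = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return max(math.dist(v, centroid) for v in vectors)


# Hypothetical (entity, timestamp) pairs.
clicks = [
    ("ip-1", datetime(2010, 2, 4, 3, 0)),
    ("ip-1", datetime(2010, 2, 4, 3, 30)),
    ("ip-2", datetime(2010, 2, 4, 3, 15)),
    ("ip-2", datetime(2010, 2, 4, 3, 45)),
    ("ip-3", datetime(2010, 2, 4, 14, 0)),
]
vecs = hourly_vectors(clicks)

tight = cluster_radius([vecs["ip-1"], vecs["ip-2"]])  # identical patterns
loose = cluster_radius([vecs["ip-1"], vecs["ip-3"]])  # different patterns
```

A clustering step that minimizes the radius would prefer grouping `ip-1` with `ip-2`, whose hourly patterns coincide, over grouping `ip-1` with `ip-3`.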

16. A system for modeling collusion detection, comprising:
- a server computer; and
  
  one or more non-transitory computer-readable storage media accessible by the server computer and storing computer instructions translatable by a processor of the server computer to perform:
  
  receiving historical click data from a client computer communicatively connected to the server computer, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
  
  extracting entities of interest of one or more types from the historical click data;
  
  formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
  
  wherein formulating potential collusion among the entities as a network problem comprises:
  
  constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
  
  partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
  
  forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
  
  wherein formulating potential collusion among the entities as a vector space problem comprises:
  
  constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
  
  grouping the vector space representations with similar anomalous patterns into clusters; and
  
  forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
  
  wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
  
  transforming the subgroups of nodes from the network problem into vector spaces; and
  
  performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
  
  identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
- Dependent claims (17, 18, 19, 20, 21)
  - 17. The system of claim 16, wherein the network problem is solved utilizing a metric.
  - 18. The system of claim 17, wherein the metric is utilized to:
    - determine a density of each subgroup, wherein the density of the subgroup is defined by the number of connections in the subgroup divided by the number of nodes in the subgroup, and wherein the density of the subgroup corresponds to a degree of collusion of the entities represented by the nodes in the subgroup;
      
      determine a total weight of the connections in each subgroup;
      
      determine a minimum cost associated with producing the subgroup;
      
      perform minimum cuts on the network representations of the entities and their relationship; or
      
      perform sparse cuts on the network representations of the entities and their relationships.
  - 19. The system of claim 16, wherein the vector space problem is solved utilizing a metric.
  - 20. The system of claim 19, wherein the metric is utilized to minimize a maximum standard deviation, a variance, a radius, or a median of each of the clusters.
  - 21. The system of claim 16, wherein the vectors represent the click patterns of the entities with respect to time, frequency, gaps between clicks, keywords, or a combination thereof.
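
The combined formulation in the independent claims (transforming node subgroups into vector spaces, then clustering on eigenvectors) resembles spectral clustering. The sketch below is one possible illustration, not the claimed method: it assumes the graph Laplacian is the matrix whose eigenvector is used, approximates that eigenvector by power iteration, and uses a simple sign split as the clustering step. The function names and the toy graph are hypothetical.

```python
import math


def fiedler_vector(nodes, edges, iterations=200):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    neighbors = [[] for _ in range(n)]
    for a, b in edges:
        neighbors[index[a]].append(index[b])
        neighbors[index[b]].append(index[a])

    # Twice the maximum degree bounds every Laplacian eigenvalue from above.
    shift = 2 * max(len(adj) for adj in neighbors)
    vec = [float(i) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        mean = sum(vec) / n
        vec = [x - mean for x in vec]  # project out the constant eigenvector
        # Multiply by (shift * I - L), where L = D - A is the graph Laplacian.
        vec = [
            (shift - len(neighbors[i])) * vec[i] + sum(vec[j] for j in neighbors[i])
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return dict(zip(nodes, vec))


def split_by_sign(embedding):
    """Crude one-dimensional clustering: negative entries versus the rest."""
    negative = {node for node, value in embedding.items() if value < 0}
    return negative, set(embedding) - negative


# Hypothetical graph: two triangles joined by a single bridge edge.
nodes = ["a", "b", "c", "x", "y", "z"]
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("x", "y"), ("x", "z"), ("y", "z"), ("c", "x")]
groups = split_by_sign(fiedler_vector(nodes, edges))
```

On this toy graph the split separates the two triangles, which is the kind of tightly connected subgroup the claims describe as potentially colluding entities. A production system would more likely use a linear algebra library and a full clustering algorithm over several eigenvectors.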

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Adometry, Inc. (Alphabet Inc.)
Inventors: Marsa, Robert Lee; Doddi, Srinivas Rao
Primary Examiner(s): Kim, Jung
Assistant Examiner(s): Tran, Tri Minh
Application Number: US12/700,053
Time in Patent Office: 1,314 Days
Field of Search: 726/22
US Class Current: 726/22
CPC Class Codes:
H04L 2463/144 Detection or countermeasure...
H04L 63/1425 Traffic logging, e.g. anoma...
