System, method and computer program product for collusion detection
First Claim
1. A computer-implemented method for modeling collusion detection, comprising:
at a server computer in an enterprise computing environment:
receiving historical click data from a client computer connected to the enterprise computing environment over a network connection, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
extracting entities of interest of one or more types from the historical click data;
formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
wherein formulating potential collusion among the entities as a network problem comprises:
constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
wherein formulating potential collusion among the entities as a vector space problem comprises:
constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
grouping the vector space representations with similar anomalous patterns into clusters; and
forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
transforming the subgroups of nodes from the network problem into vector spaces; and
performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
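The network formulation in the claim above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the entities are visitor IP addresses, two IPs are connected when they clicked on at least `min_shared` common publishers, and connected components stand in for the claimed partitioning step, which would normally use a proper community-detection method. The function names and sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_ip_graph(clicks, min_shared=2):
    """Connect two IPs when they share at least `min_shared` publishers.

    `clicks` is an iterable of (ip, publisher) pairs (hypothetical format).
    """
    publishers_by_ip = defaultdict(set)
    for ip, publisher in clicks:
        publishers_by_ip[ip].add(publisher)

    graph = {ip: set() for ip in publishers_by_ip}
    for a, b in combinations(sorted(publishers_by_ip), 2):
        if len(publishers_by_ip[a] & publishers_by_ip[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def subgroups(graph):
    """Return connected components with more than one node.

    This is a simple stand-in for a partitioning step; it does not
    optimize the number of connections inside each subgroup.
    """
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups


# Hypothetical sample data: three IPs hit the same two publishers.
clicks = [
    ("10.0.0.1", "pubA"), ("10.0.0.1", "pubB"),
    ("10.0.0.2", "pubA"), ("10.0.0.2", "pubB"),
    ("10.0.0.3", "pubA"), ("10.0.0.3", "pubB"),
    ("10.0.0.9", "pubC"),
]
candidate_groups = subgroups(build_ip_graph(clicks))
print(candidate_groups)
```

In this toy data the three IPs that share both publishers form one subgroup, while the IP that only visited `pubC` is left out. A subgroup produced this way is only a candidate; the claim forwards subgroups to a network analyzer for further assessment.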
Abstract
Embodiments disclosed herein provide a practical solution for click fraud detection. One embodiment of a method may comprise constructing representations of entities via a graph network framework. The representations, graphs or vector spaces, may capture information pertaining to clicks by botnets/click farms. To detect click fraud, each representation may be analyzed in the context of clustering, resulting in large data sets with respect to time, frequency, or gap between clicks. Highly accurate and highly scalable heuristics may be developed/applied to identify IP addresses that indicate potential collusion. One embodiment of a system having a computer program product implementing such a click fraud detection method may operate to receive a client file containing clicks gathered at the client side, construct representations of entities utilizing the graph framework described herein, perform clustering on the representations thus constructed, identify IP addresses of interest, and return a list containing same to the client.
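As a rough illustration of the vector space representation mentioned in the abstract, the following Python sketch turns each IP address's click timestamps into a 24-bin hour-of-day count vector and groups IPs whose vectors have a high cosine similarity. The choice of hourly bins, the 0.9 threshold, the greedy grouping rule, and all names and sample timestamps are assumptions for illustration only; the abstract does not specify them, and the sketch does not attempt to score how anomalous a pattern is.

```python
import math
from collections import defaultdict
from datetime import datetime


def hourly_vectors(clicks):
    """Map each IP to a 24-element vector of click counts per hour of day."""
    vectors = defaultdict(lambda: [0] * 24)
    for ip, timestamp in clicks:
        vectors[ip][datetime.fromisoformat(timestamp).hour] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_similar(vectors, threshold=0.9):
    """Greedy grouping: join the first cluster whose first member is similar."""
    clusters = []
    for ip in sorted(vectors):
        for cluster in clusters:
            if cosine(vectors[ip], vectors[cluster[0]]) >= threshold:
                cluster.append(ip)
                break
        else:
            clusters.append([ip])
    return clusters


# Hypothetical sample data: two IPs click in the same early-morning hours.
clicks = [
    ("10.0.0.1", "2024-01-01T03:00:05"),
    ("10.0.0.1", "2024-01-01T03:10:05"),
    ("10.0.0.1", "2024-01-01T04:00:00"),
    ("10.0.0.2", "2024-01-01T03:01:00"),
    ("10.0.0.2", "2024-01-01T03:20:00"),
    ("10.0.0.2", "2024-01-01T04:05:00"),
    ("10.0.0.7", "2024-01-01T14:00:00"),
    ("10.0.0.7", "2024-01-01T20:30:00"),
]
clusters = group_similar(hourly_vectors(clicks))
print(clusters)
```

Other features named in the abstract, such as click frequency or the gap between consecutive clicks, could be appended to the same vectors; the grouping step would not change.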
33 Citations
21 Claims
1. A computer-implemented method for modeling collusion detection, comprising:
at a server computer in an enterprise computing environment:
receiving historical click data from a client computer connected to the enterprise computing environment over a network connection, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
extracting entities of interest of one or more types from the historical click data;
formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
wherein formulating potential collusion among the entities as a network problem comprises:
constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
wherein formulating potential collusion among the entities as a vector space problem comprises:
constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
grouping the vector space representations with similar anomalous patterns into clusters; and
forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
transforming the subgroups of nodes from the network problem into vector spaces; and
performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A computer program product comprising one or more non-transitory computer-readable storage media storing computer instructions translatable by a processor in an enterprise computing environment to perform:
receiving historical click data from a client computer connected to the enterprise computing environment over a network connection, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
extracting entities of interest of one or more types from the historical click data;
formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
wherein formulating potential collusion among the entities as a network problem comprises:
constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
wherein formulating potential collusion among the entities as a vector space problem comprises:
constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
grouping the vector space representations with similar anomalous patterns into clusters; and
forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
transforming the subgroups of nodes from the network problem into vector spaces; and
performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
Dependent claims: 11, 12, 13, 14, 15

16. A system for modeling collusion detection, comprising:
a server computer; and
one or more non-transitory computer-readable storage media accessible by the server computer and storing computer instructions translatable by a processor of the server computer to perform:
receiving historical click data from a client computer communicatively connected to the server computer, wherein the historical click data comprises a plurality of clicks generated over a period of time and information associated with the plurality of clicks, and wherein the information comprises visitor Internet Protocol (IP) address information, date and time information, publisher information, referrer information, user-agent information, searched keywords, and cookie;
extracting entities of interest of one or more types from the historical click data;
formulating potential collusion among the entities as a network problem, a vector space problem, or a combination thereof;
wherein formulating potential collusion among the entities as a network problem comprises:
constructing network representations of the entities and their relationships, wherein the entities are represented by nodes and wherein their relationships are represented by connections between the nodes;
partitioning the network representations into subgroups of nodes to maximize a number of connections between the nodes in each subgroup;
forwarding the subgroups to a network analyzer for producing a first set of potentially colluding entities;
wherein formulating potential collusion among the entities as a vector space problem comprises:
constructing vector space representations of the entities, wherein the vector space representations comprise vectors representing click patterns of the entities;
grouping the vector space representations with similar anomalous patterns into clusters; and
forwarding the clusters to a pattern analyzer for producing a second set of potentially colluding entities; and
wherein formulating potential collusion among the entities as a combination of the network problem and the vector space problem comprises:
transforming the subgroups of nodes from the network problem into vector spaces; and
performing clustering on eigen vectors of the vector spaces to produce a third set of potentially colluding entities; and
identifying, from the first set of potentially colluding entities, the second set of potentially colluding entities, or the third set of potentially colluding entities, one or more groups of entities having a degree of collusion corresponding to an organized activity on the Internet.
Dependent claims: 17, 18, 19, 20, 21
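
The combined formulation recited in the independent claims (transforming a subgroup of nodes into a vector space and clustering on eigenvectors) resembles spectral clustering. The Python sketch below is one simple way this could look, under stated assumptions: the subgroup is given as a symmetric 0/1 adjacency matrix of a connected graph, the vector space is the eigenvector of the second-smallest eigenvalue of the graph Laplacian (the Fiedler vector), that eigenvector is approximated by power iteration, and "clustering" is reduced to splitting nodes by sign. None of these choices, nor the function names and sample graph, come from the patent text.

```python
def fiedler_vector(adjacency, iterations=200):
    """Approximate the Laplacian eigenvector for the second-smallest eigenvalue.

    Uses power iteration on (shift * I - L), where L = D - A, while
    projecting out the constant eigenvector. Assumes a connected graph
    with at least one edge.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2 * max(degrees)  # upper bound on the largest eigenvalue of L
    vector = [float(i) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(vector) / n
        vector = [x - mean for x in vector]  # remove the constant component
        vector = [
            (shift - degrees[i]) * vector[i]
            + sum(adjacency[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in vector) ** 0.5
        vector = [x / norm for x in vector]
    return vector


def spectral_bisect(nodes, adjacency):
    """Split nodes into two groups by the sign of the Fiedler vector."""
    vector = fiedler_vector(adjacency)
    negative = sorted(n for n, x in zip(nodes, vector) if x < 0)
    positive = sorted(n for n, x in zip(nodes, vector) if x >= 0)
    return sorted([negative, positive])


# Hypothetical subgroup: two tightly linked triples joined by one edge.
nodes = ["ip1", "ip2", "ip3", "ip4", "ip5", "ip6"]
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
groups = spectral_bisect(nodes, adjacency)
print(groups)
```

A fuller treatment would typically keep several eigenvectors and run a clustering algorithm such as k-means on them; the sign split is the smallest case of that idea.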
Specification