Fast component enumeration in graphs with implicit edges

Abstract
A method and system for graphical enumeration. The method includes creating an ordered set of vertices for a graph such that each vertex is associated with a corresponding index, and wherein each vertex in the ordered set of vertices includes information. A plurality of keys is created for defining the information. A plurality of lists of vertices is created, each of which is associated with a corresponding key such that vertices in a corresponding list include information associated with the corresponding key. For a first list of vertices, a least valued index is determined from a group of associated vertices based on vertices in the first list and vertices pointed to by the vertices in the first list. Also, all associated vertices are pointed to a root vertex associated with the least valued index.
20 Claims
1. A method for enumerating a graph, the method comprising:
creating a graph for a set of data records, each data record represented as a vertex in the graph, each data record comprising one or more data elements; and
pointing each vertex in the graph in a database to a corresponding root vertex based on the one or more data elements of the data record represented by the vertex, wherein pointing each vertex in the graph to a corresponding root vertex comprises creating a key for each unique value of each data element represented by the vertices in the graph, each key representing an implicit edge in the graph, wherein two vertices sharing a key implicitly share an edge.
11. A non-transitory computer-readable storage medium storing executable computer program instructions, the computer program instructions comprising instructions for:
creating a graph for a set of data records, each data record represented as a vertex in the graph, each data record comprising one or more data elements; and
pointing each vertex in the graph in a database to a corresponding root vertex based on the one or more data elements of the data record represented by the vertex, wherein pointing each vertex in the graph to a corresponding root vertex comprises creating a key for each unique value of each data element represented by the vertices in the graph, each key representing an implicit edge in the graph, wherein two vertices sharing a key implicitly share an edge.
1 Specification
This application is a continuation of U.S. application Ser. No. 14/728,499 filed on Jun. 2, 2015, which is a continuation of U.S. application Ser. No. 13/905,952, filed on May 30, 2013, which is a continuation of U.S. application Ser. No. 12/367,180, filed on Feb. 6, 2009, which claims the benefit of U.S. Provisional Application No. 61/145,921, filed on Jan. 20, 2009, each of which is incorporated by reference in its entirety.
The present invention pertains to the field of data storage. Specifically, the present invention provides for the enumeration of components in a graph without explicitly defining the edges in the graph.
A graph is a collection of “vertices” (points or nodes) and “edges” (lines connecting points). The graph can be representative of any set of data, such as those related to travel, biological samples, and chip design, to name a few. Points in the graph represent an individual collection of data, and edges between two points can represent data that is shared between the two points. For instance, in the travel industry a graph may represent a grid of airline flights between numerous cities regardless of which airline is used. Each node in the graph can represent a city to which a flight is possibly directed. In one case, an edge connects two points that share the same flight. In another case, an edge may represent a flight between two cities for a particular airline.
In conventional techniques, a graph is typically represented in memory as a list of all pairs of vertices that share an edge. In addition, a “connected component” of a graph is any subset of vertices all connected by some sequence of edges. Enumerating the connected components of a graph is a problem in classical computer science. Traditional methods include Kosaraju's algorithm, Tarjan's algorithm, and Gabow's algorithm.
However, for each of these techniques for enumerating connected components, execution time and space in memory are proportional to the total number of vertices and edges, or O(V+E). In simpler terms, the entire graph and all the edges in the graph need to be evaluated in order to enumerate the connected components. While this may seem like a straightforward technique, as the number of points in the graph increases, the time to enumerate the graph also increases. For graphs that include points that are heavily connected, the execution time may grow with the square of the number of points in the graph. As such, for large amounts of data, traditional techniques for component enumeration fall short of providing real-time analysis of the graphical data.
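As a hedged worked example of that quadratic growth (the value of k is an illustrative assumption, not a figure from the specification): if k vertices all share a single data value, an explicit edge list needs one edge for every pair of them.

```latex
E = \binom{k}{2} = \frac{k(k-1)}{2},
\qquad k = 1000 \;\Rightarrow\; E = \frac{1000 \cdot 999}{2} = 499{,}500
```

With V = k = 1000, the O(V+E) cost is therefore dominated by roughly V^2/2 edges rather than by the 1000 vertices themselves.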
What is needed is an invention that provides a faster and more efficient way to enumerate graphs. What is described in the present invention is a method and system for enumerating graphs, and in particular for enumerating components of a graph for purposes of associating vertices in the graph to provide data analysis.
A method for graphical enumeration is described, in accordance with one embodiment of the present invention. The method is used to process information related to any type of data, such as customer transactions. The information can be represented as a graph. The method includes creating an ordered set of vertices for a graph such that each vertex is associated with a corresponding index. Each vertex in the ordered set of vertices includes a subset of the information, such as one customer transaction. A plurality of keys is created that define the information. Each key is associated with a unique piece of information. A plurality of lists of vertices is created, each of which is associated with a corresponding key, such that vertices in a corresponding list include information associated with the corresponding key. For a first list of vertices, a least valued index is determined from a group of associated vertices based on vertices in the first list and vertices pointed to by the vertices in the first list. Also, all associated vertices are pointed to a root vertex associated with the least valued index.
In another embodiment, a system for performing graphical enumeration is described. The system can be implemented in conjunction with a communication network that is coupled to a plurality of information sources. For instance, the system is used to perform graphical enumeration on customer transactions that are associated with the plurality of information sources. The system includes a receiver for receiving information related to at least one consumer transaction from at least one computing resource at a corresponding source. For instance, the corresponding source may be a merchant participating in the transaction, a credit card processing company, a consumer initiating the transaction, or the like. A graph definer is included in the system for creating an ordered set of vertices for the graph, such that each vertex is associated with a corresponding index. The ordered set of vertices includes the information that is received. Storage is included in the system for storing the ordered set of vertices. In addition, the system includes a key creator for creating a plurality of keys defining the information. A list creator creates a plurality of lists of vertices by accessing the ordered set of vertices that is stored. Each of the lists of vertices is associated with a corresponding key, such that a vertex in a corresponding list includes information associated with the corresponding key. The system also includes a component generator for enumerating the graph. The component generator determines a least valued index from a group of associated vertices based on vertices in the first list and vertices pointed to by the vertices in the first list. Also, the component generator points all associated vertices to a root vertex associated with the least valued index.
Exemplary embodiments are illustrated in referenced figures of the drawings which illustrate what is regarded as the preferred embodiments presently contemplated. It is intended that the embodiments and figures disclosed herein are to be considered illustrative rather than limiting.
Reference will now be made in detail to the preferred embodiments of the present invention, a method and system for enumerating components in a graph. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims.
Accordingly, embodiments of the present invention are capable of providing a faster and more efficient way to enumerate components of a graph in order to find associations between vertices. In particular, the present invention is capable of avoiding edge analysis when enumerating a graph through the creation of keys and making other various associations. As such, embodiments of the present invention need not perform an explicit analysis of each edge in a graph when enumerating components of a graph.
Notation and Nomenclature
Embodiments of the present invention can be implemented on a software program or dedicated hardware for processing data through a computer system. The computer system can be a personal computer, notebook computer, server computer, mainframe, networked computer (e.g., router), handheld computer, personal digital assistant, workstation, and the like. This program or its corresponding hardware implementation is operable for fast enumeration of components of a graph without explicit edge analysis. In one embodiment, the computer system includes a processor coupled to a bus and memory storage coupled to the bus. The memory storage can be volatile or nonvolatile and can include removable storage media. The computer can also include a display, provision for data input and output, etc.
Some portions of the detailed descriptions that follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc. is here, and generally, conceived to be a self-consistent sequence of operations or instructions leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “determining,” “creating,” “defining,” or the like refer to the actions and processes of a computer system, or similar electronic computing device, including an embedded system, that manipulates and transfers data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Graph Analysis
Graph theory is used to represent data of various types. The data is represented as objects in a graph, where “vertices” represent the data objects and “edges” are links that connect pairs of vertices. As such, a graph is a collection of vertices (nodes or points) and edges (lines connecting two points). Points in the graph represent an individual collection of data, and edges between two points can represent data that is shared between the two points.
Conventionally, a graph can be represented in memory as a list of all pairs of vertices that share an edge. In addition, the graph can be represented by vertices that do not share an edge with another vertex. In addition, a “connected component” of a graph is any subset of vertices, each of which is connected to one or more vertices in the connected component by some sequence of edges. Embodiments of the present invention are able to perform enumeration of components of a graph without explicitly identifying edges within the graph.
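For contrast with the implicit-edge approach, the conventional representation can be sketched as follows. This is a minimal illustration only, assuming a breadth-first search over an explicit edge list; the function name `components_from_edges` and the sample data are hypothetical and do not come from the specification.

```python
from collections import defaultdict, deque


def components_from_edges(num_vertices, edges):
    """Enumerate connected components from an explicit list of vertex pairs."""
    # Build an adjacency list; this storage grows with the number of edges.
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = [False] * num_vertices
    components = []
    for start in range(num_vertices):
        if seen[start]:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        seen[start] = True
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    queue.append(neighbor)
        components.append(sorted(members))
    return components


# Vertices 0-1-2 are chained together, 3-4 form a pair.
example = components_from_edges(5, [(0, 1), (1, 2), (3, 4)])
```

Every edge is visited, which is why the cost of this style of enumeration depends on E as well as V. A vertex with no edges still appears as a component of its own.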
A graph can represent a variety of data. As examples, graph theory is employed to represent data objects in transportation, Internet structure, communication traffic networks, airline travel networks, computer chip design, physics, biology, etc. For instance, a graph can represent employees in a large international company and identify specifically where an employee works, at what position, earning what salary, etc. As another example, graph theory can be employed to represent structural properties of an air transportation network. The graph may represent a grid of airline flights between numerous airports, and include information such as the locations of airports, specific flights between airports, associated airlines, flight times, etc. Also, graph theory is employed to represent a molecular structure, or to represent three-dimensional atomic structures of an atom.
One particular implementation of embodiments of the present invention is used to model behavior, and more specifically is used to perform risk analysis of consumer behavior within the context of making retail purchases. In one instance, risk analysis of a graph is used to prevent fraud. For instance, transaction orders that use the same credit card number, email address, hypertext transfer protocol (HTTP) cookie, machine fingerprint, Internet protocol (IP) address, or any of a number of factors, may be related. In the case of fraud analysis, an online retail customer with one email address and twenty credit card numbers may be in possession of, and using, a list of stolen credit cards.
Fast Component Enumeration of Graphs
Conventionally, a graph is represented in memory as a list of all pairs of vertices that share an edge. Suppose instead that, rather than explicitly defining edges, a graph of embodiments of the present invention is represented as a list of single vertices, each associated with a list of “keys”. Also suppose that there is no explicit list of edges, but that two vertices implicitly share an edge if they share a common key value. For this special case, component enumeration of embodiments of the present invention requires an execution time that is at worst O(V·log V) and a memory space approximating O(V), where O denotes asymptotic order of growth and V is the number of vertices. In practice this represents a substantial savings over the previously described O(V+E) execution time of traditional methods, since in graphs with heavily connected components O(V+E) approaches O(V^2).
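The implicit-edge representation can be illustrated with a small sketch. The key strings, the sample vertices, and the helper `shares_edge` are assumptions made for illustration. The pairwise loop at the end is there only to show which edges are implied; avoiding exactly that kind of pairwise work is the point of the method.

```python
from itertools import combinations

# Hypothetical vertices: each one carries only its own keys, with no edge list.
vertices = [
    {"card:1111", "email:a@example.com"},
    {"card:1111", "email:b@example.com"},
    {"card:2222", "email:b@example.com"},
    {"card:3333", "email:c@example.com"},
]


def shares_edge(i, j):
    """Two vertices implicitly share an edge when their key sets intersect."""
    return not vertices[i].isdisjoint(vertices[j])


# For illustration only: materialize the edges that the keys imply.
implicit_edges = [
    (i, j)
    for i, j in combinations(range(len(vertices)), 2)
    if shares_edge(i, j)
]
```

Here vertices 0 and 1 are linked by a shared card key and vertices 1 and 2 by a shared email key, so vertices 0, 1, and 2 belong to one component even though 0 and 2 share no key directly.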
At 110, an ordered set of vertices for a graph is created. Each vertex in the graph is associated with a corresponding index. For instance, the index may follow canonical form, such as a numbering system. In one example, each vertex represents a consumer transaction and is ordered with some relation to time, such as when the transaction occurred, when the transaction was received, when the transaction completed processing, etc. Further, each vertex in the ordered set of vertices includes information that defines that transaction. The information may be unique to a vertex, or may be shared by one or more vertices. For instance, in a retail environment, a single credit card may be used in numerous transactions. As such, vertices relating to those transactions are linked in the graph through the credit card.
At 120, a plurality of keys is created that define the information. In particular, each key defines a unique piece of information. For instance, in the retail environment, information related to consumer transactions may include credit card information, HTTP cookies associated with the computing resource used to complete the transaction from the buyer's side, IP address of the computing resource used to access the Internet, email address of the buyer, etc. Each of these pieces of information is associated with a different key. The total number of keys is constantly changing as new information is received, and less useful information expires. For instance, each credit card number used in a transaction is unique and is associated with a unique key.
At 130, a plurality of lists of vertices is created. More particularly, for each key, a corresponding list of vertices is created, such that vertices in the corresponding list include information that is associated with the corresponding key. For example, a credit card may be used in numerous transactions. Each of those transactions is represented by a different vertex in the graph representing all known consumer transactions. For the key associated with the credit card, a list is created of vertices that include the same credit card. That is, the list includes vertices associated with transactions that have used the same credit card.
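Operations 110 through 130 can be sketched as follows. This is a minimal illustration, assuming that list position serves as the vertex index and that a key is a (field name, value) pair; the sample transactions and the name `build_key_lists` are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions, already ordered (for example by arrival time).
# The position of a record in this list is used as its vertex index (110).
transactions = [
    {"card": "1111", "email": "a@example.com"},
    {"card": "1111", "email": "b@example.com"},
    {"card": "2222", "email": "b@example.com"},
    {"card": "3333", "email": "c@example.com"},
]


def build_key_lists(records):
    """Create one key per unique value of each data element (120) and
    collect, for each key, the indices of the vertices that hold it (130)."""
    key_lists = defaultdict(list)
    for index, record in enumerate(records):
        for field, value in record.items():
            key_lists[(field, value)].append(index)
    return dict(key_lists)


key_lists = build_key_lists(transactions)
```

Because the records are scanned in index order, each list comes out in ascending index order. Including the field name in the key is a design choice of this sketch; it keeps a card number from colliding with an identical string in another field.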
At 140, for a first list of vertices, a least valued index is determined from a group of associated vertices. The group of associated vertices is based on vertices in the first list, and also vertices pointed to by the vertices in the first list. In one case, the group includes the vertices in the first list, and also vertices pointed to by the vertices in the first list. More particularly, as the process in flow diagram 100 is performed, each vertex in the graph will point downhill, that is, toward a lower index, to another vertex within the context of the ordered set of vertices. Root vertices do not point downhill, but form the endpoint of a chain of links between vertices. Eventually, each vertex will point downhill to a root vertex in a corresponding component of the graph. The least valued index is associated with a vertex in the graph.
In addition, the group of associated vertices includes any vertex or chain of vertices pointed to by the list vertex that has the least valued index among the vertices in the first list and the vertices they point to. That is, an additional check is made to determine whether that list vertex points to another vertex, and so on along a chain of pointed-to vertices. If so, the least valued index is adjusted to the lowest index in the chain of pointed-to vertices.
At 150, all associated vertices are pointed to a root vertex associated with the least valued index. This pointing operation links the associated vertices to other vertices in the graph. More particularly, each of the associated vertices is updated, such that they all point to the most current root vertex. Since they point to their most current root vertex, as the operations in 140 and 150 are repeated for each of the plurality of lists of vertices, vertices in the graph will continually update their corresponding pointed to vertex. In particular, for a second list of vertices, a least valued index is determined from a group of associated vertices. The group of associated vertices includes vertices in the second list, and also vertices pointed to by the vertices in the second list. Also, all of these associated vertices are pointed to a second root vertex that is most current, associated with the least valued index. Eventually each vertex in the ordered set of vertices of the graph will point to a root vertex that is the true root of a component of the graph.
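Operations 140 and 150 can be sketched as follows. This is one possible reading of the description, not the patented implementation. The names `enumerate_components` and `chain` are hypothetical, and the final resolution pass is an assumption added so that every vertex ends up at its true root. The sketch makes no attempt to reproduce the stated O(V·log V) bound.

```python
def enumerate_components(num_vertices, key_lists):
    """Return, for each vertex, the index of the root of its component.

    key_lists is an iterable of lists of vertex indices that share a key.
    """
    pointer = list(range(num_vertices))  # each vertex starts as its own root

    def chain(vertex):
        """Return the vertex plus every vertex reached by following pointers."""
        visited = [vertex]
        while pointer[vertex] != vertex:
            vertex = pointer[vertex]
            visited.append(vertex)
        return visited

    for members in key_lists:
        # 140: the group is the list's vertices plus the chains they point to.
        group = set()
        for vertex in members:
            group.update(chain(vertex))
        if not group:
            continue
        root = min(group)  # least valued index in the group
        # 150: point all associated vertices at that root.
        for vertex in group:
            pointer[vertex] = root

    # Assumed final pass: a vertex that appears in no later list may still
    # point at a stale root, so follow each chain to its end.
    return [chain(vertex)[-1] for vertex in range(num_vertices)]


roots = enumerate_components(6, [[0, 1], [1, 2], [3], [4, 5], [2, 5]])
```

Pointers only ever move to a lower or equal index, so every chain terminates at a root. In the example, the last list joins the component rooted at 0 with the one rooted at 4, leaving vertex 3 as the only vertex outside the merged component.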
In one application, as previously described, component enumeration of a graph is performed as part of a risk analysis of consumer behavior, such as retail purchases. Orders that use the same credit card number, email address, HTTP cookie, machine fingerprint, IP address, or other factor, may be related. In practice, it can be useful to assemble groups of such orders for further analysis, such as when performing fraud analysis. Embodiments of the present invention are described within the context of risk analysis of consumer behavior, for illustration purposes. However, other embodiments are well suited to component enumeration of graphs representing any type of data for purposes of any type of data analysis. That is, methods and systems of the present invention are well suited to performing component enumeration on any graph representing any type of data in a quick and efficient manner.
As shown in
Each of the information sources (e.g., 210A-N) provides information related to a consumer transaction or order, such as a retail purchase. For instance, a consumer transaction may involve a buyer and a merchant (e.g., seller). The transaction between the buyer and merchant may occur over the Internet as a form of electronic commerce (e-commerce), or may be implemented through more traditional means, such as through a person-to-person transaction at a brick-and-mortar merchant. Information related to the consumer transaction is collected at a corresponding information source. The source may be associated with the buyer, the merchant, or a third party service. As an example, in an e-commerce setting, the buyer's computer resource may collect the information related to the consumer transaction and relay that information to the graph enumerator 300. Also, in either the e-commerce setting or a more traditional market setting, a merchant may collect the information related to one or more transactions and send it to the graph enumerator 300. Further, in either the e-commerce setting or the more traditional market setting, a third party service, such as a credit card company or credit card processing company, may collect the information and send it to the graph enumerator 300 for further analysis.
For example, the information relating to a current transaction may be linked to other consumer transactions. As described previously, as the information is incorporated into a graph representing a plurality of consumer transactions, graph enumeration determines whether the current transaction may be linked to other consumer transactions. The relationship of the current transaction to other previous transactions is useful in performing fraud analysis, as an example. As such, the current transaction may be halted if fraud is detected, or may be authorized to complete the credit card transaction, if no fraud is detected. In addition, future transactions involving the same information (e.g., credit card, machine ID, email address, etc.) may be halted if fraud is detected.
The graph enumerator 300 of
In general, the graph enumerator 300 includes an optional receiver 310, graph definer 320, key creator 330, list creator 340, component generator 350, data storage 225, and optional transaction analyzer 360. In one implementation, receiver 310 receives information related to at least one consumer transaction from at least one source. In another implementation, receiver 310 is an input mechanism for receiving information into system 300. Graph definer 320 creates an ordered set of vertices for a graph such that each vertex is associated with a corresponding index. The ordered set of vertices includes the information related to at least one consumer transaction. Key creator 330 creates a plurality of keys defining the information. List creator 340 creates a plurality of lists of vertices, each of which is associated with a corresponding key such that vertices in a corresponding list include information associated with the corresponding key. Component generator 350 determines a least valued index from a group of associated vertices based on vertices in a first list of vertices, and vertices pointed to by the vertices in the first list. Component generator 350 also points all associated vertices to a most current root vertex that is associated with the least valued index. Data storage 225 may be incorporated within graph enumerator 300, or located remotely from graph enumerator 300, and is used for storing the ordered set of vertices, and storing a root index associated with a most current root vertex in corresponding entries of the ordered set of vertices. Also, storage 225 is capable of storing, in relation to a component, vertices that are associated with the component, such that vertices that point to the same root vertex are associated with a corresponding component of the graph. The functions performed by graph enumerator 300 are described in more detail in association with
Turning now to
At 410, optionally, information is received for processing. For instance, the information is received by receiver 310 of graph enumerator 300 from one or more sources 210A-N. More specifically, in one implementation the information is received from a source (e.g., merchant machine, buyer machine, third party machine, such as a credit card processing company, etc.). The information may be received over the Internet, or through some communication network, so that the information may be analyzed in relation to previous consumer transactions, with results returned to a requestor in a timely fashion. In another instance, the information is received through other means, such that receiver 310 acts as an input mechanism. Still other means for receiving data are supported. While the present embodiment is described within the context of receiving information related to consumer transactions, the method of flow diagram 400 is well suited to component enumeration of any graph representing any type of data that is received for analysis.
At 420, an ordered set of vertices for a graph is created. For instance, the graph definer 320 is capable of creating the ordered set of vertices. Each vertex in the graph is associated with a corresponding index. The operation outlined in 420 is analogous to the operation outlined in 110, and the description of 110 previously provided is equally suited to the operation of 420. In particular, each vertex includes information, such as those related to consumer transactions (e.g., credit card number, IP address, etc.).
At 430, a plurality of keys is created that define the information. For instance, the key creator 330 is capable of creating the keys. The operation outlined in 430 is analogous to the operation outlined in 120, and the description of 120 previously provided is equally suited to describing 430. In particular, each key defines a unique piece of information associated with one or more vertices. As described before, the information related to consumer transactions may include credit card information, HTTP cookies associated with the computing resource used to complete the transaction from the buyer's side, IP address of the computing resource, email address of the buyer, etc.
At 440, the ordered set of vertices is stored in storage. For instance, graph definer 320 stores the ordered set of vertices in storage 225. As such, as each set of information (e.g., information related to a single and new consumer transaction) is received, graph definer 320 parses out the information and stores that information in relation to a corresponding vertex in the ordered set of vertices. More particularly, the ordered vertices are stored in storage 225, such that information and their relationship to the keys are stored for ready access. As such, as the ordered set of vertices gets updated through the addition and deletion of vertices, a complete set of vertices is available for access in order to perform component enumeration of the representative graph.
For illustration purposes only,
For instance, column 510 provides the index number in the ordered set of vertices representing transaction attempts. These are ordered as transactions 1-N, and can represent any ordering scheme, such as an ordering by time (e.g., time transaction received). The remaining columns provide information related to each of the vertices, in the form of keys. As explained above, the keys implicitly define edges, such that any two vertices that share a key also implicitly share an edge. For instance, as shown in table 500A, column 511 provides a credit card number, column 512 provides an email address, column 513 provides machine identifying information (e.g., unique ID identifying the computing resource used by the consumer to make the transaction), column 514 provides the IP address of the computing resource used by the consumer, etc.
It is important to note that table 500A is an illustration of the ordered set of vertices, and as such, the ordered set of vertices may be arranged in any number of other ways or configurations. As shown in table 500A, information common to consumer transactions is included in the columns, and is relevant when trying to group transactions together for purposes of further analysis. The information need not be presented by column, and can be presented in random fashion. For instance, a transaction could list relevant information in random order, as long as a reference to the corresponding key is made.
As shown in
Importantly, information may be commonly shared between different transactions. For instance, as shown by curved line 521, the credit card number XX assigned to key1 is used in transactions 1 and N. As such, transactions 1 and N are related or linked by the credit card number XX. In addition, as shown by line 522, transactions 2 and N are linked by the common email address GG assigned to key6. Further, as shown by line 523, transactions 2 and N are linked by common machine ID HH assigned to key7. As such, transactions 1, 2 and N are related in that transaction 1 has information in common with transaction N, which has information in common with transaction 2. This interrelationship or grouping may be important for purposes of performing further analysis.
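The linkage just described can be illustrated with a minimal Python sketch. The record layout, the function names, and every value other than XX, GG and HH are assumptions made for illustration; index 3 stands in for transaction N. The pairwise comparison is used only to make the implicit edges visible and is not how the embodiments avoid building the graph.

```python
from itertools import combinations

# Hypothetical records modelled on the example: index -> {data type: value}.
# Index 3 stands in for transaction N; values other than XX, GG, HH are made up.
records = {
    1: {"card": "XX", "email": "AA", "machine": "BB"},
    2: {"card": "YY", "email": "GG", "machine": "HH"},
    3: {"card": "XX", "email": "GG", "machine": "HH"},
}

def keys_of(record):
    """One key per (data type, value) pair held by a record."""
    return set(record.items())

def implicit_edges(records):
    """Map each pair of vertices to the keys they share, skipping unrelated pairs."""
    edges = {}
    for a, b in combinations(sorted(records), 2):
        shared = keys_of(records[a]) & keys_of(records[b])
        if shared:
            edges[(a, b)] = shared
    return edges

edges = implicit_edges(records)
```

In this sketch transactions 1 and 2 share no key directly, yet both share a key with transaction 3, so all three fall into one group.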
Returning to
For illustration purposes only,
As shown in
Each list of vertices (e.g., column) includes vertices that are associated with the corresponding key. That is, those vertices include information that is associated with the corresponding key. For instance, in column 531 associated with key1, all the vertices (e.g., transactions 1, 5, 15, N, etc.) represent transactions that have used credit card number XX. Similarly, column 532 is associated with key22 and all the vertices (e.g., 33, 77, and 95) represent transactions that have used credit card number 22233344.
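One possible way to build such lists is an inverted index from key to vertex indices, sketched below in Python. The function name and the record layout are assumptions; only the two card numbers and the vertex indices are taken from the example above.

```python
from collections import defaultdict

def build_key_lists(records):
    """Return {key: list of vertex indices, in index order, that contain the key}."""
    lists = defaultdict(list)
    for index in sorted(records):
        for data_type, value in records[index].items():
            lists[(data_type, value)].append(index)
    return dict(lists)

# Hypothetical records: each vertex carries only a card number here.
records = {
    1: {"card": "XX"},
    5: {"card": "XX"},
    15: {"card": "XX"},
    33: {"card": "22233344"},
    77: {"card": "22233344"},
    95: {"card": "22233344"},
}
key_lists = build_key_lists(records)
```

Because the vertices are visited in index order, each list is already sorted, so its first entry is the least valued index within that list.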
Turning back to
At 470, all associated vertices are pointed to a root vertex associated with the least valued index. At 480, the most current root vertex is stored in association with all the associated vertices. As described previously, the pointing operation links the associated vertices to other vertices in the graph. In one case, the pointing operation is performed internally on vertices of a particular list of vertices. That is, at least preliminarily, each vertex in the first list of vertices is pointed to a root vertex associated with the least valued index. For instance, in column 531, all the vertices associated with key1 point to vertex 1. That is, transactions 5, 15, and N each point to vertex 1. Similarly, for column 532, all the associated vertices (e.g., 33, 77, 95, and 100) point to vertex 33. Also, at least preliminarily, for column 533, all associated vertices point to vertex 15, but will eventually point to vertex 1, as will be described below. Further, at least preliminarily, for column 534, all associated vertices point to vertex 5, but will eventually point to vertex 1, as will be described below. And, at least preliminarily, for column 535, all associated vertices point to vertex 7, but will eventually point to vertex 1, as will be described below.
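The preliminary, list-internal pointing can be sketched as follows in Python. The dictionary `points_to`, the function name, and the value chosen for N are assumptions; a root is represented here as a vertex that points to itself. The sketch deliberately ignores vertices already pointed to from earlier lists, which the full process also takes into account.

```python
def point_list_to_least(points_to, vertex_list):
    """Point every vertex in one list at the list's least valued index."""
    root = min(vertex_list)
    for v in vertex_list:
        points_to[v] = root  # the root itself ends up pointing to itself
    return root

N = 120  # hypothetical stand-in for the last transaction index
column_531 = [1, 5, 15, N]
column_532 = [33, 77, 95, 100]

points_to = {}
point_list_to_least(points_to, column_531)
point_list_to_least(points_to, column_532)
```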
The operations in 460, 470, and 480 are repeated for each of the lists of vertices. As the process in flow diagram 400 is performed on each of the plurality of lists of vertices, each vertex in the graph will point downhill to another vertex, the most current root vertex, within the context of the ordered set of vertices. Root vertices do not point downhill, but form the endpoints of the links between vertices. Eventually, each vertex will point downhill to a root vertex in a corresponding component of the graph. The least valued index is associated with a vertex in the graph.
As shown in
Even further, when considering other processed lists of vertices, as shown in
As shown in
In particular,
Each row in
At 610, a key associated with a k-value, Key(k), is accessed from storage. At 620, vertices associated with Key(k) are listed, such that vertices that are listed include information associated with Key(k). At 630, each vertex in the list is cross-referenced to determine if it points to a downhill vertex. For instance, Table 500C is accessed to determine if the corresponding vertex is pointing to a most current root vertex. At 640, a least valued index is determined from the group of associated vertices that is based on and includes the vertices in the list, as well as any vertices to which they point.
The operations at 650 and 655 determine the appropriate least valued index. That is, operations 650 and 655 loop until all associated vertices are considered. Put another way, all chained vertices are considered to determine the least valued index. In particular, decision step 650 considers whether the vertex associated with the least valued index points to another vertex. If so, the least valued index is reset to the index associated with the pointed to vertex. The process returns to 650 and loops until it is determined that the vertex associated with the least valued index does not point to another vertex, and continues to 670.
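The determination at 640 and the loop at 650 and 655 might be sketched as follows in Python. The names and the sample pointer table are assumptions; a vertex that is absent from `points_to`, or that points to itself, is treated as pointing to no other vertex. The loop terminates under the assumption that pointers only ever lead downhill, that is, to a lower index.

```python
def least_valued_index(points_to, vertex_list):
    """Find the least index among the list vertices and the vertices they
    point to, then follow pointers from it until the chain ends."""
    candidates = set(vertex_list)
    candidates.update(points_to.get(v, v) for v in vertex_list)
    least = min(candidates)
    # Loop of 650/655: while the current least vertex points elsewhere, move there.
    while points_to.get(least, least) != least:
        least = points_to[least]
    return least

# Hypothetical pointer table: 20 -> 7 -> 5 -> 1, and 15 is still its own root.
points_to = {1: 1, 5: 1, 7: 5, 15: 15, 20: 7}
root = least_valued_index(points_to, [15, 20])
```

Here the list holds vertices 15 and 20. The least index seen at first is 7, and following the chain through 5 leads to vertex 1, which becomes the least valued index.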
At 670, all associated vertices are pointed to the root vertex associated with the least valued index. That is, appropriate fields are populated or repopulated in storage (e.g., in the ordered set of vertices, or in Tables 500C and 500D). Associated vertices were previously considered when determining the least valued index. As such, all associated vertices are related in some manner through one or more keys.
At decision step 680, it is determined if there is another key to evaluate. If there is another key, the process sets the k-value to the next available key. Thereafter, the process loops back to 610.
On the other hand, if all keys have been processed, then the method of flow diagram 600 ends. At this point, each vertex in the graph points to a corresponding root vertex. Each root vertex also defines a corresponding component of the graph. As such, vertices that point to a common root vertex belong to the same component.
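Reading the components off the finished pointer structure can be sketched in Python as below. The function name and the sample pointers are assumptions. The sketch follows any remaining chain to its end, so it also tolerates a vertex that does not yet point directly at its root.

```python
from collections import defaultdict

def components(points_to):
    """Group vertices by the root reached from each one."""
    groups = defaultdict(list)
    for v in sorted(points_to):
        root = v
        while points_to[root] != root:  # a root is assumed to point to itself
            root = points_to[root]
        groups[root].append(v)
    return dict(groups)

# Hypothetical final state with two components, rooted at 1 and at 33.
points_to = {1: 1, 2: 1, 5: 1, 33: 33, 77: 33, 95: 33}
groups = components(points_to)
```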
Looking now at the quality of relationships between vertices, within each component of a graph the certainty of each key may diminish over time. As such, for a key that loses its relational certainty, different vertices sharing that key would no longer be related. For example, an IP address is assigned to a particular computing resource accessing the Internet for an indeterminate amount of time. The assignment of an IP address (associated with keyKK) can be transitory, lasting as long as a single Internet session used by a consumer to effect a transaction. When that session ends, that IP address (keyKK) may be assigned to another computing resource of another user. As such, two transactions with the same IP address (keyKK) may not be related even if the transactions are only 100 minutes apart. On the other hand, two transactions using the same card number (associated with keyII) may be related even if they are 100 days apart, since that credit card is associated with the same user.
To compensate for this uncertainty, keys are allowed to expire, in accordance with one embodiment of the present invention. More specifically, a key is allowed to expire after a condition is satisfied, in one embodiment. For instance, a key expires after a predetermined period of time according to a set schedule based upon its corresponding data type. Using the previous example, a key related to an IP address may expire after 30 minutes. Upon expiration, the list of vertices associated with the key is also deleted. However, the effect of deleting the key on the structure of a corresponding component is minimized, since other keys related to that component may provide the necessary relationship between vertices of the component. As such, only vertices and keys related to that component need be reset (repointing vertices and deleting information related to the expired key), instead of resetting all the components and vertices of the entire graph.
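A key expiration of this kind could be sketched in Python as follows. All names, the timestamps, and the sample data are assumptions; only the 30-minute lifetime for IP addresses comes from the example above. For brevity the sketch scans every surviving key list to find those touching the affected component, whereas a practical implementation would likely keep a per-vertex index of keys to avoid that scan.

```python
# Hypothetical lifetimes in seconds per data type. Only the 30-minute IP
# lifetime comes from the text; None means the key type never expires.
TTL_SECONDS = {"ip": 30 * 60, "card": None}

def expired_keys(key_created_at, now):
    """Return the keys whose age exceeds the lifetime of their data type."""
    out = []
    for key, created in key_created_at.items():
        ttl = TTL_SECONDS.get(key[0])
        if ttl is not None and now - created > ttl:
            out.append(key)
    return out

def find_root(points_to, v):
    """Follow pointers until a vertex that points to itself is reached."""
    while points_to[v] != v:
        v = points_to[v]
    return v

def expire_key(key_lists, points_to, key):
    """Delete one key and re-enumerate only the component it belonged to."""
    members = key_lists.pop(key, [])
    if not members:
        return
    root = find_root(points_to, members[0])
    affected = [v for v in points_to if find_root(points_to, v) == root]
    for v in affected:                   # reset only this component
        points_to[v] = v
    affected_set = set(affected)
    for vertices in key_lists.values():  # re-link using the surviving keys
        if affected_set.isdisjoint(vertices):
            continue
        roots = {find_root(points_to, v) for v in vertices}
        least = min(roots)
        for r in roots:
            points_to[r] = least

key_lists = {
    ("card", "XX"): [1, 5],
    ("ip", "KK"): [5, 9],
    ("card", "ZZ"): [9, 12],
    ("card", "QQ"): [33, 77],
}
points_to = {1: 1, 5: 1, 9: 1, 12: 1, 33: 33, 77: 33}
key_created_at = {key: 0 for key in key_lists}

for key in expired_keys(key_created_at, now=31 * 60):
    expire_key(key_lists, points_to, key)
```

In this sample the IP key was the only link between vertices {1, 5} and {9, 12}, so its expiration splits that component in two, while the component rooted at 33 is left untouched.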
In practice a graph may contain millions of vertices, and undergo rapid addition and deletion of (implicit) edges. Historically, repeating such operations while maintaining an orderly, compact data structure has proved difficult, since it requires reanalyzing all the edges of the graph. However, embodiments of the present invention are able to efficiently handle such operations while performing component enumeration. As a result, embodiments of the present invention are able to dramatically improve performance over traditional methods of component enumeration.
A process for component enumeration can be expressed using the following exemplary pseudo code, in accordance with one embodiment of the present invention. In particular, let “V” be the set of vertices (“v”) containing information, such as consumer transaction information. Let “G” be the graph that results upon placing an edge between each pair of vertices with a shared key. The present embodiment is able to partition “V” into subsets that correspond to the connected components of “G,” without explicitly constructing “G.” This is accomplished by implicitly adding edges to “V” using the following pseudo code:
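A minimal Python sketch of such a process, written from the description in this section rather than taken from the original listing, might look as follows. The function name, the representation of keys as strings, and the sample records are assumptions. Roots are represented as vertices that point to themselves, and the graph G is never constructed.

```python
from collections import defaultdict

def enumerate_components(records):
    """Label each vertex with the least index in its connected component.

    records: {index: iterable of hashable keys}. Two vertices sharing a key
    implicitly share an edge.
    """
    # One list of vertices per key.
    key_lists = defaultdict(list)
    for index in sorted(records):
        for key in records[index]:
            key_lists[key].append(index)

    points_to = {index: index for index in records}  # every vertex starts as a root

    def chain(v):
        """Return v plus every vertex reached by following pointers from v."""
        path = [v]
        while points_to[path[-1]] != path[-1]:
            path.append(points_to[path[-1]])
        return path

    for vertices in key_lists.values():
        # Group of associated vertices: the list plus everything it points to.
        associated = set()
        for v in vertices:
            associated.update(chain(v))
        root = min(associated)  # pointers only go downhill, so this is a root
        for v in associated:
            points_to[v] = root

    # Resolve any remaining chains so each vertex maps to its component root.
    return {v: chain(v)[-1] for v in records}

# Hypothetical records; vertex 3 plays the role of transaction N.
labels = enumerate_components({
    1: {"card:XX"},
    2: {"email:GG", "machine:HH"},
    3: {"card:XX", "email:GG", "machine:HH"},
    4: {"card:ZZ"},
})
```

Pointing every vertex of a chain directly at the new root keeps later chains short, which is consistent with the short paths noted for the resulting forest.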
The resulting structure is a forest “F,” in which each tree has directed edges pointing toward the root. Each root in a tree of “F” is the vertex of lowest index in a component of “G.” Moreover, for each vertex “v” in any connected component of graph “G,” the corresponding tree in forest “F” contains a short path from the vertex to the root. By associating each tree with the index of its root vertex, all the connected components of “G” have successfully been labeled.
While the methods of embodiments illustrated in flow diagrams of
A method and system for fast enumeration of components of a graph is thus described. While the invention has been illustrated and described by means of specific embodiments, it is to be understood that numerous changes and modifications may be made therein without departing from the spirit and scope of the invention as defined in the appended claims and equivalents thereof. Furthermore, while the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the below claims.