Method of clustering multi-dimensional related data in a computer database by combining the two verticles of a graph connected by an edge having the highest score
First Claim
1. A method of clustering multi-dimensional related data in a computer database, said computer database including a set of data records, each data record storing information about an object of interest, comprising the steps of:
- a. establishing a vertex for every object of interest in said computer database so as to provide a graph comprising a plurality of vertices, each vertex representing a cluster;
b. connecting selected pairs of said plurality of vertices with edges;
c. calculating, on a computer, a score for the edges connecting the vertices, the score being a measure of how desirable it is to combine the vertices associated therewith;
d. selecting two vertices connected with an edge having the best score;
e. combining the two vertices by combining the clusters represented by the two vertices;
f. creating a new merged vertex and edges and calculating on a computer a new score for the new edges;
g. repeating steps (d) through (f) until a predetermined terminating condition is reached; and
h. storing remaining clusters in a computer readable medium.
2 Assignments
0 Petitions
Accused Products
Abstract
A method of clustering multi-dimensional related data performed by identifying features from a collection of data, each of said features being represented by a vertex, selecting pairs of features that it is desirable to cluster together, connecting the pair of vertices representing every selected pair of features by an edge, assigning a score to every edge according to a predetermined formula, selecting an edge having the highest score, creating a new vertex by merging the vertices connected by the selected edge, creating new edges between the new vertex and the vertices previously connected to the merged vertices, and repeating this procedure until every edge has a predetermined score.
105 Citations
22 Claims
-
1. A method of clustering multi-dimensional related data in a computer database, said computer database including a set of data records, each data record storing information about an object of interest, comprising the steps of:
-
a. establishing a vertex for every object of interest in said computer database so as to provide a graph comprising a plurality of vertices, each vertex representing a cluster; b. connecting selected pairs of said plurality of vertices with edges; c. calculating, on a computer, a score for the edges connecting the vertices, the score being a measure of how desirable it is to combine the vertices associated therewith; d. selecting two vertices connected with an edge having the best score; e. combining the two vertices by combining the clusters represented by the two vertices; f. creating a new merged vertex and edges and calculating on a computer a new score for the new edges; g. repeating steps (d) through (f) until a predetermined terminating condition is reached; and h. storing remaining clusters in a computer readable medium. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
-
-
14. A method of clustering multi-dimensional related data in a computer data base, said computer database including a set of data records, each data record storing information about an object of interest, comprising the steps of:
-
creating a vertex for at least a subset of objects of interest in said computer database, each of said vertices representing a cluster; connecting pairs of vertices with edges, each edge connecting two vertices; calculating a score for the edges connecting the vertices using a computer, the score being a measure of how desirable it is to combine the vertices associated therewith; selecting one edge having a score indicating a high desirability to combine said two vertices connected by said one edge; combining the clusters represented by the two vertices connected by said one edge, the combination resulting in a new cluster; creating a new merged vertex representing the new cluster; connecting the new merged vertex and at least one preexisting vertex with one or more new edges; calculating a new score for the one or more new edges; and storing remaining clusters in a computer readable medium. - View Dependent Claims (15, 16, 17)
-
-
18. A computer readable storage medium having computer readable code embodied on said computer readable storage medium, said computer readable code for clustering multi-dimensional related data in a computer database, said computer database including a set of data records, each data record storing information about an object of interest, said computer readable code comprising:
-
first program code, said first program code establishes a vertex for at least a subset of objects of interest in said computer database, each of said vertices representing a cluster; second program code, said second program code connects selected pairs of said vertices with edges, each edge connecting two vertices; third program code, said third program code calculates a score for the edges connecting the vertices, the score being a measure of how desirable it is to combine the vertices associated therewith; fourth program code, said fourth program code selects two vertices connected with one edge having a score indicating a high desirability to combine the two vertices; fifth program code, said fifth program code combines the selected two vertices including combining the clusters represented by the selected two vertices; sixth program code, said sixth program code creates a new merged vertex and edges and calculates a new score for the new edges; and seventh program code, said seventh program code stores remaining clusters in a computer readable medium. - View Dependent Claims (19, 20, 21, 22)
-
Specification