Method and apparatus for clustering data
Abstract
A method and apparatus for partitioning a data set into clusters, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. A Potts spin is assigned to each data point, and an interaction between neighboring points is introduced, with a strength that decreases with the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures it is completely ordered, i.e., all spins are aligned. At very high temperatures the system does not exhibit any ordering. In an intermediate (super-paramagnetic) regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins, and the corresponding data points, into clusters.
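The abstract states only that the interaction is a decreasing function of the distance between neighbors. The following Python sketch shows one way such a Potts model could be built. It assumes, since this section does not specify them, mutual K-nearest neighbors as the neighbor criterion and a Gaussian coupling J_ij = exp(-d_ij^2 / (2a^2)) / K_hat, where a is the mean neighbor distance and K_hat the mean number of neighbors per point. The function names `potts_couplings` and `hamiltonian` are hypothetical. The energy charges J_ij for every neighboring pair whose spins disagree, so aligned neighbors lower the energy.

```python
import numpy as np


def potts_couplings(points, k=10):
    """Return neighbor pairs (E, 2) and couplings J (E,) for a Potts model.

    Assumed neighbor rule: mutual K-nearest neighbors (i is among j's k
    nearest and j among i's). Assumed coupling: Gaussian decay with distance,
    J_ij = exp(-d_ij**2 / (2 a**2)) / K_hat. Assumes no duplicate points.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]     # column 0 is the point itself
    is_knn = np.zeros((n, n), dtype=bool)
    is_knn[np.arange(n)[:, None], knn] = True
    mutual = is_knn & is_knn.T                     # keep only mutual neighbors
    i_idx, j_idx = np.nonzero(np.triu(mutual, 1))  # each pair once, i < j
    d = dist[i_idx, j_idx]
    a = d.mean()                                   # local length scale
    k_hat = 2 * len(d) / n                         # mean neighbors per point
    J = np.exp(-d ** 2 / (2 * a ** 2)) / k_hat     # decreases with distance
    return np.stack([i_idx, j_idx], axis=1), J


def hamiltonian(spins, pairs, J):
    """H = sum over neighbor pairs of J_ij * (1 - delta(s_i, s_j))."""
    same = spins[pairs[:, 0]] == spins[pairs[:, 1]]
    return float(J[~same].sum())                   # cost only for disagreeing pairs
```

Because J_ij falls off with distance, pairs that straddle a sparse gap in the data contribute little to H, so they are the first to decorrelate as the temperature rises.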
16 Claims
1. A method for analyzing signals containing a data set which is representative of a plurality of physical phenomena, to identify and distinguish among said physical phenomena by determining clusters of data points within said data set, said method comprising:
(1) constructing a physical analog Potts-spin model of the data set by (a) associating a Potts-spin variable si = 1, 2, ..., q to each data point vi, (b) identifying neighbors of each point vi according to a selected criterion, (c) determining the Hamiltonian H and determining the interaction Jij between neighboring points vi and vj,
(2) locating a super-paramagnetic phase of the data set using the Monte Carlo procedure to determine susceptibility χ(T) by (a) determining the thermal average magnetization ⟨m⟩ for different temperatures, (b) identifying the presence of a super-paramagnetic phase using susceptibility χ,
(3) determining the spin-spin correlation Gij for all neighboring points vi and vj,
(4) constructing data clusters using the spin-spin correlation Gij within the super-paramagnetic phase located in step (2) to partition the data set, and
(5) identifying said physical phenomena based on said data clusters.
Dependent claims: 2, 3, 4, 5, 7.
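Steps (2) and (3) of claim 1 can be sketched with the Swendsen-Wang cluster algorithm, one common Monte Carlo procedure for Potts models. The claim does not name a specific algorithm, so this choice is an assumption. The sketch also assumes the order parameter m = (q N_max / N - 1) / (q - 1), where N_max is the number of spins in the most populated state, the susceptibility χ(T) = (N / T)(⟨m²⟩ - ⟨m⟩²), and the simple estimator Gij = ⟨δ(si, sj)⟩. The helper `swendsen_wang` and its parameters are hypothetical, and spins take the values 0..q-1 rather than 1..q.

```python
import numpy as np


def _find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def swendsen_wang(n, pairs, J, T, q=20, sweeps=200, burn_in=50, seed=0):
    """Estimate <m>, chi(T) and G_ij at temperature T (hypothetical helper).

    pairs: (E, 2) int array of neighbor index pairs; J: (E,) couplings.
    Spins take values 0..q-1 here instead of 1..q.
    """
    rng = np.random.default_rng(seed)
    spins = rng.integers(q, size=n)
    m_samples = []
    same_count = np.zeros(len(pairs))
    p_freeze = 1.0 - np.exp(-J / T)  # bond probability for aligned neighbors
    for sweep in range(sweeps):
        # Freeze bonds between aligned neighbors with probability p_freeze.
        aligned = spins[pairs[:, 0]] == spins[pairs[:, 1]]
        frozen = aligned & (rng.random(len(pairs)) < p_freeze)
        parent = np.arange(n)
        for i, j in pairs[frozen]:
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj:
                parent[ri] = rj
        # Give every Swendsen-Wang cluster a fresh random spin value.
        roots = np.array([_find(parent, x) for x in range(n)])
        _, labels = np.unique(roots, return_inverse=True)
        spins = rng.integers(q, size=labels.max() + 1)[labels]
        if sweep >= burn_in:
            n_max = np.bincount(spins, minlength=q).max()
            m_samples.append((q * n_max / n - 1) / (q - 1))
            same_count += spins[pairs[:, 0]] == spins[pairs[:, 1]]
    m = np.array(m_samples)
    m_avg = m.mean()
    chi = n / T * (np.mean(m ** 2) - m_avg ** 2)
    G = same_count / len(m)  # ~1 for strongly coupled pairs, ~1/q if uncorrelated
    return m_avg, chi, G
```

Calling this over a grid of temperatures and watching where ⟨m⟩ drops sharply and χ(T) peaks is one way to locate the transitions that bound the super-paramagnetic range; the Gij values from a temperature inside that range are then the input to step (4).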
6. Apparatus for analyzing signals containing a data set which is representative of a plurality of physical phenomena, to identify and distinguish among said physical phenomena by determining clusters of data points within said data set, said apparatus comprising:
(1) means for constructing a physical analog Potts-spin model of the data set including (a) means for associating a Potts-spin variable si = 1, 2, ..., q to each data point vi, (b) means for identifying neighbors of each point vi according to a selected criterion, (c) means for determining the Hamiltonian H and determining the interaction Jij between neighboring points vi and vj,
(2) means for locating a super-paramagnetic phase of the data set using the Monte Carlo procedure to determine susceptibility χ(T), including (a) means for determining the thermal average magnetization ⟨m⟩ for different temperatures, (b) means for identifying the presence of a super-paramagnetic phase using susceptibility χ,
(3) means for determining the spin-spin correlation Gij for all neighboring points vi and vj,
(4) means for constructing data clusters using the spin-spin correlation Gij within the located super-paramagnetic phase to partition the data set, and
(5) means for identifying said physical phenomena based on said data clusters.
Dependent claims: 8, 9, 10, 11, 12.
13. A method for recognizing a plurality of physical objects or phenomena represented collectively by a data set which comprises data points defined by a plurality of parametric values, by sorting said data points into clusters of data points, comprising:
(1) constructing a physical analog Potts-spin model of the data set by (a) associating a Potts-spin variable si = 1, 2, ..., q to each data point vi, (b) identifying neighbors of each point vi according to a selected criterion, (c) determining the Hamiltonian H and determining the interaction Jij between neighboring points vi and vj,
(2) locating a super-paramagnetic phase of the data set using the Monte Carlo procedure to determine susceptibility χ(T) by (a) determining the thermal average magnetization ⟨m⟩ for different temperatures, (b) identifying the presence of a super-paramagnetic phase using susceptibility χ,
(3) determining the spin-spin correlation Gij for all neighboring points vi and vj,
(4) constructing data clusters using the spin-spin correlation Gij within the super-paramagnetic phase located in step (2) to partition the data set; and
(5) identifying said physical phenomena based on said data clusters.
Dependent claims: 14.
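Step (4) leaves open how the correlations Gij are turned into a partition. A minimal sketch follows, assuming (this is not stated in the claim) that two neighboring points are linked whenever Gij exceeds a threshold of 0.5 and that the data clusters are the connected components of the resulting graph. The function name `clusters_from_correlation` and its `threshold` parameter are hypothetical.

```python
def clusters_from_correlation(n, pairs, G, threshold=0.5):
    """Label n points by linking neighbors whose G_ij exceeds `threshold`.

    Returns a list of cluster ids 0, 1, 2, ... in order of first appearance.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), g in zip(pairs, G):
        if g > threshold:                      # strongly correlated neighbors
            ri, rj = find(int(i)), find(int(j))
            if ri != rj:
                parent[ri] = rj                # merge their clusters
    roots = [find(x) for x in range(n)]
    ids = {}
    # Relabel roots as consecutive ids in order of first appearance.
    return [ids.setdefault(r, len(ids)) for r in roots]
```

For example, `clusters_from_correlation(6, [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]], [0.9, 0.8, 0.1, 0.95, 0.7])` returns `[0, 0, 0, 1, 1, 1]`: the weak link between points 2 and 3 splits the chain in two. Inside the super-paramagnetic phase, strongly coupled neighbors have Gij near 1 and uncorrelated ones near 1/q, so a 0.5 threshold separates the two cases comfortably for moderate q; a point with no linked neighbor becomes a singleton cluster.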
15. An apparatus for recognizing a plurality of physical objects or phenomena represented collectively by a data set which comprises data points defined by a plurality of parametric values, by sorting said data points into clusters of data points, said apparatus comprising:
(1) means for constructing a physical analog Potts-spin model of the data set by (a) associating a Potts-spin variable si = 1, 2, ..., q to each data point vi, (b) identifying neighbors of each point vi according to a selected criterion, (c) determining the Hamiltonian H and determining the interaction Jij between neighboring points vi and vj,
(2) means for locating a super-paramagnetic phase of the data set using the Monte Carlo procedure to determine susceptibility χ(T) by (a) determining the thermal average magnetization ⟨m⟩ for different temperatures, (b) identifying the presence of a super-paramagnetic phase using susceptibility χ,
(3) means for determining the spin-spin correlation Gij for all neighboring points vi and vj,
(4) means for constructing data clusters using the spin-spin correlation Gij within the super-paramagnetic phase located in step (2) to partition the data set; and
(5) means for identifying said physical phenomena based on said data clusters.
Dependent claims: 16.
Specification