System and method for signature-based unsupervised clustering of data elements
First Claim
1. A computerized method for signature-based unsupervised clustering of data elements, comprising:
receiving a plurality of clusters, each cluster comprising at least a data element;
generating an upper triangular matrix respective of the clusters;
generating a signature for each of the clusters, wherein a signature is generated from multiple patches of a multimedia data element, wherein multiple patches are of random length and random position within the multimedia data element;
generating a match score between each of two different clusters;
storing the match score in a cell of the upper triangular matrix corresponding to the two clusters;
determining whether any of the match scores is above a predefined threshold value;
clustering every two clusters that are determined to have a score above a predetermined threshold; and
repeating the generation of an upper triangular matrix respective of the clusters until a single cluster is reached, wherein the clustering on the respective generated signatures creates clusters that include a collection of signatures respective of the multimedia data elements.
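The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each cluster is represented as a Python set of hashable signature tokens, uses a Jaccard overlap as a stand-in for the unspecified match score, pairs clusters greedily (best score first), and stops early if no pair scores above the threshold, a case the claim does not address. The names match_score, cluster_once and cluster_all are hypothetical.

```python
from itertools import combinations


def match_score(sig_a, sig_b):
    """Hypothetical match score: Jaccard overlap of two signature sets."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0


def cluster_once(clusters, threshold):
    """One pass: score every pair (i < j), then merge pairs above threshold."""
    n = len(clusters)
    # Upper triangular matrix: only cells with i < j are filled.
    scores = {(i, j): match_score(clusters[i], clusters[j])
              for i, j in combinations(range(n), 2)}
    merged, used = [], set()
    # Greedy pairing (assumption): best score first, each cluster
    # joins at most one pair per pass.
    for (i, j), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score > threshold and i not in used and j not in used:
            merged.append(clusters[i] | clusters[j])
            used.update((i, j))
    # Clusters that were not merged carry over unchanged.
    merged.extend(c for k, c in enumerate(clusters) if k not in used)
    return merged


def cluster_all(clusters, threshold):
    """Repeat passes until one cluster remains or no pair can be merged."""
    history = [clusters]
    while len(clusters) > 1:
        nxt = cluster_once(clusters, threshold)
        if len(nxt) == len(clusters):
            break  # assumption: stop when nothing exceeds the threshold
        clusters = nxt
        history.append(clusters)
    return history


if __name__ == "__main__":
    data = [{1, 2, 3}, {2, 3, 4}, {10, 11}, {10, 11, 12}]
    for level, state in enumerate(cluster_all(data, threshold=0.4)):
        print(level, state)
```

Keeping the state after every pass (the returned history) preserves the hierarchy of merges, so a caller can pick whichever level of granularity suits the application.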
Abstract
A method and system for signature-based unsupervised clustering of data elements. The method comprises receiving a plurality of clusters; generating a triangular matrix respective of the clusters; generating a signature for each of the clusters; generating a match score between each of two different clusters; storing the match score in a cell of the triangular matrix corresponding to the two clusters; determining whether any of the match scores is above a predefined threshold value; clustering every two clusters that are determined to have a score above a predetermined threshold; and repeating the generation of a triangular matrix respective of the clusters until a single cluster is reached. The system comprises an interface; a processor; a memory for storing at least one cluster; and a memory coupled to the processor, the memory containing instructions that, when executed by the processor, configure the system to perform the steps of the method.
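As an illustration of storing one match score per pair of clusters in a triangular matrix, the following sketch packs the cells with i < j into a flat list of n(n-1)/2 entries. The class name UpperTriangular and its methods are hypothetical, and the packed layout is one possible choice; the abstract does not say how the matrix is laid out in memory.

```python
class UpperTriangular:
    """Strictly upper triangular matrix of match scores, stored packed."""

    def __init__(self, n):
        self.n = n
        # One cell per unordered pair of distinct clusters.
        self.cells = [0.0] * (n * (n - 1) // 2)

    def _index(self, i, j):
        if i == j:
            raise ValueError("no cell for a cluster paired with itself")
        if i > j:
            i, j = j, i  # the pair (j, i) shares the cell of (i, j)
        if i < 0 or j >= self.n:
            raise IndexError("cluster index out of range")
        # Cells in rows 0..i-1 come first, then the offset within row i.
        return i * self.n - i * (i + 1) // 2 + (j - i - 1)

    def set(self, i, j, score):
        self.cells[self._index(i, j)] = score

    def get(self, i, j):
        return self.cells[self._index(i, j)]

    def above(self, threshold):
        """Return (i, j, score) for every cell whose score exceeds threshold."""
        return [(i, j, self.get(i, j))
                for i in range(self.n)
                for j in range(i + 1, self.n)
                if self.get(i, j) > threshold]


if __name__ == "__main__":
    m = UpperTriangular(4)
    m.set(2, 1, 0.9)
    print(m.get(1, 2), m.above(0.5))
```

Because a match score between two clusters is symmetric in this sketch, storing only the upper triangle roughly halves the memory needed compared with a full n-by-n matrix.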
22 Claims
12. An apparatus for performing unsupervised clustering of data elements, comprising:
an interface for allowing access to a plurality of data elements;
at least one processing unit;
a storage unit for storing at least one cluster of data elements; and
a memory coupled to the at least one processing unit, the memory containing instructions that, when executed by the at least one processing unit, configure the apparatus to:
receive a plurality of clusters, each cluster comprising a data element;
generate an upper triangular matrix respective of the clusters, the upper triangular matrix stored in the memory;
generate a signature for each of the clusters, wherein a signature is generated from multiple patches of a multimedia data element, wherein multiple patches are of random length and random position within the multimedia data element;
generate a match score between each of two different clusters;
store the match score in a cell of the upper triangular matrix corresponding to the two clusters of the match score;
determine whether any of the match scores is above a predefined threshold value;
cluster every two clusters that are determined to have a score above a predetermined threshold; and
repeat the generation of the upper triangular matrix respective of the clusters until a single cluster is reached, wherein the clustering on the respective generated signatures creates clusters that include a collection of signatures respective of the multimedia data elements.
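The signature limitation recited above (multiple patches of random length and random position within a multimedia data element) can be sketched as follows. The sketch assumes the data element is available as a byte string, and it reduces each patch to a short BLAKE2 digest purely as a placeholder; the claim does not specify how patches are turned into a signature, and a plain hash is not robust to noise or re-encoding. The names random_patches and signature, the patch count and the seed are hypothetical.

```python
import hashlib
import random


def random_patches(data: bytes, count: int, rng: random.Random):
    """Return `count` slices of `data`, each at a random position and length."""
    if not data:
        raise ValueError("data element is empty")
    patches = []
    for _ in range(count):
        start = rng.randrange(len(data))            # random position
        length = rng.randint(1, len(data) - start)  # random length, in bounds
        patches.append(data[start:start + length])
    return patches


def signature(data: bytes, count: int = 8, seed: int = 0):
    """Placeholder signature: the set of short digests of the random patches."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return frozenset(
        hashlib.blake2b(patch, digest_size=4).hexdigest()
        for patch in random_patches(data, count, rng)
    )


if __name__ == "__main__":
    element = b"example multimedia payload" * 4
    print(sorted(signature(element)))
```

Returning a set of per-patch digests means a cluster's collection of signatures can be formed by a simple set union of its members' signatures.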
Specification