Clustering and classification of multimedia data
Abstract
Records including category data are clustered by representing the data as a plurality of clusters and generating a hierarchy of clusters based on those clusters. Records including category data are classified into folders according to a predetermined entropic similarity condition.
44 Claims
1. A computerized method comprising:
- generating, with a cluster content computer, a hierarchy of clusters of category data, the hierarchy comprises a plurality of levels of different sets of clusters, wherein at least one higher set of clusters is derived from a lower set of clusters and the generating the hierarchy includes,
  - calculating similarity values between clusters in the lower set of clusters, the similarity values are based on a probability distribution for each cluster and an entropic distance metric, the probability distribution for each cluster is a probability of an occurrence of an attribute in the category data occurring in that cluster, and the entropic distance metric is a distance metric of cluster pairs in the lower set of clusters, and
  - identifying a cluster pair in the lower set of clusters that minimizes the loss of information; and
- representing records of multimedia content as the hierarchy of clusters of category data, wherein the category data is defined in a vector space comprising multiple attributes, and wherein the records comprise category data.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A machine-readable storage medium having executable instructions to cause a machine to perform a method comprising:
- generating a hierarchy of clusters of category data, the hierarchy comprises a plurality of levels of different sets of clusters, wherein at least one higher set of clusters is derived from a lower set of clusters and the generating the hierarchy includes,
  - calculating similarity values between clusters in the lower set of clusters, the similarity values are based on a probability distribution for each cluster and an entropic distance metric, the probability distribution for each cluster is a probability of an occurrence of an attribute in the category data occurring in that cluster, and the entropic distance metric is a distance metric of cluster pairs in the lower set of clusters, and
  - identifying a cluster pair in the lower set of clusters that minimizes the loss of information; and
- representing records of multimedia content as the hierarchy of clusters of category data, wherein the category data is defined in a vector space comprising multiple attributes, and wherein the records comprise category data.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
19. A computerized system comprising:
- a processor coupled to a memory through a bus; and
- a process executed from the memory by the processor to cause the processor to:
  - generate a hierarchy of clusters of category data, the hierarchy comprises a plurality of levels of different sets of clusters, wherein at least one higher set of clusters is derived from a lower set of clusters and the generation of the hierarchy further causes the processor to calculate similarity values between clusters in the lower set of clusters, the similarity values are based on a probability distribution for each cluster and an entropic distance metric, the probability distribution for each cluster is a probability of an occurrence of an attribute in the category data occurring in that cluster, and the entropic distance metric is a distance metric of cluster pairs in the lower set of clusters, and to identify a cluster pair in the lower set of clusters that minimizes the loss of information, and
  - represent records of multimedia content as the hierarchy of clusters of category data, wherein the category data is defined in a vector space comprising multiple attributes, and wherein the records comprise category data.
- View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
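The clustering steps recited in claims 1, 10, and 19 (a per-cluster attribute probability distribution, an entropic distance between cluster pairs, and a merge of the pair that loses the least information) can be illustrated with a small sketch. The claims do not name a formula, so the size-weighted Jensen-Shannon divergence used here is an assumption, as are the function names `merge_cost` and `build_hierarchy` and the toy attribute counts. It is one plausible reading, not the patented implementation.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def distribution(counts):
    """Normalize attribute counts into P(attribute | cluster)."""
    total = sum(counts)
    return [c / total for c in counts]


def merge_cost(counts_a, counts_b):
    """Size-weighted Jensen-Shannon divergence between two clusters.

    Assumption: this stands in for the claimed "entropic distance
    metric"; the claims themselves do not specify the formula.
    """
    na, nb = sum(counts_a), sum(counts_b)
    wa, wb = na / (na + nb), nb / (na + nb)
    pa, pb = distribution(counts_a), distribution(counts_b)
    mix = [wa * x + wb * y for x, y in zip(pa, pb)]
    return entropy(mix) - wa * entropy(pa) - wb * entropy(pb)


def build_hierarchy(records):
    """Return the levels of the hierarchy, lowest level first.

    Each cluster is a tuple (member_ids, attribute_counts). Every record
    must have at least one nonzero attribute count.
    """
    level = [((i,), list(r)) for i, r in enumerate(records)]
    levels = [level]
    while len(level) > 1:
        # Pick the cluster pair whose merge has the lowest entropic cost.
        i, j = min(
            combinations(range(len(level)), 2),
            key=lambda ij: merge_cost(level[ij[0]][1], level[ij[1]][1]),
        )
        merged = (
            level[i][0] + level[j][0],
            [x + y for x, y in zip(level[i][1], level[j][1])],
        )
        # The next (higher) level is derived from the lower one.
        level = [c for k, c in enumerate(level) if k not in (i, j)] + [merged]
        levels.append(level)
    return levels


# Hypothetical records over four attributes (rows are attribute counts).
records = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
for depth, clusters in enumerate(build_hierarchy(records)):
    print(depth, [members for members, _ in clusters])
```

Each pass merges exactly one pair, so a set of n records yields n levels, with the top level holding a single cluster. A production system would likely cache pairwise costs rather than recompute them at every level.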
28. A computerized method comprising:
- creating, with a classifying content computer, an internal representation for each of a plurality of folders of records, wherein each folder internal representation is based on a first probability distribution of category data, the category data defined in a vector space comprising multiple attributes, and each of the first probability distributions corresponding to a folder includes a probability of occurrence that each of the multiple attributes occurs in that folder;
- creating an internal representation for each of a plurality of records, wherein each record internal representation is based on a second probability distribution of category data and each of the second probability distributions corresponding to a record includes a probability of occurrence that each of the multiple attributes occurs in that record; and
- classifying the plurality of records into the plurality of folders according to a predetermined entropic similarity condition using the plurality of first and second probability distributions.
- View Dependent Claims (29, 30, 31, 32, 33, 34)
35. A machine-readable storage medium having executable instructions to cause a processor to perform a method, the method comprising:
- creating an internal representation for each of a plurality of folders of records, wherein each folder internal representation is based on a first probability distribution of category data, the category data defined in a vector space comprising multiple attributes, and each of the first probability distributions corresponding to a folder includes a probability of occurrence that each of the multiple attributes occurs in that folder;
- creating an internal representation for each of a plurality of records, wherein each record internal representation is based on a second probability distribution of category data and each of the second probability distributions corresponding to a record includes a probability of occurrence that each of the multiple attributes occurs in that record; and
- classifying the plurality of records into the plurality of folders according to a predetermined entropic similarity condition using the plurality of the first and second probability distributions, and wherein the records comprise the category data.
- View Dependent Claims (36, 37, 38, 39, 40, 41)
42. A computer system comprising:
- a processor coupled to a memory through a bus; and
- a process executed from the memory by the processor to cause the processor
  - to create an internal representation for each of a plurality of folders of records, wherein each folder internal representation is based on a probability distribution of category data, the category data defined in a vector space comprising multiple attributes, and each of the probability distributions corresponding to a folder includes a probability of occurrence that each of the multiple attributes occurs in that folder,
  - to create an internal representation for each of a plurality of records, wherein each record internal representation is based on a second probability distribution of category data and each of the second probability distributions corresponding to a record includes a probability of occurrence that each of the multiple attributes occurs in that record, and
  - to classify the plurality of records into the plurality of folders according to a predetermined entropic similarity condition using the plurality of first and second probability distributions, wherein the records comprise category data.
- View Dependent Claims (43, 44)
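The classification steps recited in claims 28, 35, and 42 (a probability distribution per folder, a probability distribution per record, and assignment under a predetermined entropic similarity condition) can be sketched in the same spirit. The claims do not define the condition, so this example assumes it means "the folder with the smallest Jensen-Shannon divergence, provided that divergence does not exceed a threshold". The names `classify` and `max_divergence`, the folder names, and the counts are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalize(counts):
    """Turn attribute counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two distributions."""
    mix = [(x + y) / 2 for x, y in zip(p, q)]
    return entropy(mix) - (entropy(p) + entropy(q)) / 2


def classify(records, folders, max_divergence=0.3):
    """Map each record name to its closest folder name, or None.

    Assumption: the "predetermined entropic similarity condition" is
    modeled as a fixed upper bound on the Jensen-Shannon divergence.
    """
    # First distributions: one per folder.
    folder_dists = {name: normalize(c) for name, c in folders.items()}
    result = {}
    for rec_name, counts in records.items():
        # Second distributions: one per record.
        p = normalize(counts)
        best = min(folder_dists, key=lambda f: js_divergence(p, folder_dists[f]))
        close_enough = js_divergence(p, folder_dists[best]) <= max_divergence
        result[rec_name] = best if close_enough else None
    return result


# Hypothetical attribute counts over four attributes.
folders = {"movies": [5, 4, 0, 0], "live": [0, 0, 6, 3]}
records = {"r1": [1, 1, 0, 0], "r2": [0, 0, 1, 0], "r3": [1, 0, 1, 0]}
print(classify(records, folders))
```

With these toy values, `r1` lands in `movies`, `r2` lands in `live`, and `r3` is left unassigned because it is not close enough to either folder under the chosen threshold.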
Specification