Methods and systems for constructing a taxonomy based on hierarchical clustering
Abstract
Methods and systems for constructing a taxonomy based on hierarchical clustering are provided. The taxonomy is generated by first constructing a hierarchy of clusters using a clustering algorithm. A first level of the hierarchy of clusters is generated by providing a plurality of content files to a clustering algorithm. Subsequent levels of the hierarchy are generated by providing the clusters of the preceding levels to the clustering algorithm. Labels that characterize each cluster within the hierarchy are assigned to corresponding clusters. Labels and clusters are combined to form the taxonomy.
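The level-by-level construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the whitespace tokenizer, the greedy "leader" clustering with a cosine threshold, and the names `term_counts`, `leader_cluster`, and `build_hierarchy` are all hypothetical. The abstract only says that a clustering algorithm is applied first to the content files and then to the clusters of the preceding level.

```python
from collections import Counter
from math import sqrt


def term_counts(text):
    """Bag-of-words term counts for one content file (assumed statistic)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold):
    """Greedy leader clustering, a stand-in for the unspecified algorithm.

    Each vector joins the first cluster whose running aggregate it resembles
    closely enough; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters, aggregates = [], []
    for i, vec in enumerate(vectors):
        for members, agg in zip(clusters, aggregates):
            if cosine(vec, agg) >= threshold:
                members.append(i)
                agg.update(vec)  # Counter.update adds the counts
                break
        else:
            clusters.append([i])
            aggregates.append(Counter(vec))  # copy, so inputs stay untouched
    return clusters


def build_hierarchy(texts, thresholds):
    """Return one list of clusters per level.

    Clusters in the first level hold indices of content files; clusters in
    each later level hold indices of clusters in the level below.
    """
    items = [term_counts(t) for t in texts]
    levels = []
    for threshold in thresholds:
        clusters = leader_cluster(items, threshold)
        levels.append(clusters)
        # Aggregate each cluster into a single vector for the next pass.
        items = [sum((items[i] for i in members), Counter()) for members in clusters]
    return levels


texts = [
    "printer will not print",
    "printer paper jam",
    "reset email password",
    "forgot email password",
]
levels = build_hierarchy(texts, thresholds=[0.25, 0.0])
```

With these four sample texts, the first pass groups the two printer files and the two email files, and the second pass (threshold 0.0) merges both clusters under a single parent. Loosening the threshold at each level is a design choice of this sketch; the source does not say how the number of clusters per level is chosen.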
20 Claims
1. A computer system including instructions stored on one or more computer-readable medium and executable by one or more processors, the computer system comprising:
a statistic calculator configured to receive a plurality of content files and to determine at least one statistical measure of the content of the plurality of content files, said statistic calculator configured to store the at least one statistical measure within a statistics repository;
a cluster controller configured to cause the one or more processors to generate a hierarchy of clusters based on the at least one statistical measure stored in the statistics repository, wherein the hierarchy of clusters comprises at least two levels, each cluster within a first level of the hierarchy of clusters comprising at least one content file, each cluster within a second level of the hierarchy of clusters comprising at least one cluster of the first level of the hierarchy of clusters;
an aggregator configured to aggregate the content files of each cluster, and transmit the aggregated content file to said cluster controller to form a third level of the hierarchy of clusters;
a label manager configured to cause the one or more processors to determine a label for each cluster within the hierarchy of clusters based on the at least one statistical measure, the label identifying a topic of information contained within each cluster, the topic related to at least one of a problem experienced by a user and a request for assistance in solving the problem; and
a taxonomy manager configured to cause the one or more processors to output a taxonomy based on the hierarchy of clusters and the determined labels.
Dependent claims: 2, 3, 4, 5, 6, 7
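As an illustration of how a statistics repository and a label manager might interact, the sketch below stores per-file term counts and then picks a cluster label from them. Everything here is an assumption made for the example: the claim does not name the statistical measure, so a TF-IDF-style score is swapped in, and `StatisticsRepository` and `label_for_cluster` are hypothetical names.

```python
from collections import Counter
from math import log


class StatisticsRepository:
    """Hypothetical in-memory store for per-file term statistics."""

    def __init__(self):
        self.term_counts = {}                # file id -> Counter of terms
        self.document_frequency = Counter()  # term -> number of files containing it

    def add(self, file_id, text):
        counts = Counter(text.lower().split())
        self.term_counts[file_id] = counts
        # Count each distinct term once per file.
        self.document_frequency.update(counts.keys())


def label_for_cluster(repo, file_ids):
    """Return the term with the highest TF-IDF-style score in the cluster."""
    total_files = len(repo.term_counts)
    cluster_counts = Counter()
    for file_id in file_ids:
        cluster_counts.update(repo.term_counts[file_id])

    def score(term):
        idf = log(total_files / repo.document_frequency[term])
        return cluster_counts[term] * idf

    # Iterate in alphabetical order so ties resolve deterministically.
    return max(sorted(cluster_counts), key=score)


repo = StatisticsRepository()
repo.add("a", "printer jam printer offline")
repo.add("b", "printer paper jam")
repo.add("c", "reset email password")
repo.add("d", "forgot email password")
label = label_for_cluster(repo, ["a", "b"])
```

A single top-scoring term is the simplest possible label. A fuller label manager would presumably combine several terms or phrases to describe the problem or request, but the claim leaves that open.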
8. One or more non-transitory computer-readable media having computer-executable instructions embodied thereon, wherein when executed by one or more processors, the computer-executable instructions cause the one or more processors to:
receive a plurality of content files;
determine at least one statistical measure of the content of the plurality of content files;
store the at least one statistical measure within a statistics repository for a content of the plurality of content files;
determine a number of clusters to be included in a first level of a hierarchy of clusters;
assign a plurality of content files to the first level of the hierarchy of clusters based on the at least one statistical measure stored in the statistics repository;
generate a first aggregate data file of the content files for each cluster in the first level of the hierarchy of clusters;
cluster the aggregate data file to generate a second level of the hierarchy of clusters;
aggregate the content files of each cluster;
form a third level of the hierarchy of clusters using the aggregated content file;
determine a label for each cluster in the hierarchy of clusters based on the at least one statistical measure, which identifies a topic of the content files within the cluster, the topic related to at least one of a problem experienced by a user and a request for assistance in solving the problem; and
output a taxonomy based on the determined labels.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
16. A computer-implemented method including executing instructions stored on a computer-readable medium, the method comprising:
receiving a plurality of content files;
determining at least one statistical measure of the content of the plurality of content files;
storing the at least one statistical measure within a statistics repository of a content of a plurality of content files;
determining a number of clusters to be included in a first level of a hierarchy of clusters;
assigning a plurality of content files to the first level of the hierarchy of clusters by applying at least one clustering algorithm to the plurality of content files, wherein each cluster includes at least one content file;
generating a first aggregate data file for each cluster in the first level of the hierarchy of clusters by aggregating content data of each content file included within each of the clusters included in the first level of the hierarchy of clusters;
generating a second level of the hierarchy of clusters by providing each of the first aggregate data files to the at least one clustering algorithm;
aggregating the content files of each cluster;
forming a third level of the hierarchy of clusters using the aggregated content file;
determining a label for each cluster included in the hierarchy of clusters based on the at least one statistical measure, which identifies a topic of the content files within the cluster, the topic related to at least one of a problem experienced by a user and a request for assistance in solving the problem; and
outputting a taxonomy based on the determined labels.
Dependent claims: 17, 18, 19, 20
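The final step of the method, outputting a taxonomy from the clusters and their labels, could look roughly like the sketch below. It assumes a representation the claims do not specify: each level is a list of clusters, each cluster a list of indices into the level beneath it, with one label per cluster. The function name `build_taxonomy` and the nested-dictionary output are hypothetical.

```python
def build_taxonomy(levels, labels, file_names):
    """Fold per-level clusters and labels into nested taxonomy nodes.

    levels[k][c] lists the indices of cluster c's children in level k - 1
    (or in file_names when k == 0); labels[k][c] is that cluster's label.
    Returns the list of nodes at the topmost level.
    """
    nodes = [{"file": name} for name in file_names]
    for clusters, level_labels in zip(levels, labels):
        # Wrap the nodes of the level below in labeled parent nodes.
        nodes = [
            {"label": label, "children": [nodes[i] for i in members]}
            for members, label in zip(clusters, level_labels)
        ]
    return nodes


taxonomy = build_taxonomy(
    levels=[[[0, 1], [2, 3]], [[0, 1]]],
    labels=[["printer", "email"], ["support"]],
    file_names=["a.txt", "b.txt", "c.txt", "d.txt"],
)
```

Here the two first-level clusters ("printer" and "email") become children of a single "support" node. Adding another entry to `levels` and `labels` would produce the third level recited in the claim.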
Specification