Information resource taxonomy
First Claim
1. A process for generating a taxonomy for a plurality of information resources in a communications network, including:
(i) collecting said plurality of information resources from said communications network;
(ii) generating clusters of said plurality of collected information resources on the basis of a similarity threshold value for clustering and similarity values for said plurality of collected information resources;
(iii) iteratively generating sub-clusters of said generated clusters based on the similarity threshold value for clustering and similarity values for information resources within each of said generated clusters and within each of said generated sub-clusters, wherein the generated clusters and sub-clusters provide a hierarchy of resource clusters, wherein the number of resource clusters at each level of said hierarchy is determined by content of said plurality of collected information resources;
(iv) collecting further information resources from said communications network;
(v) assigning the further collected information resources to a plurality of the resource clusters;
(vi) maintaining the coherence of the plurality of resource clusters as further collected information resources are assigned by at least one of:
(a) reducing the similarity threshold value for clustering with an increasing number of the further collected information resources; and
(b) selecting a random subset of resources from the collected information resources;
generating a new similarity threshold value for clustering based on the selected random subset of resources; and
re-clustering the collected information resources using the generated new similarity threshold value for clustering; and
(vii) repeating the steps of collecting further information resources, reducing similarity and maintaining the coherence.
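Steps (ii), (iii) and (v) of the claim can be illustrated with a minimal sketch. The claim does not fix a similarity measure, a clustering procedure, or how the threshold is applied at deeper levels, so everything concrete here is an assumption made for illustration: resources are reduced to sets of terms, similarity is the Jaccard index, clustering is a greedy single pass against each cluster's first member, and each level of sub-clustering uses the base threshold raised by a hypothetical `step`. The names `jaccard`, `Cluster`, `cluster`, `build_hierarchy` and `assign` are not taken from the patent.

```python
from dataclasses import dataclass, field


def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity value for two resources, each reduced to a set of terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    seed: frozenset                               # first resource placed in the cluster
    members: list = field(default_factory=list)   # all resources in the cluster
    children: list = field(default_factory=list)  # sub-clusters, if any


def cluster(resources, threshold):
    """Step (ii): a resource joins the first cluster whose seed is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    The number of clusters is never specified, it falls out of the content."""
    clusters = []
    for res in resources:
        for c in clusters:
            if jaccard(res, c.seed) >= threshold:
                c.members.append(res)
                break
        else:
            clusters.append(Cluster(seed=res, members=[res]))
    return clusters


def build_hierarchy(resources, threshold, step=0.2, max_depth=3, depth=0):
    """Step (iii): recursively sub-cluster each cluster.
    Assumption: each level tightens the base threshold by `step`."""
    clusters = cluster(resources, threshold)
    if depth + 1 < max_depth and threshold + step <= 1.0:
        for c in clusters:
            if len(c.members) > 1:
                c.children = build_hierarchy(
                    c.members, threshold + step, step, max_depth, depth + 1
                )
    return clusters


def assign(resource, clusters):
    """Step (v): place a further resource in the most similar existing cluster."""
    best = max(clusters, key=lambda c: jaccard(resource, c.seed))
    best.members.append(resource)
    return best
```

Calling `build_hierarchy(docs, 0.15, step=0.3)` on a list of term sets returns the top-level clusters, each carrying its sub-clusters in `children`; `assign` then places a newly collected resource without rebuilding the hierarchy.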
Abstract
An information resource taxonomy system, including a data collector for collecting information resources from a communications network; and a taxonomy generator for generating a taxonomy represented by a hierarchy of resource clusters, using cluster criteria generated from the collected resources. The system includes an editor for editing the criteria, and a renderer for generating linked document data for displaying the hierarchy. A parallel cluster search system is used to evaluate clusters in parallel. The system also includes a parallel classifier for classifying further collected resources.
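As a rough illustration of the renderer mentioned in the abstract, the sketch below turns a cluster hierarchy into nested, linked HTML lists. The abstract says only that the renderer generates "linked document data"; the choice of HTML, the dictionary layout (`label`, `resources`, `children`) and the function name `render` are assumptions made for this example.

```python
from html import escape


def render(node: dict) -> str:
    """Render one cluster node as a nested HTML list item.

    `node` is a hypothetical structure:
    {"label": str, "resources": [(title, url), ...], "children": [node, ...]}
    """
    parts = [f"<li>{escape(node['label'])}"]
    # Each resource in the cluster becomes a link.
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in node.get("resources", [])
    ]
    # Each sub-cluster is rendered recursively.
    items += [render(child) for child in node.get("children", [])]
    if items:
        parts.append("<ul>" + "".join(items) + "</ul>")
    parts.append("</li>")
    return "".join(parts)
```

Wrapping the result of `render(root)` in a `<ul>` element gives a browsable view of the taxonomy, with one nesting level per level of the hierarchy.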
40 Claims
33. An information resource taxonomy system, including:
a data collector having computer system hardware components including at least one processor operating according to one or more software modules, the at least one processor and one or more software modules configured for collecting information resources from a communications network;
a taxonomy generator for generating clusters of said collected information resources based on a similarity threshold value for clustering and similarity values for said collected information resources and for iteratively generating sub-clusters of said generated clusters based on the similarity threshold value for clustering and similarity values for information resources within each of said generated clusters and within each of said generated sub-clusters, wherein the generated clusters and sub-clusters provide a hierarchy of resource clusters, wherein the number of resource clusters in each level of said hierarchy is determined by content of said collected information resources;
a classifier configured to classify further information resources collected from the communications network to a plurality of the resource clusters; and
a component configured to maintain the coherence of the plurality of resource clusters as further information resources are classified by at least one of:
(a) reducing the similarity threshold value for clustering with increasing numbers of the further collected information resources; and
(b) selecting a random subset of information resources from the collected information resources;
generating a new similarity threshold value for the selected random subset of information resources; and
re-clustering the collected information resources using the new similarity threshold value for clustering.
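The two coherence-maintenance options, (a) and (b), can be sketched as follows. The claim states only that the threshold is reduced as more resources arrive, or regenerated from a random subset before re-clustering; it gives no formula. The decay rule in `reduced_threshold`, the use of mean pairwise Jaccard similarity in `threshold_from_sample`, the greedy `recluster` routine, and all parameter names (`rate`, `floor`, `sample_size`) are assumptions chosen to keep the example small.

```python
import random
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Assumed similarity value: Jaccard index over sets of terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reduced_threshold(base, n_further, rate=0.001, floor=0.05):
    """Option (a): the threshold falls monotonically as the number of
    further collected resources grows, but never below `floor`."""
    return max(floor, base / (1.0 + rate * n_further))


def threshold_from_sample(resources, sample_size, rng):
    """Option (b), first part: draw a random subset and derive a new
    threshold from it (here, the mean pairwise similarity of the subset)."""
    sample = rng.sample(resources, min(sample_size, len(resources)))
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two resources to estimate a threshold")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def recluster(resources, threshold):
    """Option (b), second part: re-cluster everything with the new threshold.
    Greedy single pass; the first member of each cluster acts as its seed."""
    clusters = []
    for res in resources:
        for members in clusters:
            if jaccard(res, members[0]) >= threshold:
                members.append(res)
                break
        else:
            clusters.append([res])
    return clusters
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes the subset selection reproducible, which is convenient when comparing thresholds across repeated runs.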
Specification