System and method for automatic clustering, sub-clustering and cluster hierarchization of search results in cross-referenced databases using articulation nodes
Abstract
Within the context of a cross-referenced database, an initial “base set” of results to a query is generated using any conventional search engine tool. The base set is then expanded, possibly iteratively, by adding entries that reference entries in the original set or are referenced by those entries. The resulting collection of entries and references is represented as a mathematical graph or network, amenable to graph-theoretic analysis. Connected components within the graph form top-level clusters, and articulation nodes within these clusters are calculated. These articulation nodes serve both as navigational “gateways” and as anchors for sub-clusters. A sub-cluster, consisting of the transitive descendants of an articulation node, is associated with each articulation node. The articulation nodes themselves then form a graph, which is analyzed further for prominence, and a hierarchy of articulation nodes is calculated. The resulting hierarchy, consisting of the top-level clusters and the sub-clusters associated with the articulation nodes, is then presented visually to users in a manner that lets them easily navigate the space of expanded search results.
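The graph analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `references` dictionary (entry mapped to the entries it references), and the choice of the standard depth-first low-link technique for finding articulation nodes are all assumptions made for illustration. The sketch treats references as undirected edges, assumes entry labels are never `None`, and uses recursion, so it suits only small graphs.

```python
def undirected(references):
    """Build a symmetric adjacency map from a hypothetical
    {entry: [referenced entries]} dictionary, ignoring self-references."""
    graph = {}
    for source, targets in references.items():
        graph.setdefault(source, set())
        for target in targets:
            if target != source:
                graph[source].add(target)
                graph.setdefault(target, set()).add(source)
    return graph


def connected_components(graph):
    """Return the connected components (top-level clusters) as sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def articulation_nodes(graph):
    """Return nodes whose removal disconnects their component,
    using the depth-first low-link method."""
    depth, low, cut = {}, {}, set()

    def visit(node, parent, d):
        depth[node] = low[node] = d
        children = 0
        for neighbor in graph[node]:
            if neighbor == parent:
                continue
            if neighbor in depth:
                # Back edge: an already visited node is reachable.
                low[node] = min(low[node], depth[neighbor])
            else:
                children += 1
                visit(neighbor, node, d + 1)
                low[node] = min(low[node], low[neighbor])
                # A non-root node is a cut node if some child subtree
                # cannot reach above it.
                if parent is not None and low[neighbor] >= depth[node]:
                    cut.add(node)
        # A root is a cut node only if it has several DFS children.
        if parent is None and children > 1:
            cut.add(node)

    for node in graph:
        if node not in depth:
            visit(node, None, 0)
    return cut
```

For example, with `references = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "X": ["Y"]}`, the sketch yields two top-level clusters, and the entries `C` and `D` are the articulation nodes of the larger one.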
120 Citations
16 Claims
1. A method for clustering and sub-clustering documents and/or other types of objects listed as entries in a cross-referenced database or plurality of databases, along with a hierarchization of the resultant clusters and sub-clusters, the method comprising the steps of:
a) entering one or more first entries in the database, said first entries referred to as an original base set;
b) determining in the database second entries which reference to each of said first entries;
c) calculating a link number defined as the number of second entries referencing each of said first entries;
d) utilizing a connectivity index produced by a cross-referenced database for each of said first entries to create an augmented base set of said first entries;
e) expanding said augmented base set by adding to it all entries which reference and/or are referenced by each and every entry in said original base set;
f) iteratively repeating step e), in either a forward direction or a backward direction;
g) defining clusters and sub-clusters of the expanded set of entries;
h) creating a hierarchy of the said clusters and said sub-clusters;
i) presenting users, in a visual manner, the defined clusters and sub-cluster hierarchy; and
j) enabling users to store, in a persistent manner in a computer memory, any of the said clusters and/or said sub-clusters, and the visualization of their interconnections.
Dependent claims: 2, 3, 4, 5, 6, 9, 10, 11, 12
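Steps b), c), e) and f) of claim 1 can be sketched in Python as follows. This is an illustrative reading, not the claimed method itself. The `references` dictionary, the function names, the `rounds` parameter, and the interpretation of "forward" as following an entry's outgoing references and "backward" as following incoming references are all assumptions. The connectivity index of step d) is not modeled.

```python
from collections import defaultdict


def link_numbers(base_set, references):
    """Steps b-c: for each base entry, count the distinct other
    entries that reference it."""
    counts = {entry: 0 for entry in base_set}
    for source, targets in references.items():
        for target in set(targets):  # count each referencing entry once
            if target in counts and source != target:
                counts[target] += 1
    return counts


def expand(base_set, references, rounds=1, direction="both"):
    """Steps e-f: grow the set by repeatedly adding entries that are
    referenced by (forward) or reference (backward) the current frontier."""
    referenced_by = defaultdict(set)
    for source, targets in references.items():
        for target in targets:
            referenced_by[target].add(source)

    expanded = set(base_set)
    frontier = set(base_set)
    for _ in range(rounds):
        added = set()
        for entry in frontier:
            if direction in ("forward", "both"):
                added.update(references.get(entry, ()))
            if direction in ("backward", "both"):
                added.update(referenced_by.get(entry, ()))
        frontier = added - expanded
        if not frontier:  # nothing new was reached; stop early
            break
        expanded |= frontier
    return expanded
```

Calling `expand({"A"}, references, rounds=2, direction="forward")` follows outgoing references for at most two rounds. Bounding the number of rounds is a design choice of this sketch, made so that the expanded set stays manageable on densely linked databases.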
7. The method in accordance with claim 7, wherein said augmented base set is a set of web pages.
13. A system for clustering and sub-clustering documents and/or other types of objects listed as entries in a cross-referenced database, comprising:
a device for entering search entries in a search engine processor;
a device for calculating links between said search entries;
a device for mathematically representing an expanding set of said entries as a non-directed graph;
a device for calculating connected components of said graph;
a device for calculating articulation nodes bridging each of said connected components;
a device for defining transitive descendants of said articulation nodes, defined as a basic sub-cluster;
a device for creating a reduced mathematical directed graph utilizing said non-directed graph and said articulation nodes;
a prominence calculator used to order each of said articulation nodes in decreasing size based upon said connected components; and
a display device for displaying the output of said search entries.
Dependent claims: 14, 15, 16
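The sub-cluster and ordering elements of claim 13 can be illustrated with a short Python sketch. The claim text does not spell out the prominence measure or the construction of the reduced directed graph, so this sketch makes explicit assumptions: a basic sub-cluster is taken to be every entry reachable from an articulation node by following references, and sub-cluster size stands in as a placeholder for prominence. The function names and the `references` dictionary are hypothetical.

```python
def transitive_descendants(references, node):
    """Entries reachable from `node` by following references,
    excluding `node` itself (assumed meaning of a basic sub-cluster)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for target in references.get(current, ()):
            if target not in seen and target != node:
                seen.add(target)
                stack.append(target)
    return seen


def rank_by_subcluster_size(references, articulation):
    """Order articulation nodes by decreasing sub-cluster size.
    Size is a placeholder for prominence; ties are broken by label."""
    subclusters = {
        node: transitive_descendants(references, node)
        for node in articulation
    }
    order = sorted(
        subclusters,
        key=lambda node: (-len(subclusters[node]), str(node)),
    )
    return [(node, subclusters[node]) for node in order]
```

With `references = {"C": ["D"], "D": ["E", "F"]}` and articulation nodes `C` and `D`, the sketch ranks `C` first, because its sub-cluster contains `D`, `E` and `F`, while the sub-cluster of `D` contains only `E` and `F`.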
Specification