Clustering hypertext with applications to web searching
Abstract
A method and structure of searching a database containing hypertext documents comprising searching the database using a query to produce a set of hypertext documents; and geometrically clustering the set of hypertext documents into various clusters using a toric k-means similarity measure such that documents within each cluster are similar to each other, wherein the clustering has a linear-time complexity in producing the set of hypertext documents, wherein the similarity measure comprises a weighted sum of maximized individual components of the set of hypertext documents, and wherein the clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document.
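The similarity measure described in the abstract combines three components per document: words, out-links, and in-links. As a minimal sketch, assume each component is a sparse count vector scaled to unit length, each per-component similarity is an inner product (cosine), and the overall score is their weighted sum. The names `make_doc`, `similarity`, and the default weights are illustrative assumptions, not taken from the patent text.

```python
import math
from collections import Counter


def unit(counts):
    """Scale a sparse count vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def make_doc(words, out_links, in_links):
    """Represent a document as a triple of unit vectors (hypothetical helper)."""
    return tuple(unit(Counter(part)) for part in (words, out_links, in_links))


def similarity(doc_a, doc_b, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of per-component cosine similarities.

    The weights are assumed values chosen for illustration; they sum to 1
    so that a document with three non-empty components scores 1.0 against itself.
    """
    return sum(w * dot(x, y) for w, x, y in zip(weights, doc_a, doc_b))


a = make_doc(["web", "search"], ["http://example.org/u1"], ["http://example.org/v1"])
c = make_doc(["web", "search"], ["http://example.org/u2"], ["http://example.org/v2"])
print(similarity(a, c))  # only the word component matches
```

Raising the weight on one component makes the clustering lean more heavily on that kind of evidence, for example on link structure rather than on text.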
180 Citations
24 Claims
1. A method of searching a database containing hypertext documents, said method comprising:
- searching said database using a query to produce a set of hypertext documents;
- geometrically clustering said set of hypertext documents into various clusters using a similarity measure such that documents within each cluster are similar to each other, wherein said clustering has a linear-time complexity in producing said set of hypertext documents, wherein said similarity measure comprises a weighted sum of maximized individual components of said set of hypertext documents, wherein said clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document; and
- eliminating said hypertext documents if said words contained in said hypertext documents appear in fewer than two documents, said out-links contained in said hypertext documents are pointed to by fewer than two documents, and said in-links contained in said hypertext documents are pointed to by fewer than two documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

13. A program storage device readable by a computer, tangibly embodying a program of instructions executable by the computer to perform a method of searching a database containing hypertext documents, said method comprising:
- searching said database using a query to produce a set of hypertext documents;
- geometrically clustering said set of hypertext documents into various clusters using a similarity measure such that documents within each cluster are similar to each other, wherein said clustering has a linear-time complexity in producing said set of hypertext documents, wherein said similarity measure comprises a weighted sum of maximized individual components of said set of hypertext documents, wherein said clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document; and
- eliminating said hypertext documents if said words contained in said hypertext documents appear in fewer than two documents, said out-links contained in said hypertext documents are pointed to by fewer than two documents, and said in-links contained in said hypertext documents are pointed to by fewer than two documents.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
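The eliminating step recited in claims 1 and 13 can be illustrated with a small sketch. It follows one possible reading of the claim language: a word, out-link, or in-link that occurs in fewer than two documents of the result set is dropped, and a document is eliminated when nothing remains in any of its three components. The dict layout, the field names, and the function names are assumptions made for illustration only.

```python
from collections import Counter


def document_frequency(docs, field):
    """Count, for each feature, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc[field]))
    return df


def prune(docs, fields=("words", "out_links", "in_links"), min_df=2):
    """Drop features seen in fewer than `min_df` documents, then drop
    documents that have no surviving feature in any field (assumed reading)."""
    dfs = {f: document_frequency(docs, f) for f in fields}
    kept = []
    for doc in docs:
        pruned = {**doc, **{f: [x for x in doc[f] if dfs[f][x] >= min_df]
                            for f in fields}}
        if any(pruned[f] for f in fields):
            kept.append(pruned)
    return kept


docs = [
    {"words": ["web", "search"], "out_links": ["u1"], "in_links": []},
    {"words": ["web", "cluster"], "out_links": ["u1"], "in_links": []},
    {"words": ["orphan"], "out_links": ["u9"], "in_links": ["v9"]},
]
for doc in prune(docs):
    print(doc)
```

Each pass touches every feature of every document a constant number of times, so this filtering step is linear in the total size of the result set.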
Specification