
Clustering hypertext with applications to web searching

  • US 6,684,205 B1
  • Filed: 10/18/2000
  • Issued: 01/27/2004
  • Est. Priority Date: 10/18/2000
  • Status: Expired due to Term
First Claim

1. A method of searching a database containing hypertext documents, said method comprising:

  • searching said database using a query to produce a set of hypertext documents;

  • geometrically clustering said set of hypertext documents into various clusters using a similarity measure such that documents within each cluster are similar to each other, wherein said clustering has a linear-time complexity in producing said set of hypertext documents, wherein said similarity measure comprises a weighted sum of maximized individual components of said set of hypertext documents, wherein said clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document; and

  • eliminating said hypertext documents if said words contained in said hypertext documents appear in fewer than two documents, said out-links contained in said hypertext documents are pointed to by fewer than two documents, and said in-links contained in said hypertext documents are pointed to by fewer than two documents.
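
One reading of the elimination step: count, separately for words, out-links, and in-links, how many documents in the result set contain each feature; discard any feature seen in fewer than two documents; and drop a document only when all three of its components end up empty. The following is a minimal Python sketch under that assumption. The dictionary layout and the name `prune_rare_features` are hypothetical illustrations, not taken from the patent.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical representation: each document maps a component name to its feature set.
Doc = Dict[str, Set[str]]  # keys: "words", "out_links", "in_links"
KINDS = ("words", "out_links", "in_links")


def prune_rare_features(docs: List[Doc], min_df: int = 2) -> List[Doc]:
    """Drop every word, out-link, or in-link that occurs in fewer than `min_df`
    documents, then drop documents whose three components are all empty."""
    # Document frequency of each feature, counted separately per component.
    df = {k: Counter(f for d in docs for f in d[k]) for k in KINDS}
    pruned = []
    for d in docs:
        kept = {k: {f for f in d[k] if df[k][f] >= min_df} for k in KINDS}
        # Assumption: a document is eliminated only if nothing survives in any component.
        if any(kept[k] for k in KINDS):
            pruned.append(kept)
    return pruned
```

For example, a page whose only words and links are unique to it within the result set is removed, while a page sharing at least one word or link with another result is kept with its rare features stripped.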

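The claim does not spell out the clustering update rule. A minimal sketch, assuming each document is three unit-length sparse vectors (words, out-links, in-links), that the similarity measure is a weighted sum of the three per-component cosine similarities, and that clusters are refined with a spherical-k-means-style loop. The weights in `alphas`, seeding from caller-chosen `seeds`, and all function names are hypothetical and are not presented as the patented method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Dict[str, float]       # sparse vector: feature -> weight
Doc = Tuple[Vec, Vec, Vec]   # (words, out-links, in-links)


def normalize(v: Vec) -> Vec:
    """Scale a sparse vector to unit Euclidean length; an empty vector stays empty."""
    n = math.sqrt(sum(x * x for x in v.values()))
    return {f: x / n for f, x in v.items()} if n > 0 else {}


def dot(a: Vec, b: Vec) -> float:
    """Sparse dot product, iterating over the shorter vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(f, 0.0) for f, x in a.items())


def similarity(doc: Doc, concept: Doc, alphas: Sequence[float]) -> float:
    """Weighted sum of per-component cosine similarities (inputs are unit vectors)."""
    return sum(w * dot(d, c) for w, d, c in zip(alphas, doc, concept))


def centroid(members: List[Doc]) -> Doc:
    """Sum member vectors component by component, then renormalize each part."""
    parts = []
    for i in range(3):
        acc: Vec = {}
        for doc in members:
            for f, x in doc[i].items():
                acc[f] = acc.get(f, 0.0) + x
        parts.append(normalize(acc))
    return (parts[0], parts[1], parts[2])


def cluster(docs: List[Doc], seeds: Sequence[int],
            alphas: Sequence[float] = (0.5, 0.25, 0.25), iters: int = 10) -> List[int]:
    """Hypothetical spherical-k-means-style loop; returns one cluster label per document."""
    docs = [(normalize(w), normalize(o), normalize(i)) for w, o, i in docs]
    concepts = [docs[s] for s in seeds]  # initial concept vectors chosen by the caller
    labels: List[int] = []
    for _ in range(iters):
        # Assign each document to the concept with the highest weighted similarity.
        new_labels = [
            max(range(len(concepts)), key=lambda j: similarity(d, concepts[j], alphas))
            for d in docs
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(len(concepts)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # an empty cluster keeps its previous concept
                concepts[j] = centroid(members)
    return labels
```

In this sketch each pass touches every nonzero entry once per cluster, so for a fixed number of clusters and iterations the cost grows linearly with the size of the result set, in line with the linear-time property the claim recites.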