Searching databases that identifying group documents forming high-dimensional torus geometric k-means clustering, ranking, summarizing based on vector triplets

  • US 6,862,586 B1
  • Filed: 02/11/2000
  • Issued: 03/01/2005
  • Est. Priority Date: 02/11/2000
  • Status: Expired due to Fees
First Claim

1. A method of performing a database search comprising:

  • searching a database using a query, said searching identifying a group of hyperlinked documents;

    forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;

    clustering said result items into clusters based on said high-dimensional torus geometric representation;

    ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;

    summarizing contents of said clusters based on said high-dimensional torus geometric representation, wherein said clustering of the said vector triplets on said high-dimensional torus geometric representation is performed using a toric k-means clustering process that uses a cosine-type similarity measure between document vector triplets, thereby producing clusters of vector triplets and producing a concept triplet for each of the clusters; and

    summarizing said clusters of vector triplets based on nuggets of information including;

    identifying a closeness of said vector triplets in a cluster to said concept triplet for said cluster on said high-dimensional torus geometric representation;

    identifying said words with a highest normalized word frequency in said concept triplet for said cluster as the most frequent key-words for each of said clusters;

    identifying said out-links with a highest normalized out-link frequency in the concept triplet for the cluster as most frequent key out-links for each of said clusters;

    identifying said in-links with a highest normalized in-link frequency in the concept triplet for the cluster as most frequent important in-links for each cluster;

    identifying hypertext items relevant to the user's query by using a weighting of terms used in said query;

    identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type textual content similarity measure between document vector triplets; and

    identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type out-link similarity measure between document vector triplets; and

    identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type in-link similarity measure between document vector triplets.
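
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim only calls for a "cosine-type similarity measure", so the weighted sum of per-component dot products, the equal weights, the first-k seeding, the toy data, and every function and variable name below are assumptions made for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def make_triplet(word_freq, out_freq, in_freq):
    """Vector triplet: word, out-link and in-link frequencies, each normalized separately."""
    return (normalize(word_freq), normalize(out_freq), normalize(in_freq))

EQUAL = (1 / 3, 1 / 3, 1 / 3)  # assumed component weights, not taken from the claim

def similarity(a, b, weights=EQUAL):
    """Assumed cosine-type measure: weighted sum of the three per-component dot products."""
    return sum(w * sum(x * y for x, y in zip(u, v)) for w, u, v in zip(weights, a, b))

def concept_triplet(members):
    """Sum each component over the cluster's members, then re-normalize it."""
    return tuple(
        normalize([sum(col) for col in zip(*(m[i] for m in members))])
        for i in range(3)
    )

def toric_kmeans(triplets, k, iterations=20, weights=EQUAL):
    """Alternate between assigning documents and recomputing concept triplets."""
    concepts = list(triplets[:k])  # naive seeding with the first k documents (illustrative)
    labels = None
    for _ in range(iterations):
        new_labels = [
            max(range(k), key=lambda c: similarity(t, concepts[c], weights))
            for t in triplets
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [t for t, lab in zip(triplets, labels) if lab == c]
            if members:  # keep the previous concept if a cluster becomes empty
                concepts[c] = concept_triplet(members)
    return labels, concepts

def top_features(component, names, n=1):
    """Names of the n highest-weighted entries in one component of a concept triplet."""
    ranked = sorted(zip(component, names), reverse=True)
    return [name for value, name in ranked[:n] if value > 0]

def rank_members(triplets, labels, concepts, cluster, weights=EQUAL):
    """Document indices in a cluster, most typical (closest to the concept triplet) first."""
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    return sorted(
        members,
        key=lambda i: similarity(triplets[i], concepts[cluster], weights),
        reverse=True,
    )

# Toy data (hypothetical): raw counts for 3 words, 2 out-link targets, 2 in-link sources.
vocabulary = ["apple", "banana", "cherry"]
out_targets = ["site-a", "site-b"]
triplets = [
    make_triplet([3, 0, 1], [1, 0], [2, 0]),
    make_triplet([0, 2, 0], [0, 1], [0, 1]),
    make_triplet([2, 0, 0], [1, 0], [1, 0]),
    make_triplet([0, 3, 1], [0, 2], [0, 1]),
    make_triplet([5, 0, 1], [1, 0], [3, 0]),
]

labels, concepts = toric_kmeans(triplets, k=2)
for c in range(2):
    print("cluster", c,
          "key-words:", top_features(concepts[c][0], vocabulary),
          "key out-links:", top_features(concepts[c][1], out_targets),
          "most typical first:", rank_members(triplets, labels, concepts, c))
```

Under the same assumptions, passing `weights=(1, 0, 0)`, `(0, 1, 0)` or `(0, 0, 1)` to `rank_members` ranks documents by textual content, out-link or in-link similarity alone, which is one way to read the three final "most typical documents" elements of the claim.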

  • 1 Assignment