Searching databases by identifying a group of documents, forming a high-dimensional torus geometric representation, and k-means clustering, ranking and summarizing based on vector triplets
Abstract
A method and structure for performing a database search include searching a database using a query, the searching producing result items, and ranking the result items based on the frequency of occurrence of in-links, out-links, or both in each of the result items.
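The ranking step in the abstract can be illustrated with a minimal Python sketch. It assumes that "frequency of occurrence" means a plain count of each result item's in-links and out-links, combined with hypothetical weights `w_in` and `w_out`. The `ResultItem` record, its fields and the sample URLs are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ResultItem:
    """Hypothetical search hit together with the links found in it."""
    url: str
    out_links: list = field(default_factory=list)
    in_links: list = field(default_factory=list)


def rank_by_link_frequency(items, w_in=1.0, w_out=1.0):
    """Order result items by a weighted count of in-links and out-links.

    w_in and w_out are hypothetical weights; setting one of them to 0
    ranks by the other link type alone.
    """
    def score(item):
        return w_in * len(item.in_links) + w_out * len(item.out_links)

    # sorted() is stable, so equally scored items keep their search order.
    return sorted(items, key=score, reverse=True)


hits = [
    ResultItem("a.example", out_links=["x"]),
    ResultItem("b.example", out_links=["x", "y"], in_links=["p", "q", "r"]),
    ResultItem("c.example", in_links=["p"]),
]
print([h.url for h in rank_by_link_frequency(hits)])
# prints ['b.example', 'a.example', 'c.example']
print([h.url for h in rank_by_link_frequency(hits, w_out=0.0)])
# prints ['b.example', 'c.example', 'a.example']
```

Relying on a stable sort keeps the search engine's original order as the tie-breaker, which seems a reasonable default when link counts are equal.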
8 Claims
1. A method of performing a database search comprising:
searching a database using a query, said searching identifying a group of hyperlinked documents;
forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;
clustering said result items into clusters based on said high-dimensional torus geometric representation;
ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;
summarizing contents of said clusters based on said high-dimensional torus geometric representation, wherein said clustering of said vector triplets on said high-dimensional torus geometric representation is performed using a toric k-means clustering process that uses a cosine-type similarity measure between document vector triplets, thereby producing clusters of vector triplets and producing a concept triplet for each of the clusters; and
summarizing said clusters of vector triplets based on nuggets of information including:
identifying a closeness of said vector triplets in a cluster to said concept triplet for said cluster on said high-dimensional torus geometric representation;
identifying said words with a highest normalized word frequency in said concept triplet for said cluster as the most frequent key-words for each of said clusters;
identifying said out-links with a highest normalized out-link frequency in the concept triplet for the cluster as most frequent key out-links for each of said clusters;
identifying said in-links with a highest normalized in-link frequency in the concept triplet for the cluster as most frequent important in-links for each cluster;
identifying hypertext items relevant to the user's query by using a weighting of terms used in said query;
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type textual content similarity measure between document vector triplets;
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type out-link similarity measure between document vector triplets; and
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type in-link similarity measure between document vector triplets.
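The representation, clustering and summarizing steps of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that "normalized" means each of the three frequency vectors is scaled to unit L2 norm, so that a document becomes a point on a product of three unit spheres (one reading of the "torus"), and that the "cosine-type similarity measure" between triplets is a weighted sum of the three per-component cosines with hypothetical weights `ALPHA`. The concept triplet is taken to be the per-component sum over a cluster, renormalized to unit length. The helper names (`make_triplet`, `toric_kmeans`, `concept_triplet`, `summarize`), the first-k seeding and the sample documents are all illustrative; the query-term weighting step is not sketched.

```python
import math
from collections import Counter, defaultdict

# Hypothetical weights for the word, out-link and in-link components.
# They sum to 1, so the combined similarity stays within [0, 1].
ALPHA = (0.5, 0.25, 0.25)


def normalize(vec):
    """Scale a sparse vector to unit L2 norm (an empty vector stays empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def cosine(u, v):
    """Cosine of two unit-normalized sparse vectors, i.e. their dot product."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(x * v.get(k, 0.0) for k, x in u.items())


def make_triplet(words, out_links, in_links):
    """One document as three unit vectors: a point on a product of spheres."""
    return tuple(normalize(Counter(xs)) for xs in (words, out_links, in_links))


def similarity(a, b, alpha=ALPHA):
    """Cosine-type triplet similarity: weighted sum of per-component cosines."""
    return sum(w * cosine(x, y) for w, x, y in zip(alpha, a, b))


def concept_triplet(members):
    """Sum each component over the cluster, then renormalize to unit length."""
    sums = [defaultdict(float) for _ in range(3)]
    for triplet in members:
        for i, vec in enumerate(triplet):
            for key, val in vec.items():
                sums[i][key] += val
    return tuple(normalize(s) for s in sums)


def toric_kmeans(triplets, k, max_iter=20, alpha=ALPHA):
    """k-means on triplets with the similarity above; returns (assign, concepts)."""
    concepts = list(triplets[:k])  # assumption: seed with the first k documents

    def nearest(t):
        return max(range(k), key=lambda j: similarity(t, concepts[j], alpha))

    assign = None
    for _ in range(max_iter):
        new_assign = [nearest(t) for t in triplets]
        if new_assign == assign:
            break  # converged: no document changed cluster
        assign = new_assign
        for j in range(k):
            members = [t for t, a in zip(triplets, assign) if a == j]
            if members:  # an empty cluster keeps its previous concept
                concepts[j] = concept_triplet(members)
    return assign, concepts


def top_entries(vec, n=3):
    """Keys with the largest weights in one concept component."""
    return [key for key, _ in sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def summarize(triplets, assign, concepts, n=3):
    """Per-cluster nuggets: key words, key out-/in-links, ranked members."""
    summary = []
    for j, c in enumerate(concepts):
        members = [i for i, a in enumerate(assign) if a == j]
        summary.append({
            "key_words": top_entries(c[0], n),
            "key_out_links": top_entries(c[1], n),
            "key_in_links": top_entries(c[2], n),
            # Closest to the concept triplet first (within-cluster ranking).
            "ranked_members": sorted(
                members, key=lambda i: -similarity(triplets[i], c)),
        })
    return summary


docs = [
    make_triplet(["torus", "kmeans", "cluster"], ["a.com"], ["x.com"]),
    make_triplet(["torus", "cluster", "cluster"], ["a.com"], ["x.com"]),
    make_triplet(["soccer", "goal"], ["b.com"], ["y.com"]),
    make_triplet(["soccer", "goal", "goal"], ["b.com"], ["y.com"]),
]
assign, concepts = toric_kmeans(docs, k=2)
summary = summarize(docs, assign, concepts)
```

Because every component is unit length, each cosine reduces to a sparse dot product, and the assignment step is the same "nearest concept" rule as spherical k-means, applied to the combined triplet similarity.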
2. A method of performing a database search comprising:
searching a database using a query, said searching identifying a group of documents;
forming a high-dimensional torus geometric representation of said documents, wherein each document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;
identifying documents closest to a concept triplet as most typical documents in a cluster, using a cosine-type out-link similarity measure between document vector triplets; and
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type in-link similarity measure between document vector triplets.
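Claim 2 picks "most typical documents" separately by an out-link and an in-link cosine measure (claim 1 adds a textual one), so each measure can single out a different document. A minimal Python sketch, assuming unit-normalized sparse components and a hypothetical `most_typical` helper that ranks cluster members against a single component of the concept triplet; the cluster data is invented for illustration:

```python
import math


def normalize(vec):
    """Scale a sparse vector to unit L2 norm (an empty vector stays empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def cosine(u, v):
    """Cosine of two unit-normalized sparse vectors, i.e. their dot product."""
    return sum(x * v.get(k, 0.0) for k, x in u.items())


# Position of each component inside a (words, out-links, in-links) triplet.
COMPONENTS = {"text": 0, "out_links": 1, "in_links": 2}


def most_typical(members, concept, component, n=1):
    """Ids of the n members whose chosen component is closest to the concept's."""
    idx = COMPONENTS[component]
    ranked = sorted(
        members,
        key=lambda doc_id: -cosine(members[doc_id][idx], concept[idx]))
    return ranked[:n]


# Hypothetical cluster: the same two documents, judged by different measures.
concept = (normalize({"goal": 2, "soccer": 1}),
           normalize({"b.com": 1}),
           normalize({"y.com": 1}))
members = {
    "d2": (normalize({"soccer": 1, "goal": 1}), normalize({"b.com": 1}),
           normalize({"z.com": 1})),
    "d3": (normalize({"goal": 2, "soccer": 1}), normalize({"c.com": 1}),
           normalize({"y.com": 1})),
}
for name in COMPONENTS:
    print(name, most_typical(members, concept, name))
# prints:
# text ['d3']
# out_links ['d2']
# in_links ['d3']
```

Here `d3` matches the cluster's vocabulary and citing pages, while `d2` points to the cluster's dominant out-link, which is the kind of distinction the separate per-component measures are meant to surface.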
Specification