Three-dimensional display of document set
Abstract
A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
23 Claims
1. A method for presenting information by relative relationships of content and context of a plurality of documents, wherein the relative relationships are presented in a three-dimensional landscape with the relative size and height of a peak in the landscape representing the relative significance of a relationship of a topic attribute and each one of the documents, comprising the steps of:
(a) representing each document as a high dimensional vector;
(b) producing a partition set on the plurality of documents, said partition set resulting in a cluster centroid for each of the documents; and
(c) projecting each said high dimensional vector and at least one said cluster centroid into a 2-dimensional representation.
2. The method of claim 1, further comprising the steps of:
(d) producing a coordinate pair for each document; and
(e) displaying coordinate pairs for each document in a scatter plot yielding a Galaxies two-dimensional visualization.
3. The method of claim 2, further comprising the step of:
(f) producing a three-dimensional representation of said coordinate pairs, said three-dimensional representation resulting in a thematic landscape.
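As an illustration of step (f), the sketch below turns two-dimensional coordinate pairs into a grid of heights, so that regions where many documents fall become tall peaks. The claim does not say how peak height is computed; using a sum of Gaussian bumps over document positions as a stand-in for topic significance is an assumption, and the names `landscape`, `grid_size`, `extent`, and `bandwidth` are hypothetical.

```python
import math

def landscape(points, grid_size=5, extent=1.0, bandwidth=0.2):
    """Return a grid_size x grid_size height field over [0, extent]^2.

    Each document coordinate pair adds a Gaussian bump, so dense regions
    of the scatter plot become tall peaks (an assumed height rule).
    """
    step = extent / (grid_size - 1)
    heights = []
    for row in range(grid_size):
        y = row * step
        line = []
        for col in range(grid_size):
            x = col * step
            h = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            line.append(h)
        heights.append(line)
    return heights

# Three documents close together and one isolated document.
coords = [(0.25, 0.25), (0.25, 0.30), (0.30, 0.25), (0.75, 0.75)]
field = landscape(coords)
```

In this example the grid cell nearest the three neighbouring documents ends up higher than the cell nearest the isolated one, which is the qualitative behaviour a thematic landscape is meant to convey.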
4. The method of claim 3, wherein step (f) comprises the steps of:
(1) receiving an n-dimensional context vector for each document from a text engine;
(2) clustering each document in n-dimensional space, thereby producing a cluster for each document; and
(3) receiving from a text engine, for said cluster, associated gisting terms or topics.
5. The method of claim 1, further comprising the step of:
(d) initially inputting each of the documents into a text engine.
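The claims leave the internals of the text engine open. As a minimal sketch of how a document might become a high dimensional vector, the code below counts terms over a shared vocabulary, giving one dimension per distinct word. Bag-of-words counting is an assumption chosen for brevity, not the engine the claims refer to, and `build_vocabulary` and `vectorize` are hypothetical names.

```python
from collections import Counter

def build_vocabulary(documents):
    """Return a sorted list of every distinct lowercase token in the corpus."""
    return sorted({tok for doc in documents for tok in doc.lower().split()})

def vectorize(document, vocabulary):
    """Map one document to a term-count vector, one dimension per vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

docs = [
    "reactor safety report",
    "reactor maintenance procedure",
    "library catalog report",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Every document ends up with a vector of the same length, which is what the clustering and projection steps need in order to compare documents by distance.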
6. The method of claim 1, wherein step (b) comprises the step of creating a cluster centroid by grouping said high dimensionality vectors for a plurality of documents in a high dimensional space.
7. The method of claim 1, wherein step (b) comprises the step of applying a clustering algorithm with primary emphasis on k-means and complete linkage hierarchical clustering to create a cluster centroid.
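Of the two algorithms named in claim 7, complete linkage hierarchical clustering can be sketched briefly: clusters are merged pairwise, and the distance between two clusters is taken as the largest distance between any pair of their members. The naive implementation below is illustrative only and assumes Euclidean distance; `complete_linkage` and `centroid` are hypothetical names.

```python
import math

def complete_linkage(vectors, k):
    """Merge clusters agglomeratively until k remain; return lists of indices.

    The distance between two clusters is the largest distance between
    any pair of their members (complete linkage).
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(
                    math.dist(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def centroid(vectors, members):
    """Mean vector of the cluster members."""
    dims = len(vectors[0])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dims)]

vecs = [[0.0, 0.0, 1.0], [0.0, 0.2, 1.0], [5.0, 5.0, 0.0], [5.0, 5.2, 0.0]]
groups = complete_linkage(vecs, 2)
centroids = [centroid(vecs, g) for g in groups]
```

This version recomputes every pairwise cluster distance on each merge, so it is only practical for small inputs; it is meant to show how a centroid falls out of the grouping, not to be efficient.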
8. The method of claim 7, wherein said step of creating said cluster centroid is known as Fast Divisive Clustering and comprises the steps of:
(i) selecting a number of seeds;
(ii) placing said seeds in hyperspace by sampling regions to ensure a specified distribution of seeds;
(iii) identifying non-overlapping hyperspheres for each cluster and assigning each document to said each cluster based on which hypersphere said document is located;
(iv) calculating a centroid coordinate, representing the center of the mass for each cluster; and
(v) repeating steps (iii) and (iv) until centroid movement is less than a specified threshold.
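Steps (i) through (v) describe a seed, assign, recompute, repeat loop. The sketch below follows that loop under two simplifying assumptions: seeds are drawn by sampling documents at random rather than by sampling regions of hyperspace, and nearest-centroid assignment stands in for membership in non-overlapping hyperspheres. The function name `iterate_centroids` and its parameters are hypothetical.

```python
import math
import random

def iterate_centroids(vectors, num_seeds, threshold=1e-6, rng_seed=0):
    """Seed, assign, recompute, and repeat until centroids stop moving."""
    rng = random.Random(rng_seed)
    # (i)-(ii): pick seeds; sampling documents is a stand-in for region sampling.
    centroids = [list(v) for v in rng.sample(vectors, num_seeds)]
    while True:
        # (iii): assign each document to its nearest centroid.
        labels = [
            min(range(num_seeds), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # (iv): recompute each centroid as the center of mass of its members.
        moved = 0.0
        for c in range(num_seeds):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep an empty cluster's seed where it is
            new = [sum(col) / len(members) for col in zip(*members)]
            moved = max(moved, math.dist(new, centroids[c]))
            centroids[c] = new
        # (v): stop once the largest centroid movement is below the threshold.
        if moved < threshold:
            return centroids, labels

docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = iterate_centroids(docs, num_seeds=2)
```

The `threshold` argument corresponds to the "specified threshold" of step (v); a larger value stops earlier at the cost of less settled centroids.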
9. The method of claim 1, wherein step (c) comprises, for small data sets, the steps of:
(1) applying a Multi-dimensional Scaling Algorithm to cluster centroid coordinates in hyperspace;
(2) producing a vector for each document with distance measures from said document to each cluster centroid; and
(3) constructing an operator matrix and multiplying said matrix by said vector to produce two-dimensional coordinates for said each document.
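Steps (2) and (3) can be sketched as follows. The claim does not state how the operator matrix is constructed, so this example assumes one plausible rule: fit a 2 x k matrix so that each centroid's own distance vector maps exactly onto that centroid's two-dimensional position from step (1). The two-dimensional centroid layout is taken as given here, and all function and variable names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n and b is n x m, both as lists of rows; returns x as n x m.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def distance_vector(point, centroids):
    """Step (2): distances from one document to every cluster centroid."""
    return [math.dist(point, c) for c in centroids]

def build_operator(centroids, layout):
    """Step (3), assumed rule: a 2 x k operator that maps each centroid's
    own distance vector onto its 2D layout position."""
    d = [distance_vector(c, centroids) for c in centroids]  # k x k, symmetric
    x = solve(d, layout)  # k x 2, satisfying d @ x = layout
    return [[x[j][axis] for j in range(len(x))] for axis in range(2)]

def project(operator, dist_vec):
    """Multiply the operator matrix by a document's distance vector."""
    return [sum(w * dv for w, dv in zip(row, dist_vec)) for row in operator]

hyper_centroids = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
layout_2d = [[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]]  # assumed output of step (1)
operator = build_operator(hyper_centroids, layout_2d)
doc_xy = project(operator, distance_vector([1.0, 1.0, 0.0], hyper_centroids))
```

Under this rule the centroids land exactly on their layout positions, while documents between centroids are placed by linear interpolation of their distance vectors, which is only an approximation of their true relative positions.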
10. The method of claim 1, wherein step (c) comprises, for large data sets, the steps of:
(1) applying an Anchored Least Stress Algorithm to cluster centroid coordinates in hyperspace;
(2) producing a vector for each document with distance measures from said document to each cluster centroid; and
(3) constructing an operator matrix and multiplying said matrix by said vector to produce two-dimensional coordinates for said each document.
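Claims 9 and 10 both begin by laying out the cluster centroids in two dimensions so that the distances between them are preserved as far as possible. The claims name the algorithms without giving their details, so the sketch below is not the Anchored Least Stress Algorithm itself. It only illustrates the general idea of stress-based layout, using plain gradient descent on raw stress (the sum of squared differences between original and projected pairwise distances). The names `raw_stress` and `layout_by_stress`, and the `steps` and `rate` defaults, are assumptions.

```python
import math
import random

def raw_stress(high, low):
    """Sum of squared differences between pairwise distances in the
    original space and in the 2D layout."""
    n = len(high)
    return sum(
        (math.dist(high[i], high[j]) - math.dist(low[i], low[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

def layout_by_stress(high, steps=500, rate=0.05, rng_seed=0):
    """Plain gradient descent on raw stress, starting from a random layout."""
    rng = random.Random(rng_seed)
    low = [[rng.random(), rng.random()] for _ in high]
    n = len(high)
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d_low = math.dist(low[i], low[j]) or 1e-12  # avoid dividing by zero
                coeff = 2 * (d_low - math.dist(high[i], high[j])) / d_low
                for axis in range(2):
                    grads[i][axis] += coeff * (low[i][axis] - low[j][axis])
        low = [[p[a] - rate * g[a] for a in range(2)] for p, g in zip(low, grads)]
    return low

# Three centroids in a 3-dimensional space, forming a 3-4-5 triangle.
high = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
low = layout_by_stress(high)
```

Each iteration visits every pair of centroids, so the cost grows quadratically with their number; that is tolerable when only cluster centroids, rather than all documents, are laid out this way.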
11. A method for representing human comprehensible information in a low-dimensionality space based on a high dimensionality analysis thereof, the information comprising sets of semantic information, comprising the steps of:
(a) representing the sets of semantic information as a vector in a high-dimensional information space;
(b) segmenting the information space into a plurality of bounded continuous sub-spaces, each having a centroid;
(c) projecting the segmented bounded continuous subspaces of the high-dimensional information space onto a low dimensional space, in a manner sensitive to a relation of each set of semantic information to each centroid.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A computer readable medium storing program instructions for programming a general purpose computer to perform a method for representing human comprehensible information in a low-dimensionality space based on a high dimensionality analysis thereof, the information comprising sets of semantic information, comprising the steps of:
(a) representing the sets of semantic information as a vector in a high-dimensional information space;
(b) segmenting the information space into a plurality of bounded continuous sub-spaces, each having a centroid;
(c) projecting the segmented bounded continuous subspaces of the high-dimensional information space onto a low dimensional space, wherein said projection is sensitive to a relation of each set of semantic information to each centroid.
Specification