Method and apparatus for labeling in steered visual analysis of collections of documents
Abstract
A method of labeling in steered visual analysis of a collection of documents, the method comprising receiving a query against a database including a collection of documents; representing contents of the query as a matrix; rotating document vectors associated with respective documents to match the matrix to produce a matrix of rotated document vectors; grouping the rotated document vectors into clusters; and displaying a graphic around an area corresponding to a query term.
28 Claims
1. A method of labeling in steered visual analysis of a collection of documents, the method comprising:
receiving a query against a database including a collection of documents;
representing contents of the query as a matrix;
rotating document vectors associated with respective documents to match the matrix to produce a matrix of rotated document vectors;
grouping the rotated document vectors into clusters; and
displaying a graphic around an area corresponding to a query term.
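Claim 1 does not say how the rotation, the grouping, or the graphic is computed. Below is a minimal Python sketch under these assumptions:

- The document vectors have already been projected to two dimensions.
- The query matrix is a 0/1 document-by-term incidence matrix with two query terms.
- The rotation is a planar orthogonal Procrustes fit.
- Each document joins the cluster of the query axis on which its rotated vector scores highest.
- The "graphic" is a bounding circle labeled with the query term.

All function names and sample data are hypothetical. This is not the patentee's implementation.

```python
import math

def best_rotation_angle(doc_vecs, query_rows):
    """Closed-form planar orthogonal Procrustes: the angle theta that
    minimizes sum ||R(theta) x_i - q_i||^2 over paired 2-D rows."""
    num = sum(x * qy - y * qx for (x, y), (qx, qy) in zip(doc_vecs, query_rows))
    den = sum(x * qx + y * qy for (x, y), (qx, qy) in zip(doc_vecs, query_rows))
    return math.atan2(num, den)

def rotate(doc_vecs, theta):
    """Rotate every 2-D document vector counterclockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in doc_vecs]

def group_by_query_axis(rotated, query_terms):
    """Assumed grouping rule: each document joins the cluster of the query
    term whose axis carries its largest rotated coordinate."""
    clusters = {term: [] for term in query_terms}
    for i, vec in enumerate(rotated):
        axis = max(range(len(query_terms)), key=lambda k: vec[k])
        clusters[query_terms[axis]].append(i)
    return clusters

def labeled_circles(rotated, clusters):
    """One bounding circle per non-empty cluster, labeled with its query term."""
    graphics = []
    for term, members in clusters.items():
        if not members:
            continue
        pts = [rotated[i] for i in members]
        center = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        radius = max(math.dist(center, p) for p in pts)
        graphics.append({"label": term, "center": center, "radius": radius})
    return graphics

# Hypothetical sample: four documents, two query terms.
query_terms = ["solar", "wind"]
query_rows = [(1, 0), (1, 0), (0, 1), (0, 1)]  # document-by-term incidence
doc_vecs = [(0.1, -0.9), (-0.1, -1.1), (0.9, 0.1), (1.1, -0.1)]

theta = best_rotation_angle(doc_vecs, query_rows)
rotated = rotate(doc_vecs, theta)
clusters = group_by_query_axis(rotated, query_terms)
for g in labeled_circles(rotated, clusters):
    print(g["label"], clusters[g["label"]], round(g["radius"], 3))
```

With this data the fitted angle is a quarter turn. Documents 0 and 1 fall under "solar", and documents 2 and 3 fall under "wind". A full implementation would more likely use a general d-dimensional Procrustes fit, computed via SVD, and a standard clustering algorithm.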
7. A computer readable medium bearing computer program code which, when loaded in a computer, causes the computer to:
receive a query against a database including a collection of documents;
represent contents of the query as a matrix;
rotate document vectors associated with respective documents to match the matrix to produce a matrix of rotated document vectors;
group the rotated document vectors into clusters; and
display a graphic around an area corresponding to a query term.
13. A method comprising:
semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
defining a topic set, the topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, the topic set being defined based on at least one of word frequency, overlap and topicality;
forming a matrix with the semantic concepts contained within the topic set defining one dimension of the matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of the matrix;
calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents;
providing the matrix entries as document vectors to interpret the document contents of the database;
inputting query terms;
augmenting the topic set by the query terms;
making an incidence matrix of query terms for the documents;
rotating the document vectors to match the incidence matrix;
clustering and projecting the rotated document vectors; and
displaying a graphic around a cluster and labeling the graphic with a query term related to the cluster.
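The matrix-building and query steps of claim 13 can be sketched in Python by counting documents. The sketch assumes the semantic filtering step has already reduced each document to a set of concepts. Entry M[t][f] estimates the conditional probability that a document contains topic concept t, given that it contains concept f.

The claim says only that the matrix entries are "provided as document vectors". Building each document's vector by averaging the columns of the concepts it contains is an illustrative assumption. So is appending query terms to the end of the topic set. All names and data are hypothetical.

```python
def conditional_probability_matrix(docs, topic_set, concepts):
    """M[t][f] = P(document contains t | document contains f), by counting."""
    matrix = {}
    for t in topic_set:
        matrix[t] = {}
        for f in concepts:
            with_f = [d for d in docs if f in d]
            both = sum(1 for d in with_f if t in d)
            matrix[t][f] = both / len(with_f) if with_f else 0.0
    return matrix

def document_vector(doc, matrix, topic_set, concepts):
    """Assumed construction: average the matrix columns of the concepts
    present in the document (a zero vector if none are present)."""
    present = [f for f in concepts if f in doc]
    if not present:
        return [0.0] * len(topic_set)
    return [sum(matrix[t][f] for f in present) / len(present) for t in topic_set]

def query_incidence(docs, query_terms):
    """One row per document, one column per query term: 1 if the document contains it."""
    return [[1 if q in d else 0 for q in query_terms] for d in docs]

# Hypothetical filtered documents, topic set, and query.
docs = [{"solar", "panel", "grid"}, {"wind", "turbine", "grid"},
        {"solar", "cell"}, {"wind", "blade"}]
topic_set = ["solar", "wind"]
concepts = ["grid", "panel", "turbine", "cell", "blade"]
query_terms = ["solar", "grid"]

# Augment the topic set by the query terms (duplicates skipped).
augmented = topic_set + [q for q in query_terms if q not in topic_set]
M = conditional_probability_matrix(docs, augmented, concepts)
vectors = [document_vector(d, M, augmented, concepts) for d in docs]
incidence = query_incidence(docs, query_terms)
print(vectors[0], incidence)
```

The rotated `vectors` would then be matched against `incidence`, clustered, and projected for display.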
17. A computer readable medium bearing computer program code which, when loaded in a computer, causes the computer to:
semantically filter a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
define a topic set, the topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, the topic set being defined based on at least one of word frequency, overlap and topicality;
form a matrix with the semantic concepts contained within the topic set defining one dimension of the matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of the matrix;
calculate matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents;
provide the matrix entries as document vectors to interpret the document contents of the database;
input query terms;
augment the topic set by the query terms;
make an incidence matrix of query terms for the documents;
rotate the document vectors to match the incidence matrix;
cluster and project the rotated document vectors; and
display a graphic around a cluster and label the graphic with a query term related to the cluster.
21. A method comprising:
semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
defining a topic set, the topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, the topic set being defined based on at least one of word frequency, overlap and topicality;
forming a matrix with the semantic concepts contained within the topic set defining one dimension of the matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of the matrix;
calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents;
providing the matrix entries as document vectors to interpret the document contents of the database;
inputting query terms;
augmenting the topic set by the query terms;
making an incidence matrix of query terms for the documents;
rotating the document vectors to match the incidence matrix;
clustering and projecting the rotated document vectors;
displaying labels for clusters; and
providing a user interface through which a user can adjust the influence of query terms in the labels.
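Claim 21 leaves open how the user interface changes the cluster labels. One hypothetical approach, sketched in Python, works as follows:

- Score each candidate term by the fraction of the cluster's documents that contain it.
- Scale query terms by a slider value in [0, 1] and the remaining topic terms by one minus that value.
- Take the top-scoring term as the label.

The function name, the scoring rule, and the data are assumptions for illustration only.

```python
def cluster_label(cluster_docs, query_terms, topic_terms, query_weight):
    """Pick a label for one cluster. query_weight (0..1) stands in for a
    hypothetical UI slider: 1 favors query terms, 0 favors topic terms."""
    if not 0.0 <= query_weight <= 1.0:
        raise ValueError("query_weight must lie in [0, 1]")
    if not cluster_docs:
        raise ValueError("cannot label an empty cluster")

    def coverage(term):
        # Fraction of the cluster's documents containing the term.
        return sum(term in d for d in cluster_docs) / len(cluster_docs)

    scores = {}
    for term in set(query_terms) | set(topic_terms):
        weight = query_weight if term in query_terms else 1.0 - query_weight
        scores[term] = weight * coverage(term)
    # Sort first so ties break alphabetically and the result is deterministic.
    return max(sorted(scores), key=scores.get)

cluster = [{"solar", "panel"}, {"solar", "panel", "cell"}, {"panel", "grid"}]
for w in (0.1, 0.5, 0.9):
    print(w, cluster_label(cluster, ["solar"], ["panel", "cell"], w))
```

Moving the slider from 0.1 to 0.9 switches this cluster's label from the topic term "panel" to the query term "solar".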
25. A computer readable medium bearing computer program code which, when loaded in a computer, causes the computer to:
semantically filter a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
define a topic set, the topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, the topic set being defined based on at least one of word frequency, overlap and topicality;
form a matrix with the semantic concepts contained within the topic set defining one dimension of the matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of the matrix;
calculate matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents;
provide the matrix entries as document vectors to interpret the document contents of the database;
input query terms;
augment the topic set by the query terms;
make an incidence matrix of query terms for the documents;
rotate the document vectors to match the incidence matrix;
cluster and project the rotated document vectors;
display labels for clusters; and
provide a user interface through which a user can adjust the influence of query terms in the labels.
Specification