Methods and apparatus for steering the analyses of collections of documents
2 Assignments
0 Petitions
Abstract
A method for steering the analysis of a collection of documents includes receiving query terms for use in querying a database including a collection of documents; representing at least some of the query terms in a matrix; rotating document vectors associated with the documents to match the matrix to produce a matrix of rotated document vectors, each document vector representing a numeric vector created in association with individual documents; grouping the rotated document vectors into clusters, each cluster having one or more documents; and projecting the clusters to display visual information of the documents, the visual information including a summary view of the collection of documents. Program code and a system are also provided.
81 Claims
1. A method of steering the analysis of a collection of documents, comprising:
receiving query terms for use in querying a database including a collection of documents;
representing at least some of the query terms in a matrix;
rotating document vectors associated with the documents to match the matrix to produce a matrix of rotated document vectors, each document vector representing a numeric vector created in association with individual documents;
grouping the rotated document vectors into clusters, each cluster having one or more documents; and
projecting the clusters to display visual information of the documents, the visual information including a summary view of the collection of documents.
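Claim 1 does not say how the document vectors are "rotated to match" the query matrix. One plausible reading, offered here only as an assumption, is an orthogonal Procrustes fit: find the orthogonal matrix R that minimizes ||DR - Q||_F, where D holds one document vector per row and Q is the query matrix with the same shape. The sketch below computes R as the orthogonal polar factor of DᵀQ using Newton's iteration, in plain Python. The names `procrustes_rotation` and `rotate_documents` are hypothetical. The sketch assumes DᵀQ is nonsingular, and the resulting R may include a reflection.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def procrustes_rotation(docs: Matrix, target: Matrix,
                        iters: int = 100, tol: float = 1e-12) -> Matrix:
    """Orthogonal R minimizing ||docs @ R - target||_F (orthogonal Procrustes).

    R is the orthogonal polar factor of docs^T @ target, found with Newton's
    iteration X <- (X + X^{-T}) / 2. Assumes docs^T @ target is nonsingular.
    """
    x = matmul(transpose(docs), target)
    for _ in range(iters):
        inv_t = transpose(inverse(x))
        nxt = [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(x, inv_t)]
        delta = max(abs(a - b) for r1, r2 in zip(nxt, x) for a, b in zip(r1, r2))
        x = nxt
        if delta < tol:
            break
    return x


def rotate_documents(docs: Matrix, query_matrix: Matrix) -> Matrix:
    """Rotate every document vector (one per row) toward the query matrix."""
    return matmul(docs, procrustes_rotation(docs, query_matrix))
```

Because R is orthogonal, distances and angles between document vectors are unchanged. Only their orientation is aligned with the query coordinates, which keeps the later clustering step from being distorted by the rotation itself.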
15. A method of steering the analysis of a collection of documents, comprising:
receiving a query against a database;
obtaining a query result set having a collection of documents;
grouping the collection of documents into a classification to produce a plurality of clusters, each cluster having a set of documents from the collection of documents, the grouping of the collection of documents into the clusters being based on contents of the query; and
displaying the clusters to display visual information of the collection of documents.
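Claim 15 requires only that the grouping be "based on contents of the query". A deliberately simple sketch, assuming each document is already a list of tokens and the query is a list of terms, puts documents that contain the same subset of query terms into the same cluster. The function name `group_by_query_terms` is hypothetical, and the patent's own grouping may use a different rule.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Mapping


def group_by_query_terms(docs: Mapping[str, Iterable[str]],
                         query_terms: Iterable[str]) -> Dict[FrozenSet[str], List[str]]:
    """Cluster document ids by the (case-insensitive) subset of query terms they contain."""
    terms = {t.lower() for t in query_terms}
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for doc_id, tokens in docs.items():
        present = frozenset(terms.intersection(tok.lower() for tok in tokens))
        clusters[present].append(doc_id)
    return dict(clusters)
```

Documents that match none of the query terms land in the cluster keyed by the empty set, so they stay visible instead of being silently dropped from the result set.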
26. A computer-readable medium comprising computer program code which, when loaded in a computer, causes the computer, in operation, to:
receive a query against a database;
obtain a query result set having a collection of documents;
group the collection of documents into a classification to produce a plurality of clusters, each cluster having a set of documents from the collection of documents, the grouping of the collection of documents into the clusters being based on contents of the query; and
display the clusters to display visual information of the collection of documents.
37. An information analysis and steering method, comprising:
receiving an information collection including information objects, each information object having a descriptive vector;
associating the information object with an indicator vector, the indicator vector having a plurality of vector coordinates;
labeling each of the plurality of vector coordinates with contents of a query that is used to produce the information collection; and
projecting the information collection as clusters, the clusters including the descriptive vectors and contents of the indicator vectors.
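Claim 37 pairs each information object's descriptive vector with an indicator vector whose coordinates are labeled by query contents. As an illustrative sketch only, assuming indicator coordinates are 0/1 flags (one per query term) and that cluster assignments already exist, a cluster summary could combine the mean descriptive vector with the share of members flagged for each query term. `InfoObject` and `summarize_clusters` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class InfoObject:
    descriptive: List[float]  # descriptive vector of the information object
    indicator: List[int]      # one 0/1 coordinate per query term (assumed encoding)


def summarize_clusters(objects: Sequence[InfoObject], labels: Sequence[int],
                       query_terms: Sequence[str]) -> Dict[int, dict]:
    """Per cluster: centroid of descriptive vectors and, for each query term,
    the fraction of members whose indicator coordinate for that term is set."""
    members: Dict[int, List[InfoObject]] = {}
    for obj, lab in zip(objects, labels):
        members.setdefault(lab, []).append(obj)
    summary = {}
    for lab, objs in members.items():
        n = len(objs)
        dim = len(objs[0].descriptive)
        centroid = [sum(o.descriptive[i] for o in objs) / n for i in range(dim)]
        term_share = {term: sum(o.indicator[j] for o in objs) / n
                      for j, term in enumerate(query_terms)}
        summary[lab] = {"centroid": centroid, "query_terms": term_share}
    return summary
```

Keeping the query-term shares next to each centroid is what lets a display label a cluster with the query contents that dominate it.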
45. An information analysis and steering system comprising a computer server configured to:
receive an information collection including information objects, each information object having a descriptive vector;
associate the information object with an indicator vector, the indicator vector having a plurality of vector coordinates;
label each of the plurality of vector coordinates with contents of a query that is used to produce the information collection; and
project the information collection as clusters, the clusters including the descriptive vectors and contents of the indicator vectors.
53. A method of steering the analysis of a collection of documents, comprising:
receiving a collection of documents, the collection being produced by a query against a database;
creating a numeric vector for each document of the collection;
encoding the query to create an incidence matrix;
rotating the numeric vectors to match the incidence matrix;
grouping the rotated numeric vectors into clusters; and
projecting the clusters to create a summary view of the documents.
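Claim 53 encodes the query as an incidence matrix but does not fix its layout. Assuming rows are documents, columns are query terms, and an entry is 1 when the document contains the term, the encoding step could be sketched as follows. `query_incidence_matrix` is a hypothetical name, and tokenization is assumed to have been done already.

```python
from typing import List, Sequence


def query_incidence_matrix(docs: Sequence[Sequence[str]],
                           query_terms: Sequence[str]) -> List[List[int]]:
    """Row i, column j is 1 if document i contains query term j (case-insensitive), else 0."""
    terms = [t.lower() for t in query_terms]
    matrix = []
    for tokens in docs:
        present = {tok.lower() for tok in tokens}
        matrix.append([1 if t in present else 0 for t in terms])
    return matrix
```

A 0/1 matrix of this kind gives the rotation step a target. If the document vectors have more coordinates than there are query terms, the two must first be brought to the same width, a detail the claim leaves open.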
58. A computer readable medium embodying computer program code which, when loaded in a computer, causes the computer, in operation, to:
represent contents of a query, used to retrieve a collection of documents, as a matrix;
rotate document vectors associated with the documents to match the matrix to produce a matrix of rotated document vectors;
group the rotated document vectors into clusters; and
project the clusters to display visual information of the documents.
67. A method of representing information objects in a concept-space, comprising:
receiving a query against a database;
obtaining a query result set having a collection of information objects from the database, the collection of information objects related to one or more concepts;
grouping the collection of information objects into an unsupervised classification to produce a plurality of clusters, each cluster having a set of information objects from the collection, the grouping being performed based on the one or more concepts; and
projecting the clusters to display visual information of the collection of information objects, each of the clusters identifying a concept, each cluster including information objects related to the concept identified by the cluster.
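Claim 67 calls for an unsupervised classification but does not name an algorithm. k-means (Lloyd's algorithm) is used below purely as a familiar stand-in, assuming each information object has already been reduced to a numeric vector in the concept space. The function name and the rule of seeding with the first k points (chosen for reproducibility) are choices made for this sketch.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns a cluster label per point and the final centroids."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]  # seed with the first k points
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Reading each centroid's largest coordinates back against the concept axes is one way a cluster could be said to "identify" a concept.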
73. A method of steering the analysis of a collection of information objects, comprising:
receiving a collection of information objects, the information objects representing one or more concepts;
grouping the collection of information objects into a plurality of clusters, each cluster representing a single concept and having a set of information objects from the collection; and
projecting the clusters to display visual information of the collection of information objects.
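The claims leave "projecting the clusters" unspecified. A common way to turn high-dimensional vectors into 2-D display coordinates is principal component analysis (PCA). The sketch below, an assumption rather than the patent's stated method, projects points (for example cluster centroids) onto their top two principal components, using power iteration with deflation so that it needs only the standard library. `project_2d` and `_dominant_eigvec` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def _dominant_eigvec(mat: List[List[float]], iters: int = 1000, seed: int = 0) -> List[float]:
    """Unit dominant eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in mat]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in mat]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # no variance left in any direction v can reach
            break
        v = [x / norm for x in w]
    return v


def project_2d(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Coordinates of each point on its first two principal components."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(2):
        v = _dominant_eigvec(cov)
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflate: remove the found component so the next pass finds the runner-up.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(r, axis)) for axis in axes] for r in centered]
```

PCA is linear, so it preserves distances exactly only for points that differ within the plane of the top two components; clusters separated mainly along other directions may overlap on screen.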
78. A computer-readable medium comprising computer-usable code which, when loaded in a computer, causes the computer, in operation, to:
receive a collection of information objects, the information objects representing one or more concepts;
group the collection of information objects into a plurality of clusters, each cluster representing a single concept and having a set of information objects from the collection; and
project the clusters to display visual information of the collection of information objects.
81. A method comprising:
semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality;
defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality;
forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix;
calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents;
providing the matrix entries as document vectors to interpret the document contents of the database;
inputting query terms;
augmenting the topic set by the query terms;
making an incidence matrix of query terms for the documents;
rotating the document vectors to match the incidence matrix; and
clustering and projecting the rotated document vectors.
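The matrix of claim 81 can be estimated by counting documents: the entry for filtered concept c and topic concept t is the fraction of documents containing c that also contain t. The sketch below assumes each document is already reduced to a set of semantic concepts. The claim does not say how matrix entries become a document vector, so `document_vector`, which averages the rows of the concepts a document contains, is a hypothetical choice, as are both function names.

```python
from typing import List, Sequence, Set


def conditional_matrix(docs: Sequence[Set[str]], concepts: Sequence[str],
                       topics: Sequence[str]) -> List[List[float]]:
    """entry[i][j] = P(doc contains topics[j] | doc contains concepts[i]),
    estimated from document counts; rows for concepts that never occur are zero."""
    matrix = []
    for c in concepts:
        with_c = [d for d in docs if c in d]
        if not with_c:
            matrix.append([0.0] * len(topics))
            continue
        matrix.append([sum(1 for d in with_c if t in d) / len(with_c) for t in topics])
    return matrix


def document_vector(doc: Set[str], concepts: Sequence[str],
                    matrix: List[List[float]]) -> List[float]:
    """Hypothetical: average the matrix rows of the concepts the document contains."""
    rows = [matrix[i] for i, c in enumerate(concepts) if c in doc]
    if not rows:
        return [0.0] * len(matrix[0]) if matrix else []
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Augmenting the topic set by the query terms then amounts to calling `conditional_matrix` with `list(topics) + query_terms`, which adds one column per query term before the rotation step.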
Specification