Generating representative exemplars for indexing, clustering, categorization and taxonomy
Abstract
A method for automatically selecting representative exemplars from a collection of documents. The method includes generating a representation of each document in the collection of documents in an abstract mathematical space, measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents, identifying clusters of conceptually similar documents based on the similarity measurements, and identifying at least one exemplary document within each cluster.
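The four steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how the abstract mathematical space is built, how similarity is measured, or how clusters and exemplars are chosen, so everything concrete here is an assumption made for illustration: term-count vectors stand in for the representation, cosine similarity for the similarity measure, a single-pass threshold grouping for the clustering, and "highest total similarity to its cluster" for the exemplar choice. The names `vectorize`, `cosine`, and `select_exemplars` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Step 1 (assumed form): represent a document as a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Step 2 (assumed measure): cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_exemplars(docs, threshold=0.5):
    """Return (clusters, exemplars) as lists of document indices."""
    vecs = [vectorize(d) for d in docs]
    n = len(docs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Step 3 (assumed strategy): a document joins the first cluster whose
    # first member is at least `threshold` similar; otherwise it starts a new one.
    clusters = []
    for i in range(n):
        for cluster in clusters:
            if sim[i][cluster[0]] >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Step 4 (assumed criterion): the exemplar is the member with the highest
    # total similarity to the members of its own cluster.
    exemplars = [max(c, key=lambda i: sum(sim[i][j] for j in c)) for c in clusters]
    return clusters, exemplars
```

Calling `select_exemplars(docs)` on a list of strings returns the index groups and one exemplar index per group. A production system would likely use a richer representation than raw term counts; the sketch only shows how the four steps fit together.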
20 Claims
1. A method for automatically selecting exemplary documents from a collection of documents, comprising:
generating a representation of each document in the collection of documents in an abstract mathematical space;
measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents;
identifying clusters of conceptually similar documents based on the similarity measurements; and
identifying at least one exemplary document within each cluster.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product for automatically selecting exemplary documents from a collection of documents, comprising:
a computer usable medium having computer readable program code means embodied in said medium for causing an application program to execute on an operating system of a computer, said computer readable program code means comprising:
a computer readable first program code means for generating a representation of each document in the collection of documents in an abstract mathematical space;
a computer readable second program code means for measuring a similarity between the representation of each document in the collection of documents and the representation of at least one other document in the collection of documents;
a computer readable third program code means for identifying clusters of conceptually similar documents based on the similarity measurements; and
a computer readable fourth program code means for identifying at least one exemplary document within each cluster.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer-based method for automatically reducing a number of data objects that represent information included in a collection of data objects, comprising:
generating a representation of each data object in the collection of data objects in an abstract mathematical space;
measuring a similarity between the representation of each data object in the collection of data objects and the representation of at least one other data object in the collection of data objects;
identifying clusters of conceptually similar data objects based on the similarity measurements, wherein a number of data objects in each cluster is determined based on an adjustable clustering threshold; and
identifying at least one exemplary data object within each cluster, wherein a number of identified exemplary data objects is less than a number of data objects in the collection of data objects.
Dependent claims: 18, 19, 20
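The effect of the adjustable clustering threshold in claim 17 can be seen in a small sketch. It assumes the data objects are already represented as numeric vectors, that similarity is cosine similarity, and that an object joins the first cluster whose first member meets the threshold; none of these choices is stated in the claim, and `reduce_objects` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reduce_objects(vectors, threshold):
    """Return one exemplar index per cluster formed at the given threshold."""
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            # Assumed rule: compare against the cluster's first member only.
            if cosine(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Assumed criterion: pick the member most similar, in total, to its cluster.
    return [
        max(c, key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in c))
        for c in clusters
    ]


# Illustrative data: five two-dimensional vectors (made up for this sketch).
vectors = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.7), (0.0, 1.0), (0.1, 0.9)]
loose = reduce_objects(vectors, threshold=0.6)
tight = reduce_objects(vectors, threshold=0.9)
```

With these vectors, the lower threshold yields fewer, larger clusters and the higher threshold yields more, smaller ones. In both cases the number of exemplars is below the number of data objects, which is the reduction the claim recites. A threshold set high enough would leave every object in its own cluster and give no reduction.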
Specification