Analyzing the Ability to Find Textual Content
First Claim
1. A method for analyzing a document set, comprising:
- providing a document set;
- determining a set of terms from the terms of the document set that minimizes a distance measurement from the given document set.
Abstract
A method and system for analyzing a document set (202, 420) are provided. The method includes determining a set of terms (312) from the terms of the document set that minimizes a distance measurement (405) from the given set of documents (420). The method includes using a greedy algorithm to build the set of terms incrementally, at each stage finding a single word that is closest to the document set (202, 420). The set of terms is evaluated to assess the ability to find the document set (202, 420). The set of terms is compared with expected terms to evaluate the ability to find the document set (202, 420). A measure of the ability to find a document set (202, 420) is provided by computing a distance measure (403) between a document set and an entire collection.
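The greedy construction described in the abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions rather than anything stated above: the distance is taken to be the Jensen-Shannon divergence between term distributions, a term set is modeled as the document-set distribution restricted to the chosen terms and renormalized, and the names `distribution`, `js_divergence`, and `greedy_term_set` are hypothetical.

```python
import math
from collections import Counter


def distribution(counts):
    """Normalize raw term counts into a probability distribution."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two term distributions.

    Assumption: the source does not name its distance measure; JSD is
    used here only because it is symmetric and always finite.
    """
    def kl(a, m):
        return sum(a[t] * math.log(a[t] / m[t]) for t in a if a[t] > 0)

    terms = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def greedy_term_set(docs, max_terms):
    """Grow a term set one word at a time, always adding the word that
    brings the set closest to the document set."""
    counts = Counter(token for doc in docs for token in doc)
    target = distribution(counts)
    chosen, best = [], float("inf")
    while len(chosen) < max_terms:
        candidates = []
        for term in sorted(counts):  # sorted for deterministic tie-breaking
            if term in chosen:
                continue
            trial = chosen + [term]
            # Assumed model of a term set: the document-set distribution
            # restricted to the trial terms and renormalized.
            q = distribution({t: counts[t] for t in trial})
            candidates.append((js_divergence(target, q), term))
        if not candidates:
            break  # every term has already been chosen
        distance, term = min(candidates)
        if distance >= best:
            break  # no remaining word improves the set
        chosen.append(term)
        best = distance
    return chosen, best
```

Each pass scores every unused word and keeps the single best one, so the cost is roughly the vocabulary size times `max_terms` distance evaluations. Under the restricted-distribution assumption the first word chosen is the most frequent term in the document set.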
20 Claims
1. A method for analyzing a document set, comprising:
- providing a document set;
- determining a set of terms from the terms of the document set that minimizes a distance measurement from the given document set.
Dependent claims: 2-13
14. A method for analyzing a document set, comprising:
- computing a distance measure between a document set and an entire collection; and
- using the distance measure as a prediction of the ability to find the document set in the collection.
Dependent claims: 15, 16
17. A computer program product stored on a computer readable storage medium, comprising computer readable program code means for performing the steps of:
- determining a set of terms from the terms of the document set that minimizes a distance measurement from the given document set.
18. A system for analyzing a document set, comprising:
- a server computer providing a document set for access by client computers across a network; and
- a document set analyzer for determining a set of terms from the terms of the document set that minimizes a distance measurement from the given document set.
Dependent claims: 19, 20
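The measure recited in claim 14 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the distance is assumed to be the Jensen-Shannon divergence between unigram term distributions, the reading that a larger distance predicts an easier-to-find document set is an assumption, and the names `term_distribution`, `js_divergence`, and `findability_score` are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Unigram distribution over all tokens in a list of tokenized documents."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log); ranges from 0 to ln 2.

    Assumption: the claims do not name the distance measure.
    """
    def kl(a, m):
        return sum(a[t] * math.log(a[t] / m[t]) for t in a if a[t] > 0)

    terms = set(p) | set(q)
    mix = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def findability_score(doc_set, collection):
    """Distance between a document set and the entire collection.

    Assumed interpretation: the more the set's vocabulary differs from
    the collection's, the easier the set should be to find.
    """
    return js_divergence(term_distribution(doc_set),
                         term_distribution(collection))
```

A document set whose vocabulary mirrors the collection scores near 0, while a set with terms that are rare elsewhere in the collection scores closer to ln 2.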
Specification