Computer aided document retrieval
First Claim
1. A method of determining cluster attractors for a plurality of documents, each document comprising at least one term, the method comprising:
- calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents;
- calculating, in respect of each term, the entropy of the respective probability distribution;
- selecting at least one of said probability distributions as a cluster attractor depending on the respective entropy value.
Abstract
A method of determining cluster attractors for a plurality of documents, each document comprising at least one term. The method comprises calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents. The entropy of each probability distribution is then calculated. Finally, at least one of said probability distributions is selected as a cluster attractor depending on its entropy value. The method allows very small clusters to be formed, enabling more focused retrieval during a document search.
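The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: documents are given as lists of terms, co-occurrence is counted once per document, each distribution is estimated as normalized co-occurrence counts, entropy is Shannon entropy in bits, and the selection rule keeps the `k` lowest-entropy distributions. The function names `cooccurrence_distributions`, `entropy` and `select_attractors` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def cooccurrence_distributions(docs):
    """Map each term to a distribution over the other terms it co-occurs with.

    Assumption: two terms co-occur if they appear in the same document,
    and each document contributes one count per ordered pair of distinct terms.
    """
    counts = defaultdict(Counter)
    for doc in docs:
        terms = set(doc)
        for term in terms:
            for other in terms:
                if other != term:
                    counts[term][other] += 1
    dists = {}
    for term, counter in counts.items():
        total = sum(counter.values())
        dists[term] = {other: n / total for other, n in counter.items()}
    return dists


def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def select_attractors(docs, k):
    """Return the k distributions with the lowest entropy.

    Assumption: low entropy is the selection criterion; the source only says
    that selection depends on the entropy value. Ties are broken by term name.
    """
    dists = cooccurrence_distributions(docs)
    ranked = sorted(dists, key=lambda term: (entropy(dists[term]), term))
    return {term: dists[term] for term in ranked[:k]}


if __name__ == "__main__":
    docs = [["apple", "pie"], ["apple", "tart"], ["pie", "crust"]]
    for term, dist in select_attractors(docs, 2).items():
        print(term, dist)
```

In this toy corpus, terms that co-occur with only one other term have the most concentrated distributions, so they rank first under the assumed low-entropy rule. A different threshold or ranking rule could be substituted without changing the first two steps.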
62 Citations
18 Claims
1. A method of determining cluster attractors for a plurality of documents, each document comprising at least one term, the method comprising:
- calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents;
- calculating, in respect of each term, the entropy of the respective probability distribution;
- selecting at least one of said probability distributions as a cluster attractor depending on the respective entropy value.
14. An apparatus for determining cluster attractors for a plurality of documents, each document comprising at least one term, the apparatus comprising:
- means for calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents;
- means for calculating, in respect of each term, the entropy of the respective probability distribution; and
- means for selecting at least one of said probability distributions as a cluster attractor depending on the respective entropy value.
Specification