
Computer aided document retrieval

  • US 7,747,593 B2
  • Filed: 09/27/2004
  • Issued: 06/29/2010
  • Est. Priority Date: 09/26/2003
  • Status: Active Grant
First Claim

1. A computer-implemented method of determining cluster attractors for use in clustering a plurality of documents, each document comprising at least one term, each term comprising one or more words, the method comprising:

  • causing a computer to calculate, in respect of each term, a probability distribution that is indicative of
      in the instance where a document comprises said term and said one other term that co-occurs with said term in at least one of said documents, the frequency of occurrence of said one other term, and
      in the instance where a document comprises said term and more than one other term that co-occurs with said term in at least one of said documents, the respective frequency of occurrence of each other term, that co-occurs with said term in at least one of said documents;

    causing a computer to calculate, in respect of each term, the entropy of the respective probability distribution; and

    causing the computer to select at least one of said probability distributions as a cluster attractor depending on the respective entropy value;

    wherein the selected cluster attractor is a clustering focus for at least some of said documents, and wherein said probability distribution is calculated as

    $$p(y \mid z) = \frac{\sum_{x \in X(z)} \mathrm{tf}(x, y)}{\sum_{x \in X(z),\, t \in Y} \mathrm{tf}(x, t)}$$

    where tf(x, y) is a term frequency of a term y in a document x and X(z) is a set of all documents of said plurality of documents that contain a term z and where t is a term index.
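
The three claimed steps (distribution, entropy, selection) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each document is a dict mapping a term to its frequency, it reads Y as the set of all terms appearing in the documents, and it applies the formula literally, so y may equal z. The function names `context_distribution`, `entropy` and `select_attractors` are hypothetical. The claim only says that selection depends on the entropy value; choosing the k lowest-entropy terms here is an assumption made for illustration.

```python
import math
from collections import Counter


def context_distribution(docs, z):
    """Return p(y|z) as a dict, applying the claimed formula literally.

    docs: list of dicts mapping term -> term frequency tf(x, term).
    """
    totals = Counter()
    for tf in docs:               # x ranges over all documents
        if tf.get(z, 0) > 0:      # keep only x in X(z)
            totals.update(tf)     # accumulate tf(x, y) for every term y
    denom = sum(totals.values())  # sum over x in X(z) and all terms t
    if denom == 0:
        return {}
    return {y: count / denom for y, count in totals.items()}


def entropy(p):
    """Shannon entropy, in bits, of a distribution given as a dict."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def select_attractors(docs, k):
    """Assumption: pick the k terms whose distributions have lowest entropy."""
    vocabulary = set()
    for tf in docs:
        vocabulary.update(tf)
    scored = sorted(
        (entropy(context_distribution(docs, z)), z) for z in vocabulary
    )
    return [z for _, z in scored[:k]]
```

Calling `select_attractors(docs, k)` on a list of term-frequency dicts returns k terms; the distribution `context_distribution(docs, z)` for each returned term z would play the role of a cluster attractor. A different selection rule (for example, an entropy band rather than the k smallest values) would only change the last function.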
