
System and method for using an exemplar document to retrieve relevant documents from an inverted index of a large corpus

  • US 8,122,043 B2
  • Filed: 06/30/2009
  • Issued: 02/21/2012
  • Est. Priority Date: 06/30/2009
  • Status: Active Grant
First Claim

1. A method for ranking the relevance of each of a plurality of documents in a corpus to a search query of words comprising the steps of:

    a) grouping words in the search query by synonym into one or more word groups, said grouping being performed by a processing unit;

    b) for each word group, counting the number of instances (the “FQ” value) that a word from the word group appears in the search query, said counting being performed by the processing unit;

    c) determining, by the processing unit, the maximum FQ value among all the word groups;

    d) calculating, by the processing unit, a scaling factor K;

    e) for each word group, calculating a term frequency (“TF”) value by dividing the FQ value for the word group by the maximum FQ value and applying scaling factor K to the resulting quotient, said calculating being performed by the processing unit;

    f) for each word group, counting the number of documents (“FC”) in the corpus that contain at least one word from the word group, said counting being performed by the processing unit;

    g) counting the number of documents (“N”) in the corpus, said counting being performed by the processing unit;

    h) for each word group, calculating an inverse document frequency (“IDF”) value by dividing N by FC, adding one to the resulting quotient, and taking the natural logarithm of the resulting sum, said calculating being performed by the processing unit;

    i) for each word group, calculating a TF-IDF value by multiplying said TF value by said IDF value, said calculating being performed by the processing unit; and

    j) ranking the relevance of each document in the corpus utilizing the TF-IDF values for the word groups in the search query, said ranking being performed by the processing unit.
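Steps a) through i) of claim 1 reduce to three formulas: TF = K · FQ / max(FQ), IDF = ln(N / FC + 1), and TF-IDF = TF · IDF. The Python sketch below walks through those steps. It is an illustration, not the patentee's implementation, and it rests on several assumptions the claim leaves open: the hypothetical synonym table `SYNONYMS`, the simple tokenizer, treating K as a caller-supplied constant `k`, skipping word groups that no document contains (FC = 0, where N / FC is undefined), and ranking documents in step j) by summing the TF-IDF values of the query groups each document contains. The function names (`tfidf_by_group`, `rank_documents`, and so on) are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table mapping each word to a canonical group key.
# The claim does not say how synonyms are identified; this dict is an assumption.
SYNONYMS = {
    "car": "car", "automobile": "car", "auto": "car",
    "fast": "fast", "quick": "fast", "rapid": "fast",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def group_of(word):
    """Step a): map a word to its synonym group; unknown words form their own group."""
    return SYNONYMS.get(word, word)


def tfidf_by_group(query, corpus, k=1.0):
    """Steps a)-i): compute a TF-IDF value for each word group in the query.

    `k` stands in for the scaling factor K. The claim does not say how K is
    calculated, so here it is simply a constant supplied by the caller.
    """
    # a) + b): group query words by synonym and count instances (FQ).
    fq = Counter(group_of(w) for w in tokenize(query))
    if not fq:
        return {}
    # c) maximum FQ value among all word groups.
    max_fq = max(fq.values())
    # Word groups present in each document, used for the FC counts.
    doc_groups = [{group_of(w) for w in tokenize(d)} for d in corpus]
    # g) N: number of documents in the corpus.
    n = len(corpus)
    weights = {}
    for group, count in fq.items():
        # e) TF = K * (FQ / max FQ).
        tf = k * (count / max_fq)
        # f) FC: documents containing at least one word from the group.
        fc = sum(1 for groups in doc_groups if group in groups)
        if fc == 0:
            continue  # Assumption: a group absent from the corpus cannot affect ranking.
        # h) IDF = ln(N / FC + 1).
        idf = math.log(n / fc + 1)
        # i) TF-IDF = TF * IDF.
        weights[group] = tf * idf
    return weights


def rank_documents(query, corpus, k=1.0):
    """Step j): rank documents using the per-group TF-IDF values.

    The claim leaves the ranking function open. Summing the TF-IDF values of
    the query groups that a document contains is an illustrative assumption.
    Returns (score, document index) pairs, highest score first.
    """
    weights = tfidf_by_group(query, corpus, k)
    ranked = []
    for idx, doc in enumerate(corpus):
        groups = {group_of(w) for w in tokenize(doc)}
        score = sum(w for g, w in weights.items() if g in groups)
        ranked.append((score, idx))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


if __name__ == "__main__":
    corpus = ["a quick automobile", "a rapid bicycle", "a slow bicycle"]
    print([idx for _, idx in rank_documents("fast car", corpus)])  # [0, 1, 2]
```

In this example, "quick" and "rapid" both fall into the "fast" group (FC = 2) and "automobile" falls into the "car" group (FC = 1). With N = 3, the rarer "car" group receives the larger IDF, ln 4, while "fast" receives ln 2.5. The first document matches both groups and therefore ranks highest.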
