
Phrase-based indexing in an information retrieval system

  • US 7,536,408 B2
  • Filed: 07/26/2004
  • Issued: 05/19/2009
  • Est. Priority Date: 07/26/2004
  • Status: Active Grant
First Claim

1. A method of indexing documents in a document collection, each document having an associated identifier, the method comprising:

    providing a list of phrases;

    identifying, by operation of a processor adapted to manipulate data within a computer system, for a given document, phrases from the list of phrases that are present in the document;

    for each identified phrase in the document;

    identifying, by operation of a processor adapted to manipulate data within a computer system, a related phrase also present in the document, wherein for each phrase gj, gk is a related phrase of phrase gj where an information gain I of gk with respect to gj exceeds a predetermined threshold, the information gain I being a function of A(j,k) and E(j,k), where A(j,k) is a measure of an actual co-occurrence rate of gj and gk, and E(j,k) is an expected co-occurrence rate gj and gk; and

    indexing, by operation of a processor adapted to manipulate data within a computer system, the document by storing the identifier of the document and an indication of each related phrase gk also present in the document, in a posting list of the identified phrase gj.
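
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim only states that the information gain I is "a function of" A(j,k) and E(j,k); the ratio I = A(j,k) / E(j,k) used here is an assumption, as are per-document co-occurrence rates, naive substring matching, and the hypothetical function names `find_phrases`, `related_phrases`, and `build_index`.

```python
from collections import defaultdict


def find_phrases(text, phrases):
    """Return the listed phrases present in the text (naive substring match)."""
    lowered = text.lower()
    return {p for p in phrases if p in lowered}


def related_phrases(docs, phrases, threshold):
    """Map each phrase gj to the phrases gk whose information gain exceeds threshold.

    Assumption: I(j,k) = A(j,k) / E(j,k), where A is the fraction of documents
    containing both phrases and E is the product of their individual fractions.
    """
    n = len(docs)
    present = {doc_id: find_phrases(text, phrases) for doc_id, text in docs.items()}

    doc_freq = defaultdict(int)   # number of documents containing gj
    pair_freq = defaultdict(int)  # number of documents containing both gj and gk
    for found in present.values():
        for gj in found:
            doc_freq[gj] += 1
            for gk in found:
                if gk != gj:
                    pair_freq[(gj, gk)] += 1

    related = defaultdict(set)
    for (gj, gk), count in pair_freq.items():
        actual = count / n                                # A(j,k)
        expected = (doc_freq[gj] / n) * (doc_freq[gk] / n)  # E(j,k)
        if actual / expected > threshold:
            related[gj].add(gk)
    return related, present


def build_index(docs, phrases, threshold):
    """Build posting lists: phrase gj -> [(doc_id, [related gk present in doc])]."""
    related, present = related_phrases(docs, phrases, threshold)
    index = defaultdict(list)
    for doc_id, found in present.items():
        for gj in sorted(found):
            gks = sorted(related.get(gj, set()) & found)
            index[gj].append((doc_id, gks))
    return dict(index)
```

Each posting holds the document identifier together with the related phrases that also occur in that document, which mirrors the final indexing step of the claim. A production system would replace the substring match with proper tokenization and would compute the co-occurrence statistics over a much larger collection.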
