Method for indexing for retrieving documents using particles

  • US 8,229,921 B2
  • Filed: 02/25/2008
  • Issued: 07/24/2012
  • Est. Priority Date: 02/25/2008
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for indexing and retrieving documents in a database, comprising the steps of:

  • constructing a set of particles and a simultaneously optimized particle-based language model using training documents, and in which a perplexity of the particle-based language model is at least ten times lower than the perplexity of a word-based language model constructed from the same training documents, wherein the set of particles applies expectation maximization to an objective function, and where the objective function considers any combination of:

    a size of the set of particles;

    errors in representing all documents in a document training set and a query training set;

    a retrieval accuracy of using the set of particles;

    an entropy of a statistical model that represents the set of particles; and

    a particle-level language model derived from the documents and the queries in the training sets;

  • converting each document in a collection of documents to a document particle graph, the document particle graph including particles selected from the set of the particles;

  • extracting, for each document, a set of document keys from the corresponding particle graph;

  • storing the document keys for each document in an index to a database storing the collection of documents;

  • converting a query to a query particle graph including a set of query particles, the query graph including particles selected from the set of the particles;

  • extracting a set of query keys from the query particle graph;

  • retrieving relevant documents from the database according to the query keys and the document keys stored in the index; and

  • outputting the relevant documents to a user.
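
The indexing and retrieval steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it is hypothetical. It assumes a small hand-written particle set in place of the set the claim learns with expectation maximization, a greedy longest-match segmentation that yields a single path (a degenerate particle graph), keys formed from particle unigrams and bigrams, and a score equal to the number of keys a document shares with the query.

```python
from collections import defaultdict

# Hypothetical particle inventory. The claim learns this set from training
# data; here it is hard-coded purely for illustration.
PARTICLES = {"in", "dex", "re", "trie", "val", "doc", "u", "ment",
             "par", "ti", "cle"}


def to_particles(text, particles=PARTICLES):
    """Segment each word into particles by greedy longest match.

    The result is one path, i.e. a degenerate particle graph. Characters
    not covered by any particle become single-character particles.
    """
    max_len = max(len(p) for p in particles)
    path = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest candidate first; length 1 always succeeds.
            for n in range(min(max_len, len(word) - i), 0, -1):
                piece = word[i:i + n]
                if piece in particles or n == 1:
                    path.append(piece)
                    i += n
                    break
    return path


def extract_keys(path, max_n=2):
    """Assumed key definition: particle n-grams for n = 1..max_n."""
    return {tuple(path[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(path) - n + 1)}


def build_index(docs):
    """Inverted index mapping each key to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for key in extract_keys(to_particles(text)):
            index[key].add(doc_id)
    return index


def retrieve(query, index):
    """Rank documents by the number of keys shared with the query."""
    scores = defaultdict(int)
    for key in extract_keys(to_particles(query)):
        for doc_id in index.get(key, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {"d1": "document index", "d2": "particle retrieval"}
index = build_index(docs)
print(retrieve("retrieval", index))  # ['d2']
```

Because documents and queries pass through the same `to_particles` and `extract_keys` functions, a query can match a document on shared subword units even when no whole word is shared. A full particle graph would keep several alternative segmentations per word instead of the single greedy path used here.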
