Retrieval of documents using language models

  • US 8,401,841 B2
  • Filed: 08/30/2007
  • Issued: 03/19/2013
  • Est. Priority Date: 08/31/2006
  • Status: Active Grant
First Claim

1. A method of modeling documents implemented by a computing device comprising:

  • receiving a plurality of documents and building a language model, the building comprising, for each of the documents,

    tokenizing text included in the document;

    defining paragraphs by identifying paragraph boundaries in the tokenized text;

    identifying word pairs in each defined paragraph wherein the word pairs comprise two words occurring in any location in the same defined paragraph, including adjacent to one another;

    calculating the frequency of the identified word pairs; and

    adding the identified word pairs and corresponding frequency information to the language model.
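
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the `<P>` boundary marker, the blank-line paragraph rule, the unordered treatment of pairs, the skipping of a word paired with itself, and the use of raw counts as "frequency" are all hypothetical choices that the claim text does not specify.

```python
import re
from collections import Counter
from itertools import combinations

PARA = "<P>"  # hypothetical marker for a paragraph boundary token


def tokenize(text):
    """Split text into lowercase word tokens plus boundary markers.

    Assumption: a blank line marks a paragraph boundary.
    """
    tokens = []
    for match in re.finditer(r"\n\s*\n|[A-Za-z0-9']+", text):
        tok = match.group()
        tokens.append(PARA if tok[0] == "\n" else tok.lower())
    return tokens


def split_paragraphs(tokens):
    """Define paragraphs by locating boundary markers in the token stream."""
    paragraphs, current = [], []
    for tok in tokens:
        if tok == PARA:
            if current:
                paragraphs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        paragraphs.append(current)
    return paragraphs


def word_pairs(paragraph):
    """Pair every two words in the same paragraph, adjacent or not.

    Assumptions: pairs are unordered, and a word is not paired with itself.
    """
    return [
        tuple(sorted((a, b)))
        for a, b in combinations(paragraph, 2)
        if a != b
    ]


def build_language_model(documents):
    """Accumulate pair counts over all paragraphs of all documents."""
    model = Counter()
    for doc in documents:
        for paragraph in split_paragraphs(tokenize(doc)):
            model.update(word_pairs(paragraph))
    return model
```

Calling `build_language_model(["the cat sat\n\nthe dog"])` counts pairs such as `("cat", "the")` within the first paragraph only; `"cat"` and `"dog"` are never paired because they fall in different paragraphs.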

  • 11 Assignments