System and method for building diverse language models

  • US 9,081,760 B2
  • Filed: 03/08/2011
  • Issued: 07/14/2015
  • Est. Priority Date: 03/08/2011
  • Status: Active Grant
First Claim

1. A method comprising:

    identifying vocabulary gaps in a current language model;

    establishing a visitation policy based on a previous crawling cycle and the vocabulary gaps, wherein the visitation policy identifies web pages likely to have information capable of filling the vocabulary gaps in the current language model, and wherein the visitation policy comprises a crawling schedule based on predicted perplexity of the web pages with respect to the current language model;

    crawling, via a crawler operating on a computing device, the web-pages according to the crawling schedule, to yield new vocabulary words; and

    generating a diverse language model based on the current language model and the new vocabulary words.
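
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation; it only shows one way the pieces could fit together under stated assumptions. The current language model is assumed to be an add-one-smoothed unigram model held in a Counter, a vocabulary gap is taken to be any word the model has not seen, and the "predicted perplexity" of a page is approximated by the perplexity of the text fetched from that page in the previous crawling cycle. All names (tokenize, perplexity, build_schedule, crawl_and_update, and the fetch callback) are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def perplexity(tokens, counts):
    """Perplexity of tokens under an add-one-smoothed unigram model."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserved for unseen words
    log_prob = 0.0
    for tok in tokens:
        log_prob += math.log((counts.get(tok, 0) + 1) / (total + vocab_size))
    return math.exp(-log_prob / max(len(tokens), 1))


def build_schedule(previous_cycle, counts):
    """Order URLs by the perplexity of their previously crawled text, highest first.

    Assumption: text seen in the last cycle predicts what the page holds now,
    and high perplexity signals vocabulary the current model lacks.
    """
    scored = {url: perplexity(tokenize(text), counts)
              for url, text in previous_cycle.items()}
    return sorted(scored, key=scored.get, reverse=True)


def crawl_and_update(schedule, fetch, counts, budget):
    """Visit the first `budget` URLs; return the updated counts and the new words."""
    updated = Counter(counts)
    new_words = set()
    for url in schedule[:budget]:
        for tok in tokenize(fetch(url)):
            if tok not in counts:
                new_words.add(tok)  # word fills a gap in the current model
            updated[tok] += 1
    return updated, new_words


# Toy data: a dict lookup stands in for a real crawler's fetch function.
current = Counter(tokenize("the cat sat on the mat the dog sat on the rug"))
previous_cycle = {
    "site-a": "the cat sat on the mat",
    "site-b": "quantum chromodynamics lattice gauge theory",
}
schedule = build_schedule(previous_cycle, current)
diverse, new_words = crawl_and_update(schedule, previous_cycle.get, current, budget=1)
```

In this toy run the page whose earlier text is entirely out of vocabulary is scheduled first, so a crawl budget of one page is spent where the current model has the most to gain. A production system would replace the unigram model, the tokenizer, and the dict-based fetch with real components.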

  • 4 Assignments