System and method for building diverse language models

  • US 9,396,183 B2
  • Filed: 07/13/2015
  • Issued: 07/19/2016
  • Est. Priority Date: 03/08/2011
  • Status: Active Grant
First Claim

1. A method comprising:

    establishing a website visitation policy according to a previous crawling cycle and vocabulary gaps in a language model, wherein the website visitation policy identifies, according to a pattern of links, a likelihood of web pages to have information capable of filling the vocabulary gaps, and wherein the website visitation policy comprises a crawling schedule according to perplexity of the web pages with respect to the language model;

    crawling, via a processor, the web-pages according to the crawling schedule, to yield new vocabulary words; and

    generating a diverse language model according to the language model and the new vocabulary words.
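The three steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the add-one smoothed unigram model, the rule that higher-perplexity pages are crawled first, and every name in it (`UnigramModel`, `crawl_schedule`, `new_vocabulary`, the page keys) are assumptions made for illustration. The claim does not state how perplexity orders the schedule, and the link-pattern likelihood and previous-crawl-cycle inputs are left out entirely.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class UnigramModel:
    """Add-one smoothed unigram model, a stand-in for the claim's language model."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.total = sum(self.counts.values())
        # One extra vocabulary slot is reserved for unseen words.
        self.vocab_size = len(self.counts) + 1

    def prob(self, word):
        return (self.counts.get(word, 0) + 1) / (self.total + self.vocab_size)

    def perplexity(self, tokens):
        # Perplexity is at least 1 for real text, so 0.0 ranks empty pages last.
        if not tokens:
            return 0.0
        log_sum = sum(math.log(self.prob(t)) for t in tokens)
        return math.exp(-log_sum / len(tokens))


def crawl_schedule(model, pages):
    """Order page keys by descending perplexity (assumed priority rule)."""
    scored = {url: model.perplexity(tokenize(text)) for url, text in pages.items()}
    return sorted(scored, key=scored.get, reverse=True)


def new_vocabulary(model, pages, schedule):
    """Collect words missing from the model, visiting pages in schedule order."""
    found, seen = [], set()
    for url in schedule:
        for word in tokenize(pages[url]):
            if word not in model.counts and word not in seen:
                seen.add(word)
                found.append(word)
    return found


# Hypothetical data: already-fetched page text stands in for a live crawl.
base_tokens = tokenize("the cat sat on the mat the dog sat on the rug")
pages = {
    "a": "the cat sat on the mat",
    "b": "quantum entanglement defies the classical intuition",
}

model = UnigramModel(base_tokens)
schedule = crawl_schedule(model, pages)
words = new_vocabulary(model, pages, schedule)
# "Diverse" model here is simply the base data extended with the new words.
diverse_model = UnigramModel(base_tokens + words)
print(schedule, words)
```

In this sketch a page full of words the model has never seen scores a high perplexity and is visited first, on the assumption that such pages are the most likely to fill vocabulary gaps. A real system would also need to filter out pages that score high only because they contain noise or text in another language.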

  • 4 Assignments