
Named entity variations for multimodal understanding systems

  • US 9,916,301 B2
  • Filed: 12/21/2012
  • Issued: 03/13/2018
  • Est. Priority Date: 12/21/2012
  • Status: Active Grant
First Claim

1. A method for determining variations for named entities, comprising:

    accessing a named entity list comprising a canonical phrase for an entry in the named entity list;

    determining a candidate variation for the canonical phrase, wherein the candidate variation is an alternative phrase for the canonical phrase;

    determining, using click log data associated with a seed entity list of entities that are of a same type as the canonical phrase, a set of related websites from a plurality of websites, wherein the set of related websites is determined by mining the click log data to identify links that are most selected for queries corresponding with the seed entity list, wherein the mining of the click data includes determining a ratio of clicks received for a particular website to clicks received for queries in the seed entity list;

    obtaining a likelihood ratio for a website in the set of related websites, wherein the likelihood ratio indicates a probability of a click on the particular website in response to a query from the seed entity list versus a probability of click on the particular website in response to a random query;

    generating a score for the candidate variation based on evaluation of a distribution of click data for the candidate variation, wherein the score for the candidate variation is a weighted click vote over the web sites clicked for the candidate variation and is generated as a sum over all websites clicked in response to a query comprising the candidate variation, where each website is weighted by the likelihood ratio pertaining to the particular website;

    determining whether to include the candidate variation in a language understanding model based on the score; and

    training the language understanding model by updating the named entity list to include the candidate variation for the canonical phrase based on the score.
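
The likelihood ratio and weighted click vote recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The following are assumptions made for illustration only: the click log is a list of `(query, website, clicks)` tuples; the "random query" probability is approximated by a website's share of all clicks in the log; a query "comprising" the candidate is matched by case-insensitive substring; each website's vote is its share of the candidate's clicks; websites outside the related set get weight zero. The function names, the `top_k` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def related_site_ratios(click_log, seed_queries, top_k=10):
    """Return {website: likelihood ratio} for the most-clicked seed websites.

    Likelihood ratio = P(click on site | seed query) / P(click on site | any query).
    The denominator is an assumed stand-in for the claim's "random query".
    """
    seed = {q.lower() for q in seed_queries}
    seed_clicks, all_clicks = Counter(), Counter()
    for query, site, clicks in click_log:
        all_clicks[site] += clicks
        if query.lower() in seed:
            seed_clicks[site] += clicks

    seed_total = sum(seed_clicks.values())
    all_total = sum(all_clicks.values())
    if seed_total == 0:
        return {}

    ratios = {}
    # Related websites: those most selected for the seed queries (assumed top_k cut).
    for site, clicks in seed_clicks.most_common(top_k):
        if clicks == 0:
            continue
        p_seed = clicks / seed_total            # ratio of site clicks to seed-query clicks
        p_random = all_clicks[site] / all_total  # site's share of all clicks in the log
        ratios[site] = p_seed / p_random
    return ratios


def candidate_score(click_log, candidate, ratios):
    """Weighted click vote: sum over clicked websites of click share * likelihood ratio."""
    needle = candidate.lower()
    clicks_by_site = Counter()
    for query, site, clicks in click_log:
        if needle in query.lower():  # assumed reading of "query comprising the candidate"
            clicks_by_site[site] += clicks

    total = sum(clicks_by_site.values())
    if total == 0:
        return 0.0
    return sum(
        (clicks / total) * ratios.get(site, 0.0)  # unrelated websites contribute nothing
        for site, clicks in clicks_by_site.items()
    )


if __name__ == "__main__":
    # Hypothetical click log for a movie-title entity list.
    log = [
        ("the matrix", "movies.example", 8),
        ("the matrix", "news.example", 2),
        ("inception", "movies.example", 10),
        ("weather today", "news.example", 30),
        ("matrix movie", "movies.example", 9),
        ("matrix movie", "news.example", 1),
    ]
    ratios = related_site_ratios(log, ["the matrix", "inception"])
    print(candidate_score(log, "matrix movie", ratios))
    print(candidate_score(log, "weather today", ratios))
```

Under these assumptions, a candidate whose clicks land mostly on websites favored by the seed entities scores higher than one whose clicks land on general-purpose websites. The claim's final steps would then compare the score to some inclusion criterion, which the claim does not specify, before adding the candidate to the named entity list.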
