Method for disambiguating multiple readings in language conversion

  • US 8,706,472 B2
  • Filed: 08/11/2011
  • Issued: 04/22/2014
  • Est. Priority Date: 08/11/2011
  • Status: Active Grant
First Claim

1. A method, comprising:

  • at a device having one or more processors and memory:

    receiving input data to be converted into a symbolic representation of the input data in a target symbolic system, the symbolic representation comprising a set of characters in the target symbolic system;

    identifying a first candidate character for the symbolic representation based on a first portion of the input data, and a second candidate character for the symbolic representation based on a second portion of the input data, wherein the first candidate character has at least a first pronunciation and a second pronunciation each applicable to a respective usage context;

    generating a plurality of candidate character strings, including at least a first candidate string comprising at least the first candidate character and the second candidate character; and

    converting the input data to a selected one of the plurality of candidate character strings, said converting comprising:

    determining a respective probability that the first candidate character string is a correct symbolic representation of the input data using a language model that individually accounts for a respective usage probability of the first candidate character in a first usage context comprising the second candidate character in combination with the first pronunciation of the first candidate character, and not the second pronunciation of the first candidate character, and wherein the language model is trained on an annotated corpus that associates the first pronunciation with the first candidate character used in respective contexts comprising the second candidate character.
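
The scoring step recited above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a bigram model with add-alpha smoothing, a toy lexicon, and a tiny annotated corpus invented for illustration, and every name in it (LEXICON, CORPUS, train, score, convert) is hypothetical. The one idea it tries to capture is that each model token is a (character, reading) pair, so a character with two pronunciations is counted separately for each pronunciation in the contexts where that pronunciation occurs.

```python
from collections import Counter
from itertools import product

# Hypothetical lexicon: input reading -> candidate characters.
# The character 长 has two readings ("zhang" and "chang").
LEXICON = {
    "zhang": ["长", "张"],
    "chang": ["长", "常"],
    "da": ["大", "打"],
    "cheng": ["城", "成"],
}

# Toy annotated corpus (invented): every character carries its reading.
CORPUS = [
    [("长", "zhang"), ("大", "da")],
    [("长", "zhang"), ("大", "da")],
    [("长", "chang"), ("城", "cheng")],
    [("张", "zhang"), ("大", "da")],
]


def train(corpus):
    """Count (character, reading) tokens and adjacent token pairs."""
    unigrams, contexts, bigrams = Counter(), Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        for left, right in zip(sentence, sentence[1:]):
            contexts[left] += 1
            bigrams[(left, right)] += 1
    vocab_size = sum(len(chars) for chars in LEXICON.values())
    return unigrams, contexts, bigrams, vocab_size


def score(tokens, model, alpha=1.0):
    """Smoothed bigram probability of a sequence of (character, reading) tokens."""
    unigrams, contexts, bigrams, vocab_size = model
    total = sum(unigrams.values())
    prob = (unigrams[tokens[0]] + alpha) / (total + alpha * vocab_size)
    for left, right in zip(tokens, tokens[1:]):
        prob *= (bigrams[(left, right)] + alpha) / (contexts[left] + alpha * vocab_size)
    return prob


def convert(readings, model):
    """Generate every candidate string for the readings and return the most probable."""
    options = [[(ch, reading) for ch in LEXICON[reading]] for reading in readings]
    best = max(product(*options), key=lambda tokens: score(tokens, model))
    return "".join(ch for ch, _ in best)


MODEL = train(CORPUS)
print(convert(["zhang", "da"], MODEL))
print(convert(["chang", "cheng"], MODEL))
```

Because 长 read as "zhang" and 长 read as "chang" are distinct tokens in this sketch, the corpus occurrences of 长 before 大 count only toward the "zhang" reading and do not raise the score of the same character pair under the "chang" reading.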
