Speech recognition using topic-specific language models

  • US 9,324,323 B1
  • Filed: 12/14/2012
  • Issued: 04/26/2016
  • Est. Priority Date: 01/13/2012
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving audio;

    determining, based at least on comparing a representation of one or more features of the audio to a set of representations of one or more corresponding features of other items of content, a proximity in a vector space of the representation of the one or more features of the audio to each of the representations of one or more corresponding features of other items of content, wherein each of the representations of one or more corresponding features of other items of content is associated with two or more language models that are each associated with a different topic;

    determining, based at least on the proximities in the vector space of the representation of the one or more features of the audio to the representations of one or more corresponding features of other items of content, that the representation of the one or more features of the audio is proximate to a representation of one or more corresponding features of another item of content;

    identifying (i) the language models that are associated with the representation of the one or more corresponding features of the other item of content that is indicated as proximate to the representation of the one or more features of the audio, and, (ii) for each language model that is associated with the representation of the one or more corresponding features of the other item of content, a relevance of the topic associated with the language model to the other item of content;

    obtaining, for each of the language models that are associated with the representation of the one or more corresponding features of the other item of content that is indicated as proximate to the representation of the one or more features of the audio, (i) a transcription of the audio, and (ii) a speech recognizer confidence score;

    generating, for each transcription, an aggregated score based at least on (i) the speech recognizer confidence score for the transcription, (ii) the relevance of the topic associated with the language model for which the transcription was obtained to the other item of content, and (iii) the proximity of the representation of the one or more features of the audio to the representation of the one or more corresponding features of the other item of content; and

    selecting a particular transcription of the audio, from among the transcriptions, based at least on the aggregated scores.
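
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: proximity is taken to be cosine similarity, the aggregated score is taken to be the product of the three factors, and the names `ContentItem`, `select_transcription`, and the `recognize` callback are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical recognizer interface: (audio, language model name) -> (transcription, confidence).
Recognizer = Callable[[bytes, str], Tuple[str, float]]


@dataclass
class ContentItem:
    """Another item of content: a feature vector plus its topic-specific language models."""
    features: Sequence[float]
    # Maps each associated language model name to the relevance of its topic to this item.
    # The claim calls for two or more language models per item.
    topic_relevance: Dict[str, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed proximity measure; the claim only requires a proximity in a vector space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_transcription(
    audio: bytes,
    audio_features: Sequence[float],
    items: List[ContentItem],
    recognize: Recognizer,
) -> str:
    """Pick a transcription using confidence, topic relevance, and proximity."""
    if not items:
        raise ValueError("at least one item of content is required")

    # Proximity of the audio representation to each item's representation.
    proximities = [cosine_similarity(audio_features, item.features) for item in items]

    # Treat the single closest item as the proximate item of content.
    nearest_index = max(range(len(items)), key=lambda i: proximities[i])
    nearest = items[nearest_index]
    proximity = proximities[nearest_index]

    # One transcription and confidence score per associated language model.
    scored: List[Tuple[float, str]] = []
    for model_name, relevance in nearest.topic_relevance.items():
        transcription, confidence = recognize(audio, model_name)
        # Assumed aggregation: a plain product of the three claimed factors.
        scored.append((confidence * relevance * proximity, transcription))

    # Select the transcription with the highest aggregated score.
    return max(scored, key=lambda pair: pair[0])[1]
```

In this sketch a language model with a slightly lower recognizer confidence can still win if its topic is much more relevant to the proximate item of content, which is the effect of combining the three factors in the aggregated score.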

  • 2 Assignments