Acoustic model training

  • US 9,495,955 B1
  • Filed: 01/02/2013
  • Issued: 11/15/2016
  • Est. Priority Date: 01/02/2013
  • Status: Active Grant
First Claim

1. An acoustic modeling system, comprising:

  • under control of one or more computing devices configured with specific computer-executable instructions, receiving a plurality of characteristics of utterances to be used to create an acoustic model;

    for each characteristic in the plurality of characteristics:

    identifying an utterance within a corpus of utterances having the characteristic; and

    associating at least a portion of the utterance with a tag indicative of the characteristic;

    receiving an identification of a desired training utterance, wherein the desired training utterance comprises a first portion associated with a first desired characteristic and a second portion associated with a second desired characteristic, and wherein the desired training utterance is not included in the corpus;

    selecting, from the corpus, a first utterance, wherein a portion of the first utterance comprises at least the first portion of the desired training utterance, and wherein the portion of the first utterance is associated with a tag corresponding to the first desired characteristic;

    extracting the portion of the first utterance from the first utterance;

    selecting, from the corpus, a second utterance, wherein a portion of the second utterance comprises at least the second portion of the desired training utterance, and wherein the portion of the second utterance is associated with a tag corresponding to the second desired characteristic;

    extracting the portion of the second utterance from the second utterance;

    concatenating the portion of the first utterance with the portion of the second utterance to generate the desired training utterance; and

    training an acoustic model, wherein:

    the acoustic model comprises statistical representations of possible sounds of subword units; and

    the statistical representations are generated based on a comparison between audio data associated with the desired training utterance that is generated and a textual transcription of the desired training utterance that is generated.
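
The tagging, selection, extraction, and concatenation steps recited above can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The names `Segment`, `Utterance`, `tag_corpus`, `select_portion`, and `build_training_utterance` are hypothetical, the characteristic detectors are stand-in predicates, and audio is represented as a plain list of samples. The final training step is omitted; the sketch only produces the audio/transcription pair that a trainer would consume.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Segment:
    """A portion of an utterance: its transcript, audio samples, and tags."""
    text: str
    audio: List[float]
    tags: Set[str] = field(default_factory=set)


@dataclass
class Utterance:
    """An utterance in the corpus, stored as an ordered list of portions."""
    segments: List[Segment]


def tag_corpus(corpus: List[Utterance],
               detectors: Dict[str, Callable[[Segment], bool]]) -> None:
    """For each characteristic, tag every portion that the detector accepts."""
    for characteristic, has_characteristic in detectors.items():
        for utterance in corpus:
            for segment in utterance.segments:
                if has_characteristic(segment):
                    segment.tags.add(characteristic)


def select_portion(corpus: List[Utterance], text: str,
                   characteristic: str) -> Segment:
    """Return the first portion matching the text and carrying the tag."""
    for utterance in corpus:
        for segment in utterance.segments:
            if segment.text == text and characteristic in segment.tags:
                return segment
    raise LookupError(f"no portion {text!r} tagged {characteristic!r}")


def build_training_utterance(
        corpus: List[Utterance],
        desired: List[Tuple[str, str]]) -> Tuple[List[float], str]:
    """Concatenate the selected portions into audio plus its transcription."""
    parts = [select_portion(corpus, text, tag) for text, tag in desired]
    audio = [sample for part in parts for sample in part.audio]
    transcript = " ".join(part.text for part in parts)
    return audio, transcript


# Toy corpus; "loud" is an assumed characteristic with a made-up threshold.
corpus = [
    Utterance([Segment("play", [0.9, 0.8]), Segment("music", [0.1])]),
    Utterance([Segment("stop", [0.2]), Segment("the timer", [0.7, 0.9])]),
]
tag_corpus(corpus, {"loud": lambda seg: max(seg.audio) > 0.5})

# "play the timer" is not in the corpus; it is assembled from two utterances.
audio, transcript = build_training_utterance(
    corpus, [("play", "loud"), ("the timer", "loud")])
```

Storing tags per portion rather than per utterance mirrors the claim language, which associates "at least a portion of the utterance" with a tag, so a single corpus utterance can contribute portions with different characteristics.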
