Method and system for acoustic data selection for training the parameters of an acoustic model

  • US 9,972,306 B2
  • Filed: 08/05/2013
  • Issued: 05/15/2018
  • Est. Priority Date: 08/07/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method for training acoustic models in an automatic speech recognition system through the selection of acoustic data comprising the steps of:

  • a. training a first acoustic model in the automatic speech recognition system using a training-data corpus comprising a plurality of speech audio files and a respective plurality of transcriptions for the plurality of speech audio files;

    b. performing a forced Viterbi alignment of the plurality of speech audio files using the trained first acoustic model in the automatic speech recognition system and determining an average frame likelihood score β_r for each of the plurality of speech audio files;

    c. calculating a global frame likelihood score δ for the plurality of speech audio files, wherein the global frame likelihood score δ comprises an average of frame likelihoods over the entire corpus;

    d. performing a phoneme recognition of the plurality of speech audio files using the trained first acoustic model and the plurality of transcriptions in the automatic speech recognition system;

    e. calculating a phoneme recognition accuracy γ for each of the plurality of speech audio files and a global phoneme recognition accuracy v for the plurality of speech audio files;

    f. creating a subset training-data corpus comprising audio files retained from the plurality of speech audio files which meet at least one predetermined criterion indicating that an audio file has good audio quality, the at least one predetermined criterion comprising at least one criterion selected from the group comprising;

    a first criterion based on the average frame likelihood score β of the retained speech audio file and the global frame likelihood score δ; and

    a second criterion based on the phoneme recognition accuracy γ of the retained speech audio file and the global phoneme recognition accuracy v; and

    g. training a second acoustic model in the automatic speech recognition system using the subset training-data corpus.
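
The selection logic in steps b, c, e and f can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not state how each per-file score is compared with its global counterpart, so the threshold form used here (keep a file if β ≥ δ minus a margin, or γ ≥ v minus a margin) is an assumption, as are the `FileStats` container, the margin parameters, the use of per-frame log-likelihoods, and the accuracy formula 1 − errors / reference phonemes. The per-frame scores and error counts are assumed to be supplied by the alignment and phoneme-recognition steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file statistics from alignment and phoneme recognition."""
    name: str
    frame_log_likelihoods: List[float]  # one score per frame from forced alignment
    ref_phonemes: int                   # phonemes in the reference transcription
    phoneme_errors: int                 # substitutions + deletions + insertions

    @property
    def beta(self) -> float:
        # Average frame likelihood score for this file (step b).
        return sum(self.frame_log_likelihoods) / len(self.frame_log_likelihoods)

    @property
    def gamma(self) -> float:
        # Phoneme recognition accuracy for this file (step e); formula is assumed.
        return 1.0 - self.phoneme_errors / self.ref_phonemes


def global_frame_likelihood(corpus: List[FileStats]) -> float:
    # delta: average of frame likelihoods over every frame in the corpus (step c).
    total = sum(sum(f.frame_log_likelihoods) for f in corpus)
    frames = sum(len(f.frame_log_likelihoods) for f in corpus)
    return total / frames


def global_phoneme_accuracy(corpus: List[FileStats]) -> float:
    # v: accuracy pooled over the whole corpus (assumed pooling; step e).
    errors = sum(f.phoneme_errors for f in corpus)
    refs = sum(f.ref_phonemes for f in corpus)
    return 1.0 - errors / refs


def select_subset(corpus: List[FileStats],
                  likelihood_margin: float = 0.0,
                  accuracy_margin: float = 0.0) -> List[FileStats]:
    # Step f: retain files meeting at least one criterion (assumed threshold form).
    delta = global_frame_likelihood(corpus)
    v = global_phoneme_accuracy(corpus)
    return [
        f for f in corpus
        if f.beta >= delta - likelihood_margin or f.gamma >= v - accuracy_margin
    ]
```

The files returned by `select_subset` would form the subset training-data corpus used to train the second acoustic model in step g. Widening either margin retains more files; how the thresholds are actually set is not specified in the claim.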

  • 5 Assignments