Low latency real-time vocal tract length normalization

  • US 8,909,527 B2
  • Filed: 06/24/2009
  • Issued: 12/09/2014
  • Est. Priority Date: 01/12/2005
  • Status: Active Grant
First Claim

1. A method comprising:

    separating training data into speaker specific segments;

    performing, for a speaker specific segment of the speaker specific segments:

    generating spectral data representative of the speaker specific segment, the spectral data comprising a plurality of warping factors;

    selecting a first warping factor as a best warping factor from the plurality of warping factors based on a determination made during speech recognition of the speaker specific segment, and generating a warped spectral data representation of the spectral data using the first warping factor;

    comparing the warped spectral data representation to a vocal tract length normalized acoustic model;

    iteratively carrying out, until a comparison indicates a warping factor difference below 0.02, the acts of:

    selecting an other warping factor and generating an other warped spectral data representation;

    comparing the other warped spectral data representation to the vocal tract length normalized acoustic model, to yield the comparison; and

    when the other warping factor produces a closer match to the vocal tract length normalized acoustic model, saving the other warping factor as a best warping factor for the speaker specific segment;

    training a new acoustic model using a warped spectral data representation of all the training data that is generated using the best warping factor for each of the speaker specific segments;

    selecting the new acoustic model as the vocal tract length normalized acoustic model; and

    repeating the steps of performing and selecting until the best warping factor for each of the speaker specific segments is stable.
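
The training loop recited in the claim can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented implementation: spectra are plain lists of magnitudes, warping is a linear rescaling of the frequency axis, the "acoustic model" is just a mean spectrum, and the match is a negative squared distance. The helper names (`toy_spectrum`, `warp_spectrum`, `match_score`, `best_warping_factor`, `train_vtln`), the starting step of 0.08, and the 0.8 to 1.2 search bounds are all hypothetical. The 0.02 threshold is the only value taken from the claim, and reading it as a step size that must fall below 0.02 is one interpretation.

```python
import math

def toy_spectrum(n=128, center=40.0, width=10.0):
    """Hypothetical stand-in for spectral data: one smooth peak."""
    return [math.exp(-((k - center) / width) ** 2) for k in range(n)]

def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis by alpha (alpha > 0 assumed)."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = min(k * alpha, n - 1)      # source position, clamped to last bin
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def match_score(warped, model):
    """Higher means a closer match (assumed: negative squared distance)."""
    return -sum((w - m) ** 2 for w, m in zip(warped, model))

def best_warping_factor(spectrum, model, start=1.0, step=0.08, tol=0.02,
                        bounds=(0.8, 1.2)):
    """Try neighbouring factors, keep any closer match, and halve the
    step when neither neighbour helps; stop once the step is below tol."""
    best = start
    best_score = match_score(warp_spectrum(spectrum, best), model)
    while step >= tol:
        improved = False
        for cand in (best - step, best + step):
            if not bounds[0] <= cand <= bounds[1]:
                continue
            score = match_score(warp_spectrum(spectrum, cand), model)
            if score > best_score:       # closer match: save as new best
                best, best_score, improved = cand, score, True
        if not improved:
            step /= 2
    return best

def mean_spectrum(spectra):
    """Toy 'model training': average the spectra bin by bin."""
    return [sum(col) / len(spectra) for col in zip(*spectra)]

def train_vtln(segments, max_rounds=10, tol=0.02):
    """segments maps a speaker id to that speaker's spectrum."""
    factors = {spk: 1.0 for spk in segments}
    model = mean_spectrum(list(segments.values()))
    for _ in range(max_rounds):
        new_factors = {
            spk: best_warping_factor(spec, model, start=factors[spk], tol=tol)
            for spk, spec in segments.items()
        }
        # Retrain the model on all data warped with each speaker's best factor.
        model = mean_spectrum(
            [warp_spectrum(spec, new_factors[spk]) for spk, spec in segments.items()]
        )
        stable = all(abs(new_factors[s] - factors[s]) < tol for s in segments)
        factors = new_factors
        if stable:                       # best factors no longer changing
            break
    return factors, model
```

For example, calling `train_vtln({"a": warp_spectrum(ref, 0.9), "b": warp_spectrum(ref, 1.1)})` with `ref = toy_spectrum()` should assign the two synthetic speakers warping factors on opposite sides of 1.0. A real system would score warped features against a trained acoustic model during recognition rather than against a mean spectrum.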
