Method and system for building text-to-speech voice from diverse recordings

  • US 9,542,927 B2
  • Filed: 11/13/2014
  • Issued: 01/10/2017
  • Est. Priority Date: 11/13/2014
  • Status: Active Grant
First Claim

1. A method comprising:

    extracting speech features from a plurality of recorded reference speech utterances of a reference speaker to generate a reference set of reference-speaker vectors;

    for each respective plurality of recorded colloquial speech utterances of a respective colloquial speaker of multiple colloquial speakers, extracting speech features from the recorded colloquial speech utterances of the respective colloquial speaker to generate a respective set of colloquial-speaker vectors;

    for each respective set of colloquial-speaker vectors, replacing each colloquial-speaker vector of the respective set of colloquial-speaker vectors with a respective, optimally-matched reference-speaker vector from among the reference set of reference-speaker vectors, the respective, optimally-matched reference-speaker vector being identified by matching under a transform that compensates for differences in speech between the reference speaker and the respective colloquial speaker;

    aggregating the replaced colloquial-speaker vectors of all the respective sets of colloquial-speaker vectors into an aggregate set of conditioned speaker vectors;

    providing the aggregate set of conditioned speaker vectors to a text-to-speech (TTS) system implemented on one or more computing devices; and

    training the TTS system using the provided aggregate set of conditioned speaker vectors.
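
The matching, replacement, and aggregation steps of claim 1 can be illustrated with a minimal sketch. The claim does not say which speech features, which compensating transform, or which matching criterion is used, so everything specific here is an assumption made for illustration: feature vectors are plain lists of floats that have already been extracted, the transform is a per-dimension mean and variance alignment between the two speakers, and "optimally matched" is taken to mean nearest by squared Euclidean distance. All function names are hypothetical. Feature extraction and TTS training are outside the sketch.

```python
from statistics import fmean, pstdev


def column_stats(vectors):
    """Per-dimension mean and standard deviation of equal-length vectors."""
    columns = list(zip(*vectors))
    means = [fmean(c) for c in columns]
    # Fall back to 1.0 when a dimension has zero spread, to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def compensate(vector, src_stats, dst_stats):
    """Assumed stand-in transform: align the source speaker's per-dimension
    mean and variance with those of the destination (reference) speaker."""
    src_mean, src_std = src_stats
    dst_mean, dst_std = dst_stats
    return [
        (x - sm) / ss * ds + dm
        for x, sm, ss, dm, ds in zip(vector, src_mean, src_std, dst_mean, dst_std)
    ]


def nearest_reference(vector, reference_vectors):
    """Assumed matching criterion: smallest squared Euclidean distance."""
    return min(
        reference_vectors,
        key=lambda ref: sum((a - b) ** 2 for a, b in zip(vector, ref)),
    )


def condition_speaker(colloquial_vectors, reference_vectors):
    """Replace each colloquial-speaker vector with its matched reference vector."""
    ref_stats = column_stats(reference_vectors)
    col_stats = column_stats(colloquial_vectors)
    return [
        nearest_reference(compensate(v, col_stats, ref_stats), reference_vectors)
        for v in colloquial_vectors
    ]


def build_aggregate(colloquial_sets, reference_vectors):
    """Pool the replaced vectors of every colloquial speaker into one set."""
    aggregate = []
    for vectors in colloquial_sets:
        aggregate.extend(condition_speaker(vectors, reference_vectors))
    return aggregate
```

Calling `build_aggregate` with one list of vectors per colloquial speaker returns a single list made up only of reference-speaker vectors, which corresponds to the aggregate set of conditioned speaker vectors that the claim then provides to the TTS system for training.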
