
Deep networks for unit selection speech synthesis

  • US 9,460,704 B2
  • Filed: 09/06/2013
  • Issued: 10/04/2016
  • Est. Priority Date: 09/06/2013
  • Status: Active Grant
First Claim

1. A method comprising:

  • obtaining a set of phones that is associated with text that is to be synthesized into speech;

  • accessing a neural network that has been trained to estimate a set of target acoustic features that represent a close acoustic match to a given set of phones;

  • providing a particular set of phones for input to the neural network;

  • receiving, from the neural network, a particular set of target acoustic features that represents the acoustic match to the particular set of phones;

  • determining a distance between (i) the particular set of target acoustic features that the neural network indicates represents the acoustic match to the particular set of phones and (ii) a set of acoustic features that is associated with a stored acoustic sample;

  • selecting the acoustic sample to be used in synthesizing the text into speech based at least on the determined distance between (i) the particular set of target acoustic features that the neural network indicates represents the acoustic match to the particular set of phones and (ii) the set of acoustic features that is associated with the stored acoustic sample;

  • synthesizing, using an automated speech synthesizer, the text into speech using the selected acoustic sample; and

  • providing the speech for output.
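Claim 1 describes a pipeline: phones go into a neural network, the network returns target acoustic features, each stored sample's features are compared to that target by a distance, the closest sample is selected, and the selected samples are used to synthesize speech. The Python sketch below only illustrates that data flow under stated assumptions; it is not the patented implementation. Everything in it is hypothetical: the five-phone inventory `PHONES`, the 3-dimensional feature vector, the one-hot (previous, current, next) phone encoding, `TinyFeatureNet` (an untrained MLP with seeded random weights standing in for the trained network), Euclidean distance as the distance measure, greedy per-phone selection, and plain concatenation of placeholder samples in place of a real synthesizer.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical phone inventory; a real system would use a full phone set.
PHONES = ["sil", "h", "eh", "l", "ow"]
N_FEATURES = 3  # hypothetical size of each acoustic feature vector


@dataclass
class StoredUnit:
    """A stored acoustic sample with its precomputed acoustic features."""
    unit_id: str
    features: List[float]
    samples: List[float]  # placeholder waveform samples


class TinyFeatureNet:
    """Stand-in for the trained network: a one-hidden-layer MLP.

    The weights are seeded random numbers and are NOT trained; they only make
    the data flow (phones in, target acoustic features out) concrete.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

    def predict(self, x: Sequence[float]) -> List[float]:
        # tanh hidden layer followed by a linear output layer
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def encode_phones(phones: Sequence[str]) -> List[float]:
    """Concatenate one-hot vectors for each phone in the input window."""
    vec: List[float] = []
    for p in phones:
        vec.extend(1.0 if p == q else 0.0 for q in PHONES)
    return vec


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between target features and a stored sample's features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_unit(target: Sequence[float], candidates: Sequence[StoredUnit]) -> StoredUnit:
    """Return the candidate whose features are closest to the predicted target."""
    return min(candidates, key=lambda u: euclidean(target, u.features))


def synthesize(phones: Sequence[str], net: TinyFeatureNet,
               inventory: Dict[str, List[StoredUnit]]) -> List[float]:
    padded = ["sil", *phones, "sil"]  # pad so every phone has left/right context
    output: List[float] = []
    for i in range(1, len(padded) - 1):
        window = padded[i - 1:i + 2]                  # (previous, current, next)
        target = net.predict(encode_phones(window))   # target acoustic features
        unit = select_unit(target, inventory[padded[i]])
        output.extend(unit.samples)                   # naive concatenation
    return output


if __name__ == "__main__":
    net = TinyFeatureNet(n_in=3 * len(PHONES), n_hidden=8, n_out=N_FEATURES)
    rng = random.Random(1)
    # Three hypothetical stored units per phone, each with 4 placeholder samples.
    inventory = {
        p: [StoredUnit(f"{p}_{k}", [rng.uniform(-1, 1) for _ in range(N_FEATURES)], [float(k)] * 4)
            for k in range(3)]
        for p in PHONES
    }
    speech = synthesize(["h", "eh", "l", "ow"], net, inventory)
    print(len(speech))  # 16: four phones, one 4-sample unit each
```

Because the claim selects a sample "based at least on" the distance, a fuller system could combine this target distance with other costs, such as a join cost between adjacent units found with a Viterbi search; the greedy `min` here keeps only the distance term.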

  • 2 Assignments