Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks

  • US 9,697,820 B2
  • Filed: 12/07/2015
  • Issued: 07/04/2017
  • Est. Priority Date: 09/24/2015
  • Status: Active Grant
First Claim

1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device, cause the electronic device to:

    receive text to be converted to speech;

    generate a sequence of target units representing a spoken pronunciation of the text;

    select, from a plurality of speech segments, a first candidate speech segment for a first target unit of the sequence of target units and a second candidate speech segment for a second target unit of the sequence of target units;

    determine, using a set of acoustic features of the first candidate speech segment and a set of linguistic features of the second target unit, a set of predicted acoustic model parameters of the second target unit;

    determine, using the set of predicted acoustic model parameters of the second target unit and a set of acoustic features of the second candidate speech segment, a likelihood score of the second candidate speech segment with respect to the first candidate speech segment;

    select the second candidate speech segment to be used in speech synthesis based on the determined likelihood score; and

    generate speech corresponding to the received text using the second candidate speech segment.
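
The scoring and selection steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify the form of the acoustic model or of the predictor, so the sketch assumes a diagonal Gaussian whose mean and variance serve as the "predicted acoustic model parameters", and it substitutes a trivial hand-written function for the neural network named in the title. All names (`toy_predictor`, `gaussian_log_likelihood`, `select_second_segment`) and all feature values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# A predictor maps (acoustic features of the first candidate segment,
# linguistic features of the second target unit) to (mean, variance).
Predictor = Callable[[Vector, Vector], Tuple[List[float], List[float]]]


def gaussian_log_likelihood(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a diagonal Gaussian (an assumed model form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def toy_predictor(prev_acoustic: Vector, target_linguistic: Vector):
    """Hypothetical stand-in for a trained network.

    It predicts that the next unit's features equal the previous segment's
    features shifted by the linguistic features, with unit variance.
    """
    mean = [a + l for a, l in zip(prev_acoustic, target_linguistic)]
    var = [1.0] * len(mean)
    return mean, var


def select_second_segment(
    first_candidate: Vector,
    second_candidates: Sequence[Vector],
    target_linguistic: Vector,
    predictor: Predictor = toy_predictor,
) -> Tuple[int, float]:
    """Return (index, score) of the best-scoring second candidate segment."""
    # Predicted acoustic model parameters of the second target unit.
    mean, var = predictor(first_candidate, target_linguistic)
    # Likelihood score of each second candidate given the first candidate.
    scores = [gaussian_log_likelihood(c, mean, var) for c in second_candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


# Made-up feature vectors, for illustration only.
first = [1.0, 2.0]
linguistic = [0.5, -0.5]
candidates = [[3.0, 0.0], [1.4, 1.6], [0.0, 0.0]]
best_index, best_score = select_second_segment(first, candidates, linguistic)
```

In this toy setup the predicted mean is `[1.5, 1.5]`, so the second entry in `candidates` (index 1) scores highest and would be the segment passed on to synthesis. A real system would presumably run this scoring over all candidate pairs along the sequence of target units; the claim itself only recites the pairwise step shown here.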
