Predicting pronunciations with word stress

  • US 10,255,905 B2
  • Filed: 06/10/2016
  • Issued: 04/09/2019
  • Est. Priority Date: 06/10/2016
  • Status: Active Grant
First Claim

1. A method performed by one or more computers of a text-to-speech synthesis system, the method comprising:

    determining, by the one or more computers of the text-to-speech synthesis system, spelling data that indicates the spelling of a word;

    determining, by the one or more computers of the text-to-speech synthesis system, first pronunciation data that indicates at least one stress location for the word;

    providing, by the one or more computers of the text-to-speech synthesis system, the spelling data and the first pronunciation data as input to a trained recurrent neural network, the trained recurrent neural network being trained to indicate characteristics of word pronunciations based at least on data indicating the spelling of words;

    receiving, by the one or more computers of the text-to-speech synthesis system, output representing a stress pattern for pronunciation of the word, the output being generated by the trained recurrent neural network in response to providing the spelling data and the first pronunciation data as input;

    using, by the one or more computers of the text-to-speech synthesis system, the output of the trained recurrent neural network to generate second pronunciation data indicating a stress pattern for a pronunciation of the word, wherein the second pronunciation data is different from the first pronunciation data that indicates at least one stress location for the word;

    generating, using the second pronunciation data, audio data that includes a synthesized utterance of the word and applies stress to the word based on the stress pattern indicated by the second pronunciation data; and

    providing, by the one or more computers of the text-to-speech synthesis system, the audio data to a system that includes at least one speaker for audible presentation of the synthesized utterance of the word using the audio data.
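
The data flow recited in the claim (spelling data plus an initial stress location go into a recurrent network, and a per-position stress pattern comes out) can be illustrated with a minimal sketch. This is not the patented implementation: the feature encoding, the `TinyRNN` class, the label set in `STRESS_LABELS`, and the function names are all hypothetical, and the network weights are random rather than trained, so the predicted labels are not meaningful. The audio synthesis and playback steps are omitted.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Hypothetical label set; the claim does not specify how stress is encoded.
STRESS_LABELS = ("none", "primary", "secondary")


def encode_inputs(spelling, stressed_positions):
    """Build one feature vector per letter: one-hot letter plus a 0/1 stress flag.

    `stressed_positions` stands in for the "first pronunciation data" that
    indicates at least one stress location (an assumption for illustration).
    """
    vectors = []
    for i, ch in enumerate(spelling.lower()):
        one_hot = [1.0 if ch == a else 0.0 for a in ALPHABET]
        flag = 1.0 if i in stressed_positions else 0.0
        vectors.append(one_hot + [flag])
    return vectors


class TinyRNN:
    """Elman-style recurrent network with random, untrained weights."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(n_hidden, n_in)
        self.w_rec = mat(n_hidden, n_hidden)
        self.w_out = mat(n_out, n_hidden)
        self.n_hidden = n_hidden

    def forward(self, inputs):
        """Return one label index per input step (argmax of the output scores)."""
        hidden = [0.0] * self.n_hidden
        label_ids = []
        for x in inputs:
            # New hidden state depends on the current input and the previous state.
            hidden = [
                math.tanh(
                    sum(w * v for w, v in zip(self.w_in[j], x))
                    + sum(w * h for w, h in zip(self.w_rec[j], hidden))
                )
                for j in range(self.n_hidden)
            ]
            scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
            label_ids.append(scores.index(max(scores)))
        return label_ids


def predict_stress_pattern(spelling, stressed_positions, model):
    """Map spelling and initial stress data to a per-letter stress pattern."""
    features = encode_inputs(spelling, stressed_positions)
    return [STRESS_LABELS[i] for i in model.forward(features)]


model = TinyRNN(n_in=len(ALPHABET) + 1, n_hidden=8, n_out=len(STRESS_LABELS))
pattern = predict_stress_pattern("record", {0}, model)
print(list(zip("record", pattern)))
```

In a real system the weights would come from training on pronunciation data, and the resulting pattern (the "second pronunciation data") would be passed to a synthesizer to generate the audio.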
