
Method and apparatus for converting text into audible signals using a neural network

  • US 5,668,926 A
  • Filed: 03/22/1996
  • Issued: 09/16/1997
  • Est. Priority Date: 04/28/1994
  • Status: Expired due to Fees
First Claim

1. A method for training and utilizing a neural network that is used to convert text streams into audible signals, the method comprising the steps of:

  • wherein training a neural network utilizes the steps of;

    1a) inputting recorded audio messages;

    1b) dividing the recorded audio messages into a series of audio frames, wherein each audio frame has a fixed duration;

    1c) assigning, for each audio frame of the series of audio frames, a phonetic representation of a plurality of phonetic representations that include articulation characteristics;

    1d) generating a context description of a plurality of context descriptions for each audio frame based on the phonetic representation of the each audio frame and the phonetic representation of at least some other audio frames of the series of audio frames, generating syntactic boundary information based on the phonetic representation of the audio frame and the phonetic representation of at least some other audio frames of the series of audio frames, generating phonetic boundary information based on the phonetic representation of the audio frame and the phonetic representation of at least some other audio frames of the series of audio frames, and generating a description of prominence of syntactic information based on the phonetic representation of the audio frame and the phonetic representation of at least some other audio frames of the series of audio frames;

    1e) assigning, for the each audio frame, a target acoustic representation of a plurality of acoustic representations;

    1f) training a feed-forward neural network with a recurrent input structure to associate an acoustic representation of the plurality of acoustic representations with the context description of the each audio frame, wherein the acoustic representation substantially matches the target acoustic representation;

  • wherein upon receiving a text stream, converting the text stream into an audible signal utilizing the steps of;

    1g) converting the text stream into a series of phonetic frames, wherein a phonetic frame of the series of phonetic frames includes one of the plurality of phonetic representations, and wherein a phonetic frame has the fixed duration;

    1h) assigning one of the plurality of context descriptions to the phonetic frame based on the one of the plurality of phonetic representations and phonetic representations of at least some other phonetic frames of the series of phonetic frames;

    1i) converting, by the neural network, the phonetic frame into one of the plurality of acoustic representations, based on the one of the plurality of context descriptions; and

    1j) converting the one of the plurality of acoustic representations into an audible signal.
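
The frame-based steps of the claim (1b, 1c/1g and 1d/1h) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not something the claim specifies: the 5 ms frame duration, the context window width, the "sil" padding label, the phone labels, and the function names `split_into_frames`, `phones_to_frames` and `context_description` are all hypothetical. The sketch covers only the phonetic window and a phonetic boundary flag; syntactic boundaries, prominence, the neural network itself (1f, 1i) and waveform synthesis (1j) are left out.

```python
from typing import Dict, List, Sequence, Tuple

FRAME_MS = 5  # assumed value; the claim only requires a fixed duration


def split_into_frames(samples: Sequence[float], sample_rate: int,
                      frame_ms: int = FRAME_MS) -> List[List[float]]:
    """Step 1b: cut audio into equal-duration frames (trailing partial frame dropped)."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame duration too short for this sample rate")
    count = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(count)]


def phones_to_frames(segments: Sequence[Tuple[str, int]],
                     frame_ms: int = FRAME_MS) -> List[str]:
    """Steps 1c / 1g: expand (phone, duration_ms) segments into one label per frame."""
    labels: List[str] = []
    for phone, duration_ms in segments:
        # Every phone gets at least one frame, even if shorter than a frame.
        labels.extend([phone] * max(1, duration_ms // frame_ms))
    return labels


def context_description(labels: Sequence[str], index: int,
                        width: int = 2, pad: str = "sil") -> Dict[str, object]:
    """Steps 1d / 1h: describe a frame by its own label and its neighbours' labels."""
    window = [labels[j] if 0 <= j < len(labels) else pad
              for j in range(index - width, index + width + 1)]
    # Phonetic boundary: the frame starts a new phone (or starts the utterance).
    starts_phone = index == 0 or labels[index - 1] != labels[index]
    return {"window": window, "phone_boundary": starts_phone}


if __name__ == "__main__":
    labels = phones_to_frames([("k", 10), ("ae", 15), ("t", 5)])
    for i in range(len(labels)):
        print(i, context_description(labels, i, width=1))
```

In this sketch the same `context_description` function serves both phases, which mirrors the claim: training derives context descriptions from labelled audio frames, and conversion derives them from phonetic frames produced from text, so the network sees the same kind of input in both cases.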
