Apparatus, methods, and programming for speech synthesis via bit manipulations of compressed database

US 20040073428A1
Filed: 10/10/2002
Published: 04/15/2004
Est. Priority Date: 10/10/2002
Status: Abandoned Application

Abstract

Text-to-speech synthesis modifies the pitch of the sounds it concatenates to generate speech, when such sounds are in compressed, coded form, so as to make them sound better together. The pitch, duration, and energy of such concatenated sounds can be altered to better match, respectively, pitch, duration, and/or energy contours generated from phonetic spelling of the speech to be synthesized, which can, in turn, be derived from the text to be synthesized. The synthesized speech can be generated from the encoded sound of sub-word snippets as well as of one or more whole words. The duration of concatenated sounds can be changed by inserting or deleting sound frames associated with individual snippets. Such text-to-speech can be used to say words recognized by speech recognition, such as to provide feedback on the recognition. Such text-to-speech synthesis can be used in portable devices such as cellphones, PDAs, and/or wrist phones.
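The selection and pitch-matching steps described in the abstract can be illustrated with a minimal sketch. It is not the applicants' implementation: the `Frame` fields, the `SnippetDB` mapping, the greedy longest-match strategy, and the convention that a pitch code of 0 marks an unvoiced frame are all assumptions made for illustration. The linear predictive decoder that would turn the resulting frames into a waveform is not shown.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """One encoded frame. The field layout is hypothetical."""
    phoneme_offset: int  # which phoneme of its snippet this frame belongs to
    pitch_code: int      # quantized pitch; 0 is assumed to mean unvoiced
    energy_code: int     # quantized energy
    lpc_code: int        # stand-in for the quantized spectral parameters


# Hypothetical database: phoneme sequence -> encoded frames of that snippet.
SnippetDB = Dict[Tuple[str, ...], List[Frame]]
# (index into the target phoneme sequence, frame)
Placed = List[Tuple[int, Frame]]


def select_snippets(phonemes: Sequence[str], db: SnippetDB) -> Placed:
    """Cover the phoneme sequence with snippets, longest match first."""
    max_len = max(len(key) for key in db)
    placed: Placed = []
    i = 0
    while i < len(phonemes):
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + n])
            if key in db:
                # Concatenate the snippet, remembering each frame's phoneme.
                placed.extend((i + f.phoneme_offset, f) for f in db[key])
                i += n
                break
        else:
            raise KeyError(f"no snippet covers phoneme {phonemes[i]!r}")
    return placed


def apply_pitch_contour(placed: Placed, contour: Sequence[int]) -> List[Frame]:
    """Overwrite the pitch code of voiced frames with the contour value.

    Only the pitch field changes; the frame is never decoded to audio.
    """
    out: List[Frame] = []
    for pos, frame in placed:
        if frame.pitch_code != 0:
            frame = replace(frame, pitch_code=contour[pos])
        out.append(frame)
    return out
```

Because the edit touches only the pitch field of each encoded frame, the spectral parameters pass through unchanged, which is the point of working on the compressed representation rather than on decoded audio.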

13 Claims

1. A method of performing text-to-speech synthesis comprising:
- storing a plurality of encoded speech snippets, each including a sequence of one or more encoded sound representations produced by linear predictive encoding of speech sounds corresponding to a sequence of one or more phonemes, where a plurality of said snippets correspond to sequences of phonemes that are shorter than any of the words in which such a sequence of phonemes occurs;
  
  storing a desired phonetic representation, indicating a sequence of phonemes to be generated as speech sounds; and
  
  storing a desired pitch contour, indicating which of different possible pitch values are to be used in the generation of the speech sounds of different phonemes in the phonetic representation;
  
  selecting from said stored snippets a sequence of such snippets that correspond to the sequence of phonemes in the phonetic representation and concatenating those snippets into a synthesized sequence of such snippets;
  
  altering the encoded representations associated with one or more of the selected snippets associated with said synthesized sequence to cause the pitch values of the speech sounds represented by each such encoded representation to more closely match the pitch values indicated for the selected snippet's corresponding one or more phonemes in the pitch contour; and
  
  using a linear predictive decoder to convert the synthesized sequences of snippets, including said altered snippets, into a waveform signal representing a sequence of speech sound corresponding to the phonetic representation and the pitch contour.
  - 2. A method as in claim 1 further including generating said desired pitch contour from said desired phonetic representation or the text from which that phonetic representation corresponds before temporarily storing said pitch contour.
  - 3. A method as in claim 1:
    - further including receiving a sequence of one or more words for which corresponding speech sounds are to be generated;
      
      generating said desired phonetic representation as a sequence of one or more phonemes selected as probably representing the speech sounds associated with said received word sequence; and
      
      generating said desired pitch contour from said desired phonetic representation before temporarily storing said pitch contour.
  - 4. A method as in claim 1:
    - further including:
      
      storing a plurality of encoded word snippets, each including a sequence of one or more encoded sound representations produced by linear predictive encoding of speech sounds corresponding to one or more whole words;
      
      creating a sequence of speech sounds corresponding to a combination of encoded word snippets and said synthesized sequence of encoded snippets;
      
      said using of the linear predictive decoder to convert the synthesized sequence of snippets includes converting both the synthesized sequence of snippets and the word snippets into corresponding speech sounds.
  - 5. A method as in claim 1:
    - further including storing a desired duration contour, indicating which of different possible durations are to be used in the generation of the speech sounds of different phonemes in the phonetic representation; and
      
      wherein said altering of the encoded representations of snippets includes altering such encoded representations to cause the duration of the speech sounds represented by each of the encoded representations to more closely match the duration indicated for the corresponding phonemes in the duration contour.
  - 6. A method as in claim 5 further including generating a duration contour as an indication of the different possible durations to be used in the generation of the speech sound of the different phonemes in the phonetic representations.
  - 7. A method as in claim 5 wherein:
    - said encoded representation represents speech sounds includes a sequence of frames, each of which represents a speech sound during a period of time; and
      
      said altering of encoded representations to alter the duration of the speech sounds of encoded representations includes the insertion or deletion said frames from
  - 8. A method as in claim 1:
    - further including storing a desired energy contour, indicating which of different possible energy levels are to be used in the generation of the speech sounds of different phonemes in the phonetic representation; and
      
      wherein said altering of the encoded representations of snippets includes altering such encoded representations to cause the energy level of the speech sounds each of them represents to more closely match the energy values indicated for the corresponding phonemes in the pitch contour.
  - 9. A method as in claim 8:
    - further including generating an energy contour as an indication of the different possible energy values to be used in the generation of the speech sound of the different phonemes in the phonetic representations;
      
      wherein said altering of encoded representations associated with snippets associated with the synthesized sequence also includes altering said encoded representations to cause the energy values of the speech sounds each such encoded representation represents to more closely match the energy values indicated for the corresponding phonemes in the energy contour.
  - 10. A method as in claim 1 wherein said method is performed on a cellphone.
  - 11. A method as in claim 1 further including:
    - receiving sound corresponding to an utterance to be recognized;
      
      generating an electronic representation of the utterance;
      
      performing speech recognition against said electronic representation of the utterance to select as recognized one or more words as most likely to correspond to said utterance; and
      
      responding to the selection of said recognized words by causing the desired phonetic representation used to select the snippets that are converted into the waveform signal to be a phonetic representation corresponding to said one or more recognized words.
  - 12. A method as in claim 1 wherein said method is performed on a personal digital assistant.
  - 13. A method as in claim 1 wherein said method is performed on a wrist phone.
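The duration and energy adjustments of claims 5 through 9 can be sketched in the same spirit. This is an illustrative assumption, not the method disclosed in the specification: the `Frame` fields, the uniform resampling rule in `fit_duration`, the peak-matching rule in `fit_energy`, and the 5-bit energy range (`max_code=31`) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    """One encoded frame. The field layout is hypothetical."""
    pitch_code: int
    energy_code: int
    lpc_code: int


def fit_duration(frames: List[Frame], target_count: int) -> List[Frame]:
    """Stretch or shrink a snippet by repeating or dropping whole frames.

    Output frame k is taken from input position k * n // target_count, so
    frames are duplicated when lengthening and skipped when shortening.
    """
    if not frames or target_count <= 0:
        return []
    n = len(frames)
    return [frames[k * n // target_count] for k in range(target_count)]


def fit_energy(frames: List[Frame], target_peak: int,
               max_code: int = 31) -> List[Frame]:
    """Shift every energy code so the loudest frame hits target_peak.

    Codes are clamped to the assumed quantizer range [0, max_code].
    """
    if not frames:
        return []
    shift = target_peak - max(f.energy_code for f in frames)
    return [
        replace(f, energy_code=min(max_code, max(0, f.energy_code + shift)))
        for f in frames
    ]
```

As with pitch, both operations work on frame fields or on whole frames, so nothing has to be decoded and re-encoded before the linear predictive decoder runs.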

Current Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Original Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Inventors: Zlokarnik, Igor; Gillick, Laurence S.; Cohen, Jordan R.
Application Number: US10/268,612
Publication Number: US 20040073428A1
US Class Current: 704/260
CPC Class Codes:
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/04   Details of speech synthesis...
H04M 1/271   controlled by voice recogni...
