Computerized speech synthesizer for synthesizing speech from text

US 8,219,398 B2
Filed: 03/28/2006
Issued: 07/10/2012
Est. Priority Date: 03/28/2005
Status: Active Grant


Abstract

Disclosed are novel embodiments of a speech synthesizer and speech synthesis method for generating human-like speech wherein a speech signal can be generated by concatenation from phonemes stored in a phoneme database. Wavelet transforms and interpolation between frames can be employed to effect smooth morphological fusion of adjacent phonemes in the output signal. The phonemes may have one prosody or set of prosody characteristics and one or more alternative prosodies may be created by applying prosody modification parameters to the phonemes from a differential prosody database. Preferred embodiments can provide fast, resource-efficient speech synthesis with an appealing musical or rhythmic output in a desired prosody style such as reportorial or human interest. The invention includes computer-determining a suitable prosody to apply to a portion of the text by reference to the determined semantic meaning of another portion of the text and applying the determined prosody to the text by modification of the digitized phonemes. In this manner, prosodization can effectively be automated.
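As a rough illustration of the differential prosody idea described in the abstract, the sketch below applies per-style modification parameters to a base phoneme record. The `Phoneme` and `ProsodyDelta` types, the `DIFFERENTIAL_DB` table, the multiplicative scaling, and every numeric value are assumptions made for illustration only; the text here does not specify how the modification parameters are represented or applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phoneme:
    """Hypothetical record for one acoustic phoneme in its basic prosody."""
    symbol: str
    pitch_hz: float
    duration_ms: float
    amplitude: float


@dataclass(frozen=True)
class ProsodyDelta:
    """Hypothetical modification parameters; 1.0 means 'leave unchanged'."""
    pitch_scale: float = 1.0
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


# Assumed layout of a differential prosody database: style -> symbol -> delta.
# The style names come from the abstract; the numbers are invented.
DIFFERENTIAL_DB = {
    "reportorial": {
        "AA": ProsodyDelta(pitch_scale=0.95, duration_scale=0.9),
    },
    "human_interest": {
        "AA": ProsodyDelta(pitch_scale=1.1, duration_scale=1.2,
                           amplitude_scale=1.05),
    },
}


def apply_style(phoneme: Phoneme, style: str) -> Phoneme:
    """Return a copy of `phoneme` with the style's delta applied.

    Unknown styles or symbols fall back to the basic prosody.
    """
    delta = DIFFERENTIAL_DB.get(style, {}).get(phoneme.symbol, ProsodyDelta())
    return Phoneme(
        symbol=phoneme.symbol,
        pitch_hz=phoneme.pitch_hz * delta.pitch_scale,
        duration_ms=phoneme.duration_ms * delta.duration_scale,
        amplitude=phoneme.amplitude * delta.amplitude_scale,
    )


if __name__ == "__main__":
    base = Phoneme("AA", pitch_hz=200.0, duration_ms=100.0, amplitude=1.0)
    print(apply_style(base, "human_interest"))
```

Storing only the deltas keeps a single basic phoneme set and derives each alternative style on demand, which is one plausible reading of why the approach is described as resource-efficient.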

20 Claims

1. A computerized speech synthesizer for synthesizing prosodic speech from text, the speech synthesizer comprising non-transitory computer-readable storage media, the computer-readable storage media storing software and data that when executed by a computer implements:
- a) a text parser to parse text to be synthesized for syntax and meaning, and to identify text elements individually expressible with acoustic phonemes;
  
  b) a prosodic parser to associate prosodic tags with the text elements identified, the prosodic tags indicating pronunciations for the respective text elements to provide desired prosodic characteristics in the output speech;
  
  c) a phoneme database comprising a basic phoneme set, the basic phoneme set including at least about 80 acoustic phonemes useful to express the text elements, each acoustic phoneme having a respective waveform;
  
  d) graphemes to represent the text elements, the graphemes comprising text characters, or symbols representing text characters, wherein each grapheme can be matched with an acoustic phoneme equivalent of the grapheme; and
  
  e) a speech synthesis unit to select, sequence, and assemble acoustic phonemes from the phoneme database, the acoustic phonemes being selected to correspond with respective ones of the text elements and their associated prosodic tags, and to generate a prosodic speech signal from the assembled acoustic phonemes as a wave signal;
  
  wherein assembly of the acoustic phonemes includes pitch synchronously connecting one selected acoustic phoneme to the next selected acoustic phoneme, the next selected acoustic phoneme having a significantly different pitch from the pitch of the one selected acoustic phoneme, by generating and interposing one or more artificial waveforms between the one selected acoustic phoneme and the next selected acoustic phoneme to transition the prosodic speech signal from the pitch of the one selected acoustic phoneme to the pitch of the next selected acoustic phoneme.
- - 2. A computerized speech synthesizer according to claim 1 wherein the prosodic tags are associated one with each grapheme and specify desired acoustic values for acoustic phonemes to be selected to express the text elements according to articulatory rules for the text elements.
  - 3. A computerized speech synthesizer according to claim 2 wherein the prosodic tags indicate desired values for pitch, duration and amplitude of each acoustic phoneme.
  - 4. A computerized speech synthesizer according to claim 1, wherein the speech synthesizer comprises acoustic files for producing pronunciations of the parsed text representing audibly different speakers in the text.
  - 5. A computerized speech synthesizer according to claim 4 wherein the text comprises text appropriate for multiple speakers and the text parser outputs multiple speaker rules that produce natural sounding pronunciations appropriate to the semantic meaning of the parsed text and to the particular persons speaking the parsed text.
  - 6. A computerized speech synthesizer according to claim 1, wherein the text elements can each be selectively expressed by multiple prosodic values to represent the text elements in the prosodic speech signal with a desired one of multiple prosody styles.
  - 7. A computerized speech synthesizer according to claim 6 comprising a differential phoneme database, the differential phoneme database comprising multiple phonetic modification parameters to change the prosody of individual acoustic phonemes in the phoneme database and enable the prosodic speech signal to be audibilized with different prosody styles.
  - 8. A computerized speech synthesizer according to claim 7 wherein the phonetic modification parameters are derived from acoustical recordings of a trained speaker.
  - 9. A computerized speech synthesizer according to claim 1, wherein the interposed one or more artificial waveforms each have a pitch and an amplitude intermediate between the pitch and amplitude of the one selected acoustic phoneme and the pitch and amplitude of the next selected acoustic phoneme.
  - 10. A computerized speech synthesizer according to claim 1, wherein each acoustic phoneme in the basic phoneme set is stored as a wavelet transformation.
  - 11. A computerized speech synthesizer according to claim 1, wherein the number of acoustic phonemes in the phoneme database is from about 100 to about 400.
  - 12. A computerized speech synthesizer according to claim 1, wherein the computerized speech synthesizer comprises acoustic phonemes for producing pronunciations of the parsed text representing different prosody styles.
  - 13. A speech synthesizer according to claim 1, wherein the basic phoneme set has a basic prosody style and the computerized speech synthesizer comprises one or more differential prosody models for application to the basic phoneme set to provide an alternative prosody style in the prosodic speech signal.
  - 14. A computerized speech synthesizer according to claim 1 wherein interpolation of the one or more artificial waveforms is effected by employing an algorithm utilizing fractal mathematics.
  - 15. A computerized speech synthesizer according to claim 1 wherein the speech synthesizer comprises a wave generator to generate the prosodic speech signal from input text, an ambiguity-and-lexical stress module, and a prosodic text analysis component to specify rhythm, intonation and style.
  - 16. A computerized speech synthesizer according to claim 1, wherein the computerized speech synthesizer further comprises a music transform module to transform the prosodic speech signal to a musical output signal.
  - 17. A computerized speech synthesizer according to claim 1, wherein the text parser can effect a text normalization step wherein text to be synthesized is normalized, a part-of-speech tagging step, a syntactic analysis step, a meaning assignment step, and a prosodic context identification step, to generate prosodically parsed text.
  - 18. A computerized speech synthesizer according to claim 1, wherein the text parser can assign prosodic markings by prosodically parsing each text sentence into an array, assigning pronunciation rules to the letters comprising the words in the text sentence, examining the letter sequences across word boundaries to identify pronunciation rules modification, identifying the part-of-speech of each word in the text sentence, assigning an intonation pattern, creating a prosodically marked up text, and outputting the prosodically marked up text to create a grapheme-to-phoneme matrix.
  - 19. An on-demand audio publishing system comprising a computerized speech synthesizer according to claim 1.
  - 20. An on-demand audio publishing system comprising a computerized speech synthesizer according to claim 3 configured to produce speech accessible over a client-server network, the Internet, or a handheld device.
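
The pitch transition recited in claims 1 and 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: each artificial waveform is modeled as a single sine period, pitch and amplitude are stepped linearly between the two phonemes, and the function name `transition_periods` and the 16 kHz sample rate are hypothetical. Claim 14 mentions an algorithm utilizing fractal mathematics for the interpolation; plain linear interpolation is swapped in here for simplicity.

```python
import math


def transition_periods(f0_from, f0_to, amp_from, amp_to,
                       n_periods, sample_rate=16000):
    """Build `n_periods` artificial single-period waveforms whose pitch and
    amplitude lie strictly between those of two adjacent phonemes.

    Each returned list holds exactly one pitch period starting at a zero
    crossing, so consecutive periods can be joined period by period
    (an assumed reading of "pitch synchronously connecting").
    """
    periods = []
    for k in range(1, n_periods + 1):
        # Fraction of the way from the first phoneme to the next one;
        # never 0 or 1, so every period is intermediate (claim 9).
        t = k / (n_periods + 1)
        f0 = f0_from + t * (f0_to - f0_from)
        amp = amp_from + t * (amp_to - amp_from)
        n_samples = round(sample_rate / f0)  # samples in one pitch period
        periods.append([
            amp * math.sin(2.0 * math.pi * i / n_samples)
            for i in range(n_samples)
        ])
    return periods


if __name__ == "__main__":
    # Transition from a 100 Hz phoneme to a 200 Hz phoneme in three steps.
    bridge = transition_periods(100.0, 200.0, 1.0, 0.5, n_periods=3)
    print([len(p) for p in bridge])
```

Because the period length shrinks as the pitch rises, the interposed waveforms step the signal from one pitch to the other rather than jumping at the phoneme boundary.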

Current Assignee: Lessac Technologies, Inc.
Original Assignee: Lessac Technologies, Inc.
Inventors: Marple, Gary; Chandra, Nishant
Primary Examiner(s): Godbold, Douglas
Application Number: US11/909,514
Publication Number: US 20080195391A1
Time in Patent Office: 2,296 Days
Field of Search: 704/258, 704/260
US Class Current: 704/260
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 13/10 Prosody rules derived from ...
