Computerized speech synthesizer for synthesizing speech from text
First Claim
1. A computerized speech synthesizer for synthesizing prosodic speech from text, the speech synthesizer comprising non-transitory computer-readable storage media, the computer-readable storage media storing software and data that when executed by a computer implements:
a) a text parser to parse text to be synthesized for syntax and meaning, and to identify text elements individually expressible with acoustic phonemes;
b) a prosodic parser to associate prosodic tags with the text elements identified, the prosodic tags indicating pronunciations for the respective text elements to provide desired prosodic characteristics in the output speech;
c) a phoneme database comprising a basic phoneme set, the basic phoneme set including at least about 80 acoustic phonemes useful to express the text elements, each acoustic phoneme having a respective waveform;
d) graphemes to represent the text elements, the graphemes comprising text characters, or symbols representing text characters, wherein each grapheme can be matched with an acoustic phoneme equivalent of the grapheme; and
e) a speech synthesis unit to select, sequence, and assemble acoustic phonemes from the phoneme database, the acoustic phonemes being selected to correspond with respective ones of the text elements and their associated prosodic tags, and to generate a prosodic speech signal from the assembled acoustic phonemes as a wave signal;
wherein assembly of the acoustic phonemes includes pitch synchronously connecting one selected acoustic phoneme to the next selected acoustic phoneme, the next selected acoustic phoneme having a significantly different pitch from the pitch of the one selected acoustic phoneme, by generating and interposing one or more artificial waveforms between the one selected acoustic phoneme and the next selected acoustic phoneme to transition the prosodic speech signal from the pitch of the one selected acoustic phoneme to the pitch of the next selected acoustic phoneme.
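The transition described in the final clause of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes each phoneme boundary is represented by a single pitch period (one cycle of samples), and it uses plain linear interpolation of both period length and sample values to build the interposed artificial waveforms. The function names `resample_cycle`, `transition_cycles`, and `sine_cycle`, the 16 kHz sample rate, and the sine test signals are all hypothetical and chosen only for illustration.

```python
import math


def resample_cycle(cycle, length):
    """Linearly resample one pitch period to `length` samples."""
    if length == 1:
        return [cycle[0]]
    n = len(cycle)
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)   # fractional index into the source cycle
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(cycle[lo] * (1 - frac) + cycle[hi] * frac)
    return out


def transition_cycles(last_cycle, next_cycle, steps):
    """Build `steps` artificial pitch periods that move from the pitch of
    `last_cycle` to the pitch of `next_cycle` (assumed linear morph)."""
    cycles = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)                # 0 < t < 1, position along the transition
        length = round(len(last_cycle) * (1 - t) + len(next_cycle) * t)
        a = resample_cycle(last_cycle, length)
        b = resample_cycle(next_cycle, length)
        cycles.append([x * (1 - t) + y * t for x, y in zip(a, b)])
    return cycles


def sine_cycle(period):
    """One period of a sine wave, used here as a stand-in for a real pitch period."""
    return [math.sin(2 * math.pi * i / period) for i in range(period)]


# Assumed 16 kHz sampling: 160 samples per period is 100 Hz, 100 samples is 160 Hz.
low = sine_cycle(160)
high = sine_cycle(100)
bridge = transition_cycles(low, high, 3)
lengths = [len(c) for c in bridge]

# Concatenate whole periods so every join falls on a period boundary.
signal = low + [s for c in bridge for s in c] + high
```

Because every interposed cycle is a complete period, the joins fall on period boundaries, which is the sense in which this sketch is pitch synchronous. A production system would likely use more careful boundary detection and smoothing than shown here.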
Abstract
Disclosed are novel embodiments of a speech synthesizer and speech synthesis method for generating human-like speech wherein a speech signal can be generated by concatenation from phonemes stored in a phoneme database. Wavelet transforms and interpolation between frames can be employed to effect smooth morphological fusion of adjacent phonemes in the output signal. The phonemes may have one prosody or set of prosody characteristics and one or more alternative prosodies may be created by applying prosody modification parameters to the phonemes from a differential prosody database. Preferred embodiments can provide fast, resource-efficient speech synthesis with an appealing musical or rhythmic output in a desired prosody style such as reportorial or human interest. The invention includes computer-determining a suitable prosody to apply to a portion of the text by reference to the determined semantic meaning of another portion of the text and applying the determined prosody to the text by modification of the digitized phonemes. In this manner, prosodization can effectively be automated.
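The differential prosody idea in the abstract, where stored phonemes carry one prosody and alternatives are derived by applying modification parameters, can be sketched as follows. This is an illustration under stated assumptions, not the disclosed data format: the `Phoneme` fields, the `BASE` and `DIFFERENTIAL` tables, the multiplicative form of the parameters, and all numeric values are hypothetical. Only the style names "reportorial" and "human interest" come from the abstract.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Phoneme:
    symbol: str
    pitch_hz: float
    duration_ms: float
    gain: float


# Hypothetical base phoneme set, stored in a single default prosody.
BASE = {"AH": Phoneme("AH", 120.0, 80.0, 1.0)}

# Hypothetical differential prosody database: per style, per phoneme,
# multipliers for (pitch, duration, gain). Missing entries mean "no change".
DIFFERENTIAL = {
    "reportorial": {},
    "human_interest": {"AH": (1.25, 1.5, 1.0)},
}


def apply_prosody(symbol, style):
    """Return the base phoneme modified by the style's differential parameters."""
    base = BASE[symbol]
    pitch_mul, dur_mul, gain_mul = DIFFERENTIAL.get(style, {}).get(
        symbol, (1.0, 1.0, 1.0)
    )
    return replace(
        base,
        pitch_hz=base.pitch_hz * pitch_mul,
        duration_ms=base.duration_ms * dur_mul,
        gain=base.gain * gain_mul,
    )


warm = apply_prosody("AH", "human_interest")
plain = apply_prosody("AH", "reportorial")
```

Storing only the differences keeps the phoneme database small, since each additional style needs a table of parameters but no second copy of the waveforms.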
20 Claims
Specification