Waveform blending technique for text-to-speech system
Abstract
A concatenator for a first digital frame with a second digital frame, such as the ending and beginning of adjacent diphone strings being concatenated to form speech, is based on determining an optimum blend point for the first and second digital frames in response to the magnitudes of samples in the first and second digital frames. The frames are then blended to generate a digital sequence representing a concatenation of the first and second frames with reference to the optimum blend point. The system operates by first computing an extended frame in response to the first digital frame, and then finding a subset of the extended frame which matches the second digital frame, using a minimum average magnitude difference function over the samples in the subset. The blend point is the first sample of the matching subset. To generate the concatenated waveform, the subset of the extended frame is combined with the second digital frame and concatenated with the beginning segment of the extended frame.
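The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Two details are assumptions made here because the abstract does not specify them: the extended frame is built by cyclically repeating the first frame, and the matching subset is combined with the second frame by a linear cross-fade. The function names amdf, find_blend_point and blend_frames are hypothetical.

```python
def amdf(extended, second, start):
    """Average magnitude difference between `second` and the slice of
    `extended` that begins at index `start`."""
    m = len(second)
    return sum(abs(extended[start + i] - second[i]) for i in range(m)) / m


def find_blend_point(first, second):
    """Return (extended_frame, blend_point) for two frames of samples."""
    n, m = len(first), len(second)
    # Assumption: extend the first frame by cyclic repetition so that every
    # candidate start 0..n-1 has m samples available for comparison.
    extended = [first[i % n] for i in range(n + m - 1)]
    # The blend point is the start of the subset with the minimum AMDF.
    blend_point = min(range(n), key=lambda p: amdf(extended, second, p))
    return extended, blend_point


def blend_frames(first, second):
    """Concatenate two frames around the blend point.

    Returns (sequence, blend_point). The sequence is the beginning of the
    extended frame up to the blend point, followed by the matching subset
    cross-faded into the second frame.
    """
    extended, p = find_blend_point(first, second)
    m = len(second)
    out = list(extended[:p])
    for i in range(m):
        # Assumption: linear ramp; the second frame's weight rises to 1.
        w = (i + 1) / m
        out.append((1 - w) * extended[p + i] + w * second[i])
    return out, p


if __name__ == "__main__":
    first = [0, 3, 5, 3, 0, -3, -5, -3]    # one period of a toy waveform
    second = [5, 3, 0, -3, -5, -3, 0, 3]   # same waveform, shifted by 2
    sequence, point = blend_frames(first, second)
    print(point, sequence)
```

In this toy input the second frame is the first frame shifted by two samples, so the minimum-AMDF search selects index 2 and the blended output continues the waveform without a discontinuity. Real speech frames rarely match exactly, which is why the subset is blended with the second frame rather than simply spliced.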
26 Claims
1. An apparatus for concatenating a first digital frame of N samples having respective magnitudes representing a first quasi-periodic waveform and a second digital frame of M samples having respective magnitudes representing a second quasi-periodic waveform, comprising:
a buffer store to store the samples of first and second digital frames;
means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames;
blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second quasi-periodic waveforms in response to the first frame, the second frame and the blend point.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. An apparatus for concatenating a first digital frame of N samples having respective magnitudes representing a first sound segment and a second digital frame of M samples having respective magnitudes representing a second sound segment, comprising:
a buffer store to store the samples of first and second digital frames;
means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames;
blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second sound segments in response to the first digital frame, the second digital frame and the blend point; and
transducer means, coupled to the blending means, for transducing the digital sequence to sound.
Dependent claims: 11, 12, 13, 14, 15, 16, 17.
18. An apparatus for synthesizing speech in response to a text, comprising:
means for translating text to a sequence of sound segment codes;
means, responsive to sound segment codes in the sequence, for decoding the sequence of sound segment codes to produce strings of digital frames of a plurality of samples representing sounds for respective sound segment codes in the sequence, wherein the identified strings of digital frames have beginnings and endings;
means for concatenating a first digital frame at the ending of an identified string of digital frames of a particular sound segment code in the sequence with a second digital frame at the beginning of an identified string of digital frames of an adjacent sound segment code in the sequence to produce a speech data sequence, including a buffer store to store the samples of first and second digital frames;
means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames; and
blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second sound segments in response to the first frame, the second frame and the blend point; and
an audio transducer, coupled to the means for concatenating, to generate synthesized speech in response to the speech data sequence.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26.
Specification