Waveform blending technique for text-to-speech system

US 5,490,234 A
Filed: 01/21/1993
Issued: 02/06/1996
Est. Priority Date: 01/21/1993
Status: Expired due to Term

Abstract

A concatenator for a first digital frame with a second digital frame, such as the ending and beginning of adjacent diphone strings being concatenated to form speech, is based on determining an optimum blend point for the first and second digital frames in response to the magnitudes of samples in the first and second digital frames. The frames are then blended to generate a digital sequence representing a concatenation of the first and second frames with reference to the optimum blend point. The system operates by first computing an extended frame in response to the first digital frame, and then finding a subset of the extended frame which matches the second digital frame using a minimum average magnitude difference function over the samples in the subset. The blend point is the first sample of the matching subset. To generate the concatenated waveform, the subset of the extended frame is combined with the second digital frame and concatenated with the beginning segments of the extended frame.
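
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the exhaustive search over every offset, and the linear cross-fade weights are all assumptions made for illustration. The extended frame here is the plain "frame plus replica" form, without any smoothing at the seam.

```python
def extend_frame(first):
    """Extended frame: the first frame followed by a replica of itself."""
    return list(first) + list(first)


def find_blend_point(extended, second):
    """Return the offset p whose window extended[p:p+M] has the minimum
    average magnitude difference (AMDF) against the second frame."""
    m = len(second)
    if m == 0 or m > len(extended):
        raise ValueError("second frame must be non-empty and fit in the extended frame")
    best_p, best_amdf = 0, float("inf")
    # Assumption: every offset that leaves room for M samples is a candidate.
    for p in range(len(extended) - m + 1):
        amdf = sum(abs(extended[p + i] - second[i]) for i in range(m)) / m
        if amdf < best_amdf:
            best_p, best_amdf = p, amdf
    return best_p


def concatenate(first, second):
    """Blend the matching subset with the second frame, then prepend the
    samples of the extended frame that come before the blend point."""
    extended = extend_frame(first)
    p = find_blend_point(extended, second)
    m = len(second)
    blended = []
    for i in range(m):
        # Assumed linear weights: all extended frame at i = 0, all second frame at i = M-1.
        w = i / (m - 1) if m > 1 else 1.0
        blended.append((1 - w) * extended[p + i] + w * second[i])
    return extended[:p] + blended


if __name__ == "__main__":
    left = [0, 1, 0, -1]
    right = [1, 0, -1, 0]
    print(find_blend_point(extend_frame(left), right))
    print(concatenate(left, right))
```

Searching the extended frame rather than the first frame alone lets the match start anywhere in the pitch period, because the replica supplies the samples that a window starting late in the frame would otherwise run out of.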

212 Citations

26 Claims

1. An apparatus for concatenating a first digital frame of N samples having respective magnitudes representing a first quasi-periodic waveform and a second digital frame of M samples having respective magnitudes representing a second quasi-periodic waveform, comprising:
- a buffer store to store the samples of first and second digital frames;
  
  means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames;
  
  blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second quasi-periodic waveforms in response to the first frame, the second frame and the blend point.
- - 2. The apparatus of claim 1, further including:
    - transducer means, coupled to the means for computing, for transducing the digital sequence to an analog waveform.
  - 3. The apparatus of claim 1, wherein the means for determining includes:
    - first means for computing an extended frame in response to the first digital frame;
      
      second means for finding a subset of the extended frame which provides an optimum match to the second digital frame, and defining the blend point as a sample in the subset.
  - 4. The apparatus of claim 3, wherein the extended frame comprises a concatenation of the first digital frame with a replica of the first digital frame.
  - 5. The apparatus of claim 3, wherein the subset of the extended frame which matches the second digital frame relatively well comprises a subset with a minimum average magnitude difference over the samples in the subset, and the blend point comprises a first sample in the subset.
  - 6. The apparatus of claim 1, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset.
  - 7. The apparatus of claim 1, wherein the blending means includes:
    - means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with a second set of samples derived from the first digital frame and the blend point, with emphasis on the second set in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 8. The apparatus of claim 1, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset; and
      
      wherein the blending means includes;
      
      means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with the subset of the extended frame, with emphasis on the subset of the extended frame in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 9. The apparatus of claim 8, wherein the first and second digital frames represent endings and beginnings respectively of adjacent diphones in speech, and further including:
    - transducer means, coupled to the blending means, for transducing the digital sequence to a sound corresponding to the speech.
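
Claims 6 and 8 refer to a "discontinuity-smoothed" concatenation of the first frame with its replica, but the claim text does not say how the smoothing is done. The Python sketch below shows one possible reading, purely as an assumption: the jump between the last sample of the frame and the first sample of the replica is added to the start of the replica and faded out linearly over a few samples. The function name and the `taper` parameter are hypothetical.

```python
def smoothed_extended_frame(first, taper=4):
    """Return the first frame followed by a replica whose opening samples
    are offset so that the seam between the two copies has no step."""
    if not first:
        return []
    n = len(first)
    # Size of the step that a plain copy would introduce at the seam.
    jump = first[-1] - first[0]
    replica = list(first)  # copy, so the caller's frame is not modified
    k = max(0, min(taper, n))
    for i in range(k):
        # Full correction at the seam, shrinking linearly to zero after k samples.
        replica[i] += jump * (1 - i / k)
    return list(first) + replica


if __name__ == "__main__":
    frame = [0.0, 2.0, 4.0, 6.0]
    print(smoothed_extended_frame(frame, taper=2))
```

With `taper=0` the function falls back to the unsmoothed extended frame of claim 4, which makes it easy to compare blend points found with and without smoothing.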

10. An apparatus for concatenating a first digital frame of N samples having respective magnitudes representing a first sound segment and a second digital frame of M samples having respective magnitudes representing a second sound segment, comprising:
- a buffer store to store the samples of first and second digital frames;
  
  means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames;
  
  blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second sound segments in response to the first digital frame, the second digital frame and the blend point; and
  
  transducer means, coupled to the blending means, for transducing the digital sequence to sound.
- - 11. The apparatus of claim 10, wherein the means for determining includes:
    - first means for computing an extended frame in response to the first digital frame;
      
      second means for finding a subset of the extended frame which provides an optimum match to the second digital frame, and defining the blend point as a sample in the subset.
  - 12. The apparatus of claim 11, wherein the extended frame comprises a concatenation of the first digital frame with a replica of the first digital frame.
  - 13. The apparatus of claim 11, wherein the subset of the extended frame which matches the second digital frame relatively well comprises a subset with a minimum average magnitude difference over the samples in the subset, and the blend point comprises a first sample in the subset.
  - 14. The apparatus of claim 10, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset.
  - 15. The apparatus of claim 10, wherein the blending means includes:
    - means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with a second set of samples derived from the first digital frame and the blend point, with emphasis on the second set in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 16. The apparatus of claim 10, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset; and
      
      wherein the blending means includes;
      
      means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with the subset of the extended frame, with emphasis on the subset of the extended frame in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 17. The apparatus of claim 16, wherein the first and second digital frames represent endings and beginnings respectively of adjacent diphones in speech, and the transducer means produces synthesized speech.

18. An apparatus for synthesizing speech in response to a text, comprising:
- means for translating text to a sequence of sound segment codes;
  
  means, responsive to sound segment codes in the sequence, for decoding the sequence of sound segment codes to produce strings of digital frames of a plurality of samples representing sounds for respective sound segment codes in the sequence, wherein the identified strings of digital frames have beginnings and endings;
  
  means for concatenating a first digital frame at the ending of an identified string of digital frames of a particular sound segment code in the sequence with a second digital frame at the beginning an identified string of digital frames of an adjacent sound segment code in the sequence to produce a speech data sequence, including a buffer store to store the samples of first and second digital frames;
  
  means, coupled to the buffer store, for determining a blend point for the first and second digital frames in response to magnitudes of samples in the first and second digital frames; and
  
  blending means, coupled with the buffer store and the means for determining, for computing a digital sequence representing a concatenation of the first and second sound segments in response to the first frame, the second frame and the blend point; and
  
  an audio transducer, coupled to the means for concatenating, to generate synthesized speech in response to the speech data sequence.
- - 19. The apparatus of claim 18, further including:
    - means, responsive to the sound segment codes for adjusting pitch and duration of the identified strings of digital frames in the speech data sequence.
  - 20. The apparatus of claim 18, wherein the means for determining includes:
    - first means for computing an extended frame in response to the first digital frame;
      
      second means for finding a subset of the extended frame which provides an optimum match to the second digital frame, and defining the blend point as a sample in the subset.
  - 21. The apparatus of claim 20, wherein the extended frame comprises a concatenation of the first digital frame with a replica of the first digital frame.
  - 22. The apparatus of claim 20, wherein the subset of the extended frame which matches the second digital frame relatively well comprises a subset with a minimum average magnitude difference over the samples in the subset, and the blend point comprises a first sample in the subset.
  - 23. The apparatus of claim 18, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset.
  - 24. The apparatus of claim 18, wherein the blending means includes:
    - means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with a second set of samples derived from the first digital frame and the blend point, with emphasis on the second set in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 25. The apparatus of claim 18, wherein the means for determining includes:
    - first means for computing an extended frame comprising a discontinuity-smoothed concatenation of the first digital frame with a replica of the first digital frame;
      
      second means for finding a subset of the extended frame with a minimum average magnitude difference between the samples in the subset and the second digital frame, and defining the blend point as a first sample in the subset; and
      
      wherein the blending means includes;
      
      means for supplying a first set of samples derived from the first digital frame and the blend point as a first segment of the digital sequence; and
      
      means for combining the second digital frame with the subset of the extended frame, with emphasis on the subset of the extended frame in a starting sample and emphasis on the second digital frame in an ending sample to produce a second segment of the digital sequence.
  - 26. The apparatus of claim 18, wherein the sound segment codes represent speech diphones, and the first and second digital frames represent endings and beginnings respectively of adjacent diphones in speech.
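
The data flow of claim 18 (text, then sound segment codes, then strings of frames, then a joined speech data sequence) can be outlined in Python. Everything concrete in this sketch is hypothetical: the two-entry `INVENTORY`, the letter-pair stand-in for text analysis, and the `butt_join` placeholder, which simply appends frames and would be replaced by a blend-point-based joiner. Each string is assumed to hold at least two frames, and audio output is left out.

```python
from typing import Callable, Dict, List

Frame = List[float]

# Hypothetical inventory: sound segment code -> string of frames (toy values).
INVENTORY: Dict[str, List[Frame]] = {
    "a-b": [[0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0]],
    "b-c": [[0.0, 2.0, 0.0, -2.0], [0.0, 1.0, 0.0, -1.0]],
}


def text_to_codes(text: str) -> List[str]:
    """Toy stand-in for text analysis: pair up adjacent letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    return [f"{a}-{b}" for a, b in zip(letters, letters[1:])]


def decode(codes: List[str]) -> List[List[Frame]]:
    """Look up the string of frames for each code (KeyError if unknown)."""
    return [INVENTORY[code] for code in codes]


def butt_join(first: Frame, second: Frame) -> Frame:
    """Placeholder joiner: plain concatenation with no blending."""
    return list(first) + list(second)


def synthesize(text: str, join: Callable[[Frame, Frame], Frame] = butt_join) -> Frame:
    """Join the ending frame of each string with the beginning frame of the next."""
    samples: Frame = []
    prev_tail = None
    for frames in decode(text_to_codes(text)):
        head, *middle, tail = frames  # raises ValueError for fewer than two frames
        # The first string starts as-is; later strings are joined to the previous ending.
        samples.extend(head if prev_tail is None else join(prev_tail, head))
        for frame in middle:
            samples.extend(frame)
        prev_tail = tail
    if prev_tail is not None:
        samples.extend(prev_tail)
    return samples


if __name__ == "__main__":
    print(text_to_codes("abc"))
    print(len(synthesize("abc")))
```

Passing the joiner as a parameter keeps the pipeline independent of the blending method, so the same loop works with a plain append or with a joiner that searches for a blend point.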

Current Assignee: Apple Inc.
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Narayan, Shankar
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Onka, Thomas
Application Number: US08/007,621
Time in Patent Office: 1,111 Days
Field of Search: 381/52, 395/2.69, 395/2.74, 395/2.77
US Class Current: 704/260
CPC Class Codes: G10L 13/07 Concatenation rules
