Fast waveform synchronization for concentration and time-scale modification of speech

US 7,058,569 B2
Filed: 09/14/2001
Issued: 06/06/2006
Est. Priority Date: 09/15/2000
Status: Expired due to Term

Abstract

A synthesis method for concatenative speech synthesis is provided for efficiently concatenating waveform segments in the time-domain. A digital waveform provider produces an input sequence of digital waveform segments. A waveform concatenator concatenates the input segments by using waveform blending within a concatenation zone to synchronize, weight, and overlap-add selected portions of the input segments to produce a single digital waveform. The synchronizing includes determining a minimum weighted energy anchor in the selected portion of each input segment and aligning synchronization peaks in a local vicinity of each anchor.
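
The weight and overlap-add step that the abstract describes can be illustrated with a minimal Python sketch. The function name `overlap_add`, the `overlap` parameter, and the linear cross-fade weights are assumptions made for this example; this record does not state which weighting function the patent uses, and the synchronization step is left out here.

```python
def overlap_add(left, right, overlap):
    """Concatenate two sample sequences, cross-fading over `overlap` samples.

    Assumption: linear fade-out / fade-in weights that sum to 1 at every
    sample of the concatenation zone. The patent record does not specify
    the weighting, so this is only one plausible choice.
    """
    left, right = list(left), list(right)
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")

    head = left[:len(left) - overlap]      # untouched part of the left segment
    tail = right[overlap:]                 # untouched part of the right segment

    blended = []
    for i in range(overlap):
        w_in = (i + 1) / (overlap + 1)     # weight of the incoming (right) segment
        w_out = 1.0 - w_in                 # weight of the outgoing (left) segment
        blended.append(w_out * left[len(left) - overlap + i] + w_in * right[i])

    return head + blended + tail
```

Because the two weights sum to 1, a constant signal passes through the concatenation zone unchanged, which is a quick sanity check for any weighting one might substitute.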

252 Citations

50 Claims

1. A digital waveform concatenation system for use in an acoustic processing application, the system comprising:
- a digital waveform provider that produces an input sequence of at least two digital waveform segments, each waveform segment being a sequence of samples; and
  
  a waveform concatenator that;
  
  i. synchronizes input waveform segments to form a sequence of partially overlapping waveform segments, and
  
  ii. weights and adds selected portions of the overlapping waveform segments to concatenate the input waveform segments so as to produce a single digital waveform;
  
  wherein for segments of voiced speech, the synchronizing includes aligning a minimum energy anchor in each waveform segment with a corresponding minimum energy anchor of an adjacent waveform segment, each minimum energy anchor location in a given segment being optimized based on determining minimum weighted energy in a neighborhood of a boundary of the given segment.
- Dependent claims (2 to 15):
  - 2. A concatenation system according to claim 1, wherein the acoustic processing application includes a text-to-speech application.
  - 3. A concatenation system according to claim 1, wherein the acoustic processing application includes a speech broadcast application.
  - 4. A concatenation system according to claim 1, wherein the acoustic processing application includes a carrier-slot application.
  - 5. A concatenation system according to claim 1, wherein the acoustic processing application includes a time-scale modification system.
  - 6. A concatenation system according to claim 1, wherein the waveform segments include at least one of speech diphones and speech triphones.
  - 7. A concatenation system according to claim 1, wherein the waveform segments include at least one of speech phones and speech demi-phones.
  - 8. A concatenation system according to claim 1, wherein the waveform segments include at least one of speech demi-syllables, speech syllables, words, and phrases.
  - 9. A concatenation system according to claim 1, wherein determining minimum weighted energy in the selected portion includes using a sliding weighted energy calculation algorithm.
  - 10. A concatenation system according to claim 1, wherein the input segments are filtered before synchronizing.
  - 11. A concatenation system according to claim 1, wherein aligning minimum energy anchors includes determining a largest waveform peak or trough in the close neighborhood of each minimum energy anchor.
  - 12. A concatenation system according to claim 11, wherein the close neighborhood is an interval of at least one pitch period containing the minimum energy anchor.
  - 13. A concatenation system according to claim 11, wherein the close neighborhood is the selected portion of the input segment.
  - 14. A concatenation system according to claim 11, wherein the location of one minimum energy anchor is the lowest weighted energy location in the selected portion.
  - 15. A concatenation system according to claim 14, wherein another minimum energy anchor location is chosen such that the previously determined waveform peak or trough in each selected portion coincide when the input segments are overlap-added.
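
Claims 1, 9, and 14 refer to locating a minimum energy anchor near a segment boundary with a sliding energy calculation. A minimal Python sketch of that idea follows. The names `sliding_energy` and `min_energy_anchor`, the parameters `win`, `lo`, and `hi`, and the uniform (rectangular) weighting are all assumptions for illustration; the claims do not give the weighting function or the window length.

```python
def sliding_energy(samples, win):
    """Energy of every length-`win` window, updated in O(1) per step.

    Assumption: uniform weights, so each step only adds the sample that
    enters the window and subtracts the one that leaves it.
    """
    if not 0 < win <= len(samples):
        raise ValueError("window must fit inside the sample sequence")
    squares = [s * s for s in samples]
    energy = sum(squares[:win])
    energies = [energy]
    for start in range(1, len(samples) - win + 1):
        energy += squares[start + win - 1] - squares[start - 1]
        energies.append(energy)
    return energies


def min_energy_anchor(samples, win, lo, hi):
    """Centre index of the lowest-energy window whose start lies in [lo, hi)."""
    energies = sliding_energy(samples, win)
    hi = min(hi, len(energies))
    if not 0 <= lo < hi:
        raise ValueError("empty search neighbourhood")
    best_start = min(range(lo, hi), key=energies.__getitem__)
    return best_start + win // 2
```

For example, with samples `[3, 3, 0, 0, 3, 3, 3]` and `win=2`, the window starting at index 2 covers the two zero samples, so the anchor lands at index 3.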

16. A digital waveform concatenation system for use in an acoustic processing application, the system comprising:
- a digital waveform provider that produces an input sequence of at least two digital waveform segments, each waveform segment being a sequence of samples; and
  
  a waveform concatenator that;
  
  i. synchronizes successive waveform segments to form a sequence of partially overlapping waveform segments, the overlapping portion of each waveform segment including an optimization zone near a waveform segment boundary, and
  
  ii. weights, and adds selected portions of the input segments to concatenate the input segments so as to produce a single digital waveform;
  
  wherein for segments of voiced speech, the synchronizing includes aligning a largest waveform peak or trough in the optimization zone of each input waveform segment with a corresponding largest waveform peak or trough in an optimization zone of an adjacent waveform segment.
- Dependent claims (17 to 23):
  - 17. A concatenation system according to claim 16, wherein the acoustic processing application includes a text-to-speech application.
  - 18. A concatenation system according to claim 16, wherein the acoustic processing application includes a speech broadcast application.
  - 19. A concatenation system according to claim 16, wherein the acoustic processing application includes a carrier-slot application.
  - 20. A concatenation system according to claim 16, wherein the waveform segments include at least one of speech diphones and speech triphones.
  - 21. A concatenation system according to claim 16, wherein the waveform segments include at least one of speech phones and speech demi-phones.
  - 22. A concatenation system according to claim 16, wherein the waveform segments include at least one of speech demi-syllables, speech syllables, words, and phrases.
  - 23. A concatenation system according to claim 16, wherein the input segments are filtered before aligning.
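
The peak or trough alignment recited in claim 16 can be sketched in Python as follows. The names `largest_extremum` and `peak_aligned_overlap` and the `zone` parameter (length of the optimization zone in samples) are hypothetical. The sketch also assumes that a peak should be matched with a peak and a trough with a trough, so the polarity found in the left segment is reused when searching the right segment; the claim itself does not spell this out.

```python
def largest_extremum(samples, lo, hi, sign=None):
    """Return (index, sign) of the largest peak or trough in samples[lo:hi].

    sign=+1 searches for a peak, sign=-1 for a trough. With sign=None the
    polarity of the largest-magnitude sample is chosen (an assumption).
    """
    if sign is None:
        idx = max(range(lo, hi), key=lambda i: abs(samples[i]))
        return idx, (1 if samples[idx] >= 0 else -1)
    idx = max(range(lo, hi), key=lambda i: sign * samples[i])
    return idx, sign


def peak_aligned_overlap(left, right, zone):
    """Overlap length (in samples) that makes the two extrema coincide.

    The optimization zone is taken as the last `zone` samples of `left`
    and the first `zone` samples of `right`.
    """
    if not 0 < zone <= min(len(left), len(right)):
        raise ValueError("zone must fit inside both segments")
    p_left, sign = largest_extremum(left, len(left) - zone, len(left))
    p_right, _ = largest_extremum(right, 0, zone, sign)
    right_start = p_left - p_right       # where `right` begins on left's time axis
    return len(left) - right_start
```

The returned overlap length could then drive a weighted overlap-add of the two segments. A fuller implementation would also need to handle the case where the computed overlap exceeds the length of either segment.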

24. A digital waveform concatenation system for use in an acoustic processing application, the system comprising:
- a digital waveform provider that produces an input sequence of at least two digital waveform segments, each waveform segment being a sequence of samples; and
  
  a waveform concatenator that;
  
  i. synchronizes successive waveform segments to form a sequence of partially overlapping waveform segments, and
  
  ii. weights and adds selected portions of the overlapping waveform segments to concatenate the input waveform segments so as to produce a single digital waveform;
  
  wherein for segments of voiced speech, the synchronizing includes aligning synchronization peaks or troughs in selected portion of each input waveform segment with synchronization peaks or troughs in a corresponding selected portion of an adjacent waveform segment, the location of the selected portions being determined by searching in a neighborhood of waveform segment boundaries for a location where the sum of the weighted energy of the selected portions is minimal.
- Dependent claims (25 to 37):
  - 25. A concatenation system according to claim 24, wherein the acoustic processing application includes a text-to-speech application.
  - 26. A concatenation system according to claim 24, wherein the acoustic processing application includes a speech broadcast application.
  - 27. A concatenation system according to claim 24, wherein the acoustic processing application includes a carrier-slot application.
  - 28. A concatenation system according to claim 24, wherein the acoustic processing application includes a time-scale modification system.
  - 29. A concatenation system according to claim 24, wherein the waveform segments include at least one of speech diphones and speech triphones.
  - 30. A concatenation system according to claim 24, wherein the waveform segments include at least one of speech phones and speech demi-phones.
  - 31. A concatenation system according to claim 24, wherein the waveform segments include at least one of speech demi-syllables, speech syllables, words, and phrases.
  - 32. A concatenation system according to claim 24, wherein determining a minimum weighted energy anchor includes using a sliding weighted energy calculation algorithm.
  - 33. A concatenation system according to claim 24, wherein the input segments are filtered before synchronizing.
  - 34. A concatenation system according to claim 24, wherein aligning synchronization peaks or troughs includes determining a largest waveform peak or trough in the close neighborhood of each anchor.
  - 35. A concatenation system according to claim 34, wherein the close neighborhood is an interval of at least one pitch period containing the minimum energy anchor.
  - 36. A concatenation system according to claim 34, wherein the close neighborhood is the selected portion of the input segment.
  - 37. A concatenation system according to claim 34, wherein the location of one anchor is chosen such that the synchronization peaks or troughs in each selected portion coincide when the input segments are overlap-added.
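
Claim 24 locates the selected portions by searching near the segment boundaries for the position where the sum of their weighted energies is minimal. One possible reading is sketched below in Python. It assumes that the two boundary neighborhoods have already been cut out as equal-length sequences (`left_zone`, `right_zone`) that share a time axis, and that a Hann-shaped window supplies the weights; both points, along with every function and parameter name, are assumptions of this example rather than details taken from the patent.

```python
import math


def weighted_energy(samples, start, weights):
    """Weighted energy of the window beginning at `start`."""
    return sum(w * samples[start + k] ** 2 for k, w in enumerate(weights))


def joint_min_energy_position(left_zone, right_zone, win):
    """Window start where the summed weighted energy of both zones is lowest."""
    if len(left_zone) != len(right_zone):
        raise ValueError("zones must have equal length")
    if not 0 < win <= len(left_zone):
        raise ValueError("window must fit inside the zones")

    # Hann-shaped weights, strictly positive at every window position.
    weights = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (win + 1))
               for k in range(win)]

    def total(start):
        return (weighted_energy(left_zone, start, weights)
                + weighted_energy(right_zone, start, weights))

    return min(range(len(left_zone) - win + 1), key=total)
```

This direct evaluation costs O(zone length × window length). The sliding calculation mentioned in claim 32 would presumably reduce that cost, but its exact form is not given in this record.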

38. A digital waveform concatenation system for use in an acoustic processing application, the system comprising:
- a digital waveform provider that produces an input sequence of at least two digital waveform segments, each waveform segment being a sequence of samples; and
  
  a waveform concatenator that;
  
  i. synchronizes successive waveform segments to form a sequence of partially overlapping waveform segments, and
  
  ii. weights, and adds selected portions of the overlapping waveform segments to concatenate the input waveform segments so as to produce a single digital waveform;
  
  wherein for pairs of overlapping segments of voiced speech, a first selected portion includes a minimum energy anchor in a location optimized based on determining minimum weighted energy in a neighborhood of the waveform segment boundaries, and a second selected portion is determined by aligning synchronization peaks or troughs in the neighborhood of the waveform segment boundaries.
- Dependent claims (39 to 50):
  - 39. A concatenation system according to claim 38, wherein the acoustic processing application includes a text-to-speech application.
  - 40. A concatenation system according to claim 38, wherein the acoustic processing application includes a speech broadcast application.
  - 41. A concatenation system according to claim 38, wherein the acoustic processing application includes a carrier-slot application.
  - 42. A concatenation system according to claim 38, wherein the acoustic processing application includes a time-scale modification system.
  - 43. A concatenation system according to claim 38, wherein the waveform segments include at least one of speech diphones and speech triphones.
  - 44. A concatenation system according to claim 38, wherein the waveform segments include at least one of speech phones and speech demi-phones.
  - 45. A concatenation system according to claim 38, wherein the waveform segments include at least one of speech demi-syllables, speech syllables, words, and phrases.
  - 46. A concatenation system according to claim 38, wherein determining a minimum weighted energy anchor includes using a sliding weighted energy calculation algorithm.
  - 47. A concatenation system according to claim 38, wherein the input segments are filtered before synchronizing.
  - 48. A concatenation system according to claim 38, wherein aligning synchronization peaks or troughs includes determining a largest waveform peak or trough in the close neighborhood of the anchor and determining a corresponding peak or trough in the selected portion of the other input segment.
  - 49. A concatenation system according to claim 48, wherein the close neighborhood is an interval of at least one pitch period containing the minimum weighted energy anchor.
  - 50. A concatenation system according to claim 48, wherein the close neighborhood is the selected portion of the input segment.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Coorman, Geert; Coile, Bert Van
Primary Examiner(s): Young, Wayne
Assistant Examiner(s): Jackson, Jakieda R
Application Number: US09/953,075
Publication Number: US 20020143526A1
Time in Patent Office: 1,726 Days
Field of Search: 704/216, 704/231, 704/265, 704/270, 704/208, 704/260, 704/269, 704/249, 704/264, 395/2.74, 395/2.76, 395/2.77, 395/2.09, 381/43
US Class Current: 704/216
CPC Class Codes: G10L 13/07 (Concatenation rules); G10L 21/04 (Time compression or expansion)
