System for automatically morphing audio information

US 5,749,073 A
Filed: 03/15/1996
Issued: 05/05/1998
Est. Priority Date: 03/15/1996
Status: Expired due to Term

Abstract

In the first step of a sound morphing process, each sound which forms the basis for the morph is converted into one or more quantitative representations, such as spectrograms. After the representations have been obtained, the temporal axes of the two sounds are matched, so that similar components of the two sounds, such as onsets, harmonic regions and inharmonic regions, are aligned with one another. Other characteristics of the sounds, such as pitch, formant frequencies, or the like, are then matched. Once the energy in each of the sounds has been accounted for and matched to that of the other sound, the two sounds are cross-faded, to produce a representation of a new sound. This representation is then inverted, to generate the morphed sound.
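
The first step named in the abstract, converting a sound into a spectrogram, can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `spectrogram`, the Hann window, the frame length, and the hop size are all assumptions chosen for illustration, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math


def spectrogram(samples, frame_len=8, hop=4):
    """Magnitude spectrogram: one list of bin magnitudes per frame.

    Hypothetical helper for illustration only. Uses a periodic Hann
    window and a naive DFT over the non-negative frequency bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# A cosine that completes two cycles per 8-sample frame.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
frames = spectrogram(tone)
```

Each inner list is one column of the time-frequency display; for the test tone above, the energy concentrates around bin 2 in every frame.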

47 Claims

1. A method for morphing from a first sound to a second sound, comprising the steps of:
- analyzing each of said first and second sounds to obtain a dense representation for each sound;
  
  determining correspondence between the respective representations of said sounds;
  
  modifying the representations of said sounds, based on said correspondence, to form a new representation; and
  
  inverting the new representation and generating a morphed sound from the inverted representation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1 wherein the dense representation is a time-frequency display.
  - 3. The method of claim 2 wherein the time-frequency display is a spectrogram.
  - 4. The method of claim 1 wherein the determination of correspondence includes the step of dynamically time warping the representations to match them to one another.
  - 5. The method of claim 1 wherein said modification includes the step of interpolating between the representations of the two sounds.
  - 6. The method of claim 5 wherein said modification includes the further step of warping the representations of the sounds.
  - 7. The method of claim 1 wherein said representation includes information regarding the pitch of the sound, and the determination of correspondence includes the step of matching the pitch of the two sounds.
  - 8. The method of claim 7 wherein the representation contains pitch information independent of whether the sound is voiced.
  - 9. The method of claim 1 wherein said analyzing step includes the step of factoring each of said two sounds into a plurality of representations which respectively relate to different acoustic features of the sounds.
  - 10. The method of claim 9 wherein one of said representations contains information regarding the pitch and voicing of the sound.
  - 11. The method of claim 10 wherein another one of said representations contains information regarding the broad spectral characteristics of the sound.
  - 12. The method of claim 1 further including the steps of generating another representation of each sound that provides a distance metric of the temporal correspondence between the two sounds, and temporally matching the two sounds to one another.
  - 13. The method of claim 12 wherein said other representation comprises an MFCC analysis of each sound.
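
Claims 4, 12 and 13 refer to dynamic time warping over a distance metric. The following is a minimal sketch of textbook dynamic time warping, not the procedure of the patent's specification (which is not reproduced here). The name `dtw_path`, the step pattern (horizontal, vertical, diagonal), and the scalar distance are assumptions; for MFCC vectors, `dist` would typically be a vector distance such as Euclidean.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total_cost, path) for aligning sequence a with sequence b.

    path is a list of (i, j) index pairs from (0, 0) to (len(a)-1, len(b)-1).
    Hypothetical helper for illustration only.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost of aligning a[:i+1] with b[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])
            if j > 0:
                best = min(best, cost[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = d + best
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

In this toy case the second sequence holds the value 1 for an extra frame, so the path pairs index 1 of the first sequence with both indices 1 and 2 of the second.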

14. A method for morphing from a first sound to a second sound, comprising the steps of:
- factoring each of said two sounds into a plurality of representations which respectively relate to different acoustic features of the sounds;
  
  independently modifying said plural representations to produce a plurality of new representations;
  
  combining said new representations to produce a representation for a morphed sound; and
  
  inverting the representation and generating the morphed sound from the inverted representation.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 15. The method of claim 14 wherein one of said representations contains information regarding pitch and voicing aspects of the signal.
  - 16. The method of claim 15 wherein said one representation comprises a pitch spectrogram.
  - 17. The method of claim 15 wherein said one representation comprises a continuous estimate of pitch throughout the sound.
  - 18. The method of claim 14 wherein one of said representations contains information regarding the broad spectral characteristics of the sound.
  - 19. The method of claim 18 wherein said one representation comprises a spectrogram of the formant frequencies in a sound.
  - 20. The method of claim 14 wherein said modifying step includes the step of interpolating corresponding values for a representation of each of the two sounds.
  - 21. The method of claim 14 wherein said plural representations are independent of one another.
  - 22. The method of claim 14 wherein said representations are dense.
  - 23. The method of claim 14 further including the steps of generating a third representation of each sound that provides a distance metric of the temporal correspondence between the two sounds, and temporally matching the two sounds to one another.

24. A method for morphing from a first sound to a second sound, comprising the steps of:
- analyzing each of said first and second sounds to obtain at least one representation of each sound;
  
  automatically matching common regions of said representations so that they are temporally aligned with one another;
  
  modifying predetermined portions of corresponding temporally aligned features of said first and second sounds; and
  
  inverting the modified sound representation and generating a sound having acoustic characteristics between those of said first and second sounds.
- Dependent claims (25, 26, 27, 28, 29, 30, 31, 32, 33, 34):
  - 25. The method of claim 24 wherein said temporal matching comprises the step of obtaining MFCC representations of the sounds, and matching corresponding portions of the MFCC representations.
  - 26. The method of claim 24 further including the step of determining correspondence between at least one acoustic feature in the representation of said first and second sounds.
  - 27. The method of claim 26 wherein the matching of corresponding portions is carried out through dynamic time warping techniques.
  - 28. The method of claim 24 wherein the representation comprises a dense spectral analysis of each sound.
  - 29. The method of claim 28 wherein said dense spectral analysis comprises a pitch spectrogram which provides a distance metric for pitch information in a sound.
  - 30. The method of claim 28 wherein said dense spectral analysis comprises a smooth spectrogram which provides a distance metric for formant frequencies in a sound.
  - 31. The method of claim 24 wherein said analyzing step comprises factoring each of said two sounds into a plurality of representations which respectively relate to different acoustic features of the sounds.
  - 32. The method of claim 31 wherein said plurality of representations include a pitch spectrogram and a smooth spectrogram for each sound.
  - 33. The method of claim 31 wherein each of said plurality of representations is separately warped and interpolated, and then combined to form said modified sound representation.
  - 34. The method of claim 24 wherein said modification comprises warping and interpolating the representations of the sounds to form said modified sound representation.

35. A method for generating a sound based upon a dense spectral representation of a sound, comprising the steps of:
- generating a first spectrogram of a sound;
  
  determining the mel-frequency cepstral coefficients for the sound from said first spectrogram;
  
  inverting the mel-frequency cepstral coefficients to obtain a spectrogram of the formants of the sound; and
  
  subsequently generating a sound which is based upon information contained in the formant spectrogram.
- Dependent claims (36):
  - 36. The method of claim 35 further including the step of dividing said first spectrogram by said formant spectrogram to obtain a pitch or residual spectrogram, and generating said sound on the basis of information contained in the pitch spectrogram.
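
The factoring in claims 35 and 36 can be approximated for a single spectrogram frame as follows. This is a simplified sketch under stated assumptions: it truncates a plain DCT-based cepstrum of the log spectrum and omits the mel-frequency filterbank that a true MFCC analysis applies. The names `dct2`, `idct2`, `factor_frame`, and the parameters `keep` and `floor` are hypothetical.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                               for t in range(n)))
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(total)
    return out


def factor_frame(magnitudes, keep=4, floor=1e-10):
    """Split one frame into a smooth envelope and a residual.

    smooth: exp of the log spectrum rebuilt from the first `keep`
            cepstral coefficients (stand-in for the formant spectrogram).
    residual: frame divided by smooth (stand-in for the pitch spectrogram).
    """
    clipped = [max(m, floor) for m in magnitudes]
    cepstrum = dct2([math.log(m) for m in clipped])
    lifted = [c if k < keep else 0.0 for k, c in enumerate(cepstrum)]
    smooth = [math.exp(v) for v in idct2(lifted)]
    residual = [m / s for m, s in zip(clipped, smooth)]
    return smooth, residual


frame = [1.0, 4.0, 2.0, 8.0, 1.0, 4.0, 2.0, 8.0]
smooth, residual = factor_frame(frame, keep=3)
```

By construction, multiplying `smooth` by `residual` bin by bin recovers the original frame, which is what allows the two factors to be modified separately and recombined.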

37. A method for producing a morph comprising a transition from one spoken word to another spoken word, comprising the steps of:
- generating a dense spectral representation of each spoken word;
  
  generating a plurality of modified representations, each of which comprises a different respective interpolation of corresponding values in the representation of said two sounds; and
  
  sequentially inverting each of said modified representation and generating a series of discrete sounds which transition from one of said spoken words to the other of said spoken words, and which include characteristics of each of said spoken words.
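
The series of interpolated representations in claim 37 can be sketched as below. The sketch assumes the two spectrograms are already temporally aligned and equal in shape, and it uses linear interpolation of magnitudes, which is an assumption rather than something the claim specifies. The names `interpolate` and `morph_sequence` are hypothetical, and the inversion of each representation back to audio is not shown.

```python
def interpolate(spec_a, spec_b, lam):
    """Blend two equally shaped spectrograms; lam=0 gives a, lam=1 gives b."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectrograms must have the same number of frames")
    return [[(1 - lam) * x + lam * y for x, y in zip(frame_a, frame_b)]
            for frame_a, frame_b in zip(spec_a, spec_b)]


def morph_sequence(spec_a, spec_b, steps):
    """Return `steps` representations stepping evenly from a to b."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [interpolate(spec_a, spec_b, k / (steps - 1)) for k in range(steps)]


word_a = [[0.0, 2.0], [4.0, 6.0]]
word_b = [[2.0, 4.0], [8.0, 10.0]]
sequence = morph_sequence(word_a, word_b, 3)
```

Each element of `sequence` would then be inverted in turn to produce one discrete sound in the transition.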

38. A method for transforming from a one-dimensional signal representing a physical phenomenon to a second one-dimensional signal representing another physical phenomenon, comprising the steps of:
- automatically defining points of correspondence between the respective signals;
  
  determining a desired point in a morphed signal, and selecting a pair of corresponding points in the original signals that are related to the determined point; and
  
  warping and interpolating the original signals, based on said pair of corresponding points, to form a morphed signal, and generating a sensory perceptible physical phenomenon corresponding to said morphed signal.
- Dependent claims (39, 40, 41, 42, 43, 44, 45):
  - 39. The method of claim 38 wherein said defining step includes the use of dynamic time warping to match the two original signals.
  - 40. The method of claim 38 further including the step of cross-fading the warped and interpolated signals.
  - 41. The method of claim 38 wherein each of said original signals is comprised of multiple waveforms, and wherein plural waveforms of each original signal are separately warped and interpolated.
  - 42. The method of claim 41 further including the step of combining the separately warped and interpolated waveforms to form the morphed signal.
  - 43. The method of claim 38 wherein said points constitute a dense correspondence between the signals.
  - 44. The method of claim 38 wherein said morphed signal is defined at a dense set of points.
  - 45. The method of claim 38 wherein said physical phenomena are audible sounds.
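
One plausible reading of the warping and interpolating step in claim 38 is sketched below: each pair of corresponding points contributes one point of the morphed signal, with both its position and its value interpolated by the same morph factor. This is an illustrative interpretation only; the name `morph_points`, the parameter `lam`, and the hand-written correspondence `path` are assumptions.

```python
def morph_points(sig_a, sig_b, path, lam):
    """Warp and interpolate two 1-D signals along a correspondence path.

    path: list of (i, j) pairs meaning sig_a[i] corresponds to sig_b[j].
    lam:  morph factor, 0 for the first signal, 1 for the second.
    Returns a list of (time, value) points of the morphed signal.
    """
    points = []
    for i, j in path:
        time = (1 - lam) * i + lam * j                  # warped position
        value = (1 - lam) * sig_a[i] + lam * sig_b[j]   # interpolated value
        points.append((time, value))
    return points


sig_a = [0.0, 10.0, 20.0]
sig_b = [0.0, 10.0, 10.0, 20.0]
path = [(0, 0), (1, 1), (1, 2), (2, 3)]
halfway = morph_points(sig_a, sig_b, path, 0.5)
```

The returned points are not on a uniform grid; a full implementation would resample them before generating output.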

46. A method for generating an output sound which includes characteristic features of each of two input sounds, comprising the steps of:
- factoring each of said two input sounds into representations which include at least a pitch spectrogram for a first one of said two input sounds and at least a formant spectrogram for a second one of said two input sounds;
  
  combining information obtained from said pitch spectrogram for said first input sound with information obtained from said formant spectrogram for said second input sound to form a new representation for a morphed sound; and
  
  inverting said new representation and generating an output sound.

47. A method for generating a morphed sound from first and second input sounds, comprising the steps of:
- factoring each of said two input sounds into a plurality of representations which respectively relate to different acoustic features of the sounds;
  
  combining information obtained from a representation of the first input sound which relates to a first acoustic feature with information obtained from a representation of the second input sound that relates to a second, different acoustic feature, to produce a representation for a morphed sound; and
  
  inverting the representation for the morphed sound and generating the morphed sound from the inverted representation.
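
Claims 46 and 47 combine a representation of one acoustic feature from the first sound with a representation of a different feature from the second. The sketch below assumes the combination is a bin-by-bin product, chosen because claim 36 obtains the pitch spectrogram by division; the claims themselves do not state how the information is combined. It also assumes both spectrograms are already aligned and equal in shape. The name `combine` is hypothetical.

```python
def combine(pitch_spec, formant_spec):
    """Multiply a pitch (residual) spectrogram by a formant spectrogram."""
    if len(pitch_spec) != len(formant_spec):
        raise ValueError("spectrograms must have the same number of frames")
    return [[p * f for p, f in zip(pitch_frame, formant_frame)]
            for pitch_frame, formant_frame in zip(pitch_spec, formant_spec)]


pitch_of_first = [[1.0, 2.0], [0.5, 4.0]]
formants_of_second = [[2.0, 3.0], [4.0, 0.25]]
hybrid = combine(pitch_of_first, formants_of_second)
```

The result carries the fine harmonic structure of the first input under the spectral envelope of the second, and would still need to be inverted to produce audio.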

Current Assignee: Vulcan Patents LLC
Original Assignee: Interval Research Corporation
Inventors: Slaney, Malcolm
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Collins, Alphonso
Application Number: US08/616,290
Time in Patent Office: 781 Days
Field of Search: 395/2.12, 395/2.15, 395/2.18, 395/2.74, 395/2.77-2.79, 395/2.5, 395/2.87
US Class Current: 704/278

CPC Class Codes
G10H 2250/035   Crossfade, i.e. time domain...
G10H 2250/481   Formant synthesis, i.e. sim...
G10H 7/008   Means for controlling the t...
G10K 15/00   Acoustics not otherwise pro...
G10L 13/033   Voice editing, e.g. manipul...
