VOICE-ESTIMATION BASED ON REAL-TIME PROBING OF THE VOCAL TRACT

US 20120136660A1
Filed: 11/30/2010
Published: 05/31/2012
Est. Priority Date: 11/30/2010
Status: Abandoned Application


Abstract

A voice-estimation device that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. The waves reflected by the vocal tract are detected and converted into a digital signal, which is then processed segment-by-segment. Based on the processing, a set of formant frequencies is determined for each segment. Each such set is then analyzed to assign a phoneme to the corresponding segment of the digital signal. The resulting sequence of phonemes is converted into a digital audio signal or text representing the user's estimated voice.
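
The segment-by-segment flow described in the abstract can be sketched as follows. This is only an illustrative skeleton, not the applicants' implementation: the function name `estimate_phonemes`, the use of non-overlapping fixed-length segments, and the two caller-supplied stages `formants_of` and `phoneme_of` are all assumptions introduced here.

```python
from typing import Callable, List, Sequence


def estimate_phonemes(
    samples: Sequence[float],
    segment_len: int,
    formants_of: Callable[[Sequence[float]], List[float]],
    phoneme_of: Callable[[List[float]], str],
) -> List[str]:
    """Return one phoneme label per full segment of the digitized response.

    formants_of and phoneme_of are hypothetical stand-ins for the two
    processing stages named in the abstract.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    phonemes = []
    # Walk the signal in non-overlapping segments (an assumption);
    # a trailing partial segment is dropped.
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        segment = samples[start:start + segment_len]
        formants = formants_of(segment)        # stage 1: segment -> formant set
        phonemes.append(phoneme_of(formants))  # stage 2: formant set -> phoneme
    return phonemes
```

Converting the returned phoneme sequence into audio or text is a separate step that this sketch does not attempt.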

20 Claims

1. An apparatus, comprising:
- a speaker for directing an excitation signal into a vocal tract;
  
  a microphone for detecting a vocal-tract response signal corresponding to the excitation signal; and
  
  a digital signal processor operatively coupled to the microphone and configured to:
  
  process a segment of the response signal to determine a corresponding set of one or more formant frequencies for the vocal tract; and
  
  further process the set of formant frequencies to identify a phoneme corresponding to the segment.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16):
  - 2. The apparatus of claim 1, wherein the apparatus is configured to convert into a digital audio signal a sequence of phonemes that is identified by the processor based on a plurality of segments of the response signal.
  - 3. The apparatus of claim 1, wherein the apparatus is configured to convert into text a sequence of phonemes that is identified by the processor based on a plurality of segments of the response signal.
  - 4. The apparatus of claim 1, further comprising a random-number generator, wherein:
    - the excitation signal comprises a sequence of excitation pulses that corresponds to a sequence of random numbers generated by the random-number generator; and
      
      the processor uses said sequence of random numbers in the processing of the response signal.
  - 5. The apparatus of claim 4, further comprising a controller operatively coupled to the speaker to apply thereto a drive signal that causes the speaker to generate the excitation signal, wherein the controller comprises:
    - a pulse generator for converting the sequence of random numbers into a corresponding sequence of pulse-envelope shapes;
      
      a multiplier for injecting a carrier frequency into the pulse-envelope shapes; and
      
      a band-pass filter for filtering a signal produced by the multiplier as a result of said injection, wherein a filtered signal produced by the band-pass filter is the drive signal.
  - 6. The apparatus of claim 5, wherein:
    - the controller is operatively coupled to provide one or more parameters of the drive signal to the processor; and
      
      the processor uses said one or more parameters in the processing of the detected response signal.
  - 7. The apparatus of claim 6, wherein said one or more parameters comprise at least one of the carrier frequency, a pulse-envelope shape used by the pulse generator, and a spectral characteristic of the band-pass filter.
  - 8. The apparatus of claim 5, wherein the carrier frequency is greater than about 20 kHz.
  - 9. The apparatus of claim 5, wherein:
    - the carrier frequency is in a range between about 1 kHz and about 20 kHz; and
      
      the pulse-envelope shapes have amplitudes that cause the excitation signal to have an intensity that is below a human physiological-perception threshold.
  - 10. The apparatus of claim 4, wherein:
    - the processor correlates the segment of the response signal and a corresponding segment of the sequence of random numbers to determine a reflected impulse response of the vocal tract; and
      
      the processor determines the set of formant frequencies based on the reflected impulse response.
  - 11. The apparatus of claim 10, wherein:
    - the processor determines an impedance profile of the vocal tract based on the reflected impulse response; and
      
      the processor determines the set of formant frequencies based on the impedance profile.
  - 12. The apparatus of claim 11, wherein, for the determination of the impedance profile, the processor is configured to:
    - employ a model of the vocal tract according to which the vocal tract comprises a plurality of constant-impedance sections;
      
      decompose the reflected impulse response into components corresponding to wave reflections from impedance discontinuities between adjacent constant-impedance sections; and
      
      determine the impedance profile based on said decomposition.
  - 13. The apparatus of claim 1, wherein:
    - the set comprises M formant frequencies, where M is an integer greater than one; and
      
      for the identification of the phoneme corresponding to the segment, the processor is configured to map the M formant frequencies onto a phoneme constellation comprising a plurality of constellation points in an M-dimensional frequency space, wherein each phoneme is represented by at least one distinct constellation point.
  - 14. The apparatus of claim 13, wherein M is different for different types of phonemes.
  - 15. The apparatus of claim 1, wherein the response signal corresponds to silent speech.
  - 16. The apparatus of claim 1, wherein the speaker, the microphone, and the signal processor are implemented in a cell phone.
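
The drive-signal chain recited in claims 4 and 5 (random numbers, pulse-envelope shapes, carrier injection, band-pass filtering) might be sketched as below. Everything concrete here is an assumption made for illustration: the ±1 random numbers, the Hann envelope, the 22 kHz carrier (chosen only to sit above the "about 20 kHz" of claim 8), the 96 kHz sample rate, the small amplitude, and the second-order biquad standing in for the band-pass filter.

```python
import math
import random
from typing import List


def pulse_envelopes(numbers: List[int], samples_per_pulse: int,
                    amplitude: float) -> List[float]:
    """Turn each random number (+1 or -1) into one Hann-shaped pulse envelope."""
    out = []
    for value in numbers:
        for n in range(samples_per_pulse):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / samples_per_pulse)
            out.append(amplitude * value * hann)
    return out


def inject_carrier(envelope: List[float], carrier_hz: float,
                   sample_rate_hz: float) -> List[float]:
    """Multiplier stage: multiply the envelope by a cosine carrier."""
    return [e * math.cos(2.0 * math.pi * carrier_hz * n / sample_rate_hz)
            for n, e in enumerate(envelope)]


def band_pass(signal: List[float], center_hz: float, q: float,
              sample_rate_hz: float) -> List[float]:
    """Second-order band-pass biquad (0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter type
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def drive_signal(numbers: List[int], samples_per_pulse: int = 64,
                 amplitude: float = 0.01, carrier_hz: float = 22_000.0,
                 sample_rate_hz: float = 96_000.0, q: float = 5.0) -> List[float]:
    """Chain the three stages; all default values are illustrative assumptions."""
    envelope = pulse_envelopes(numbers, samples_per_pulse, amplitude)
    modulated = inject_carrier(envelope, carrier_hz, sample_rate_hz)
    return band_pass(modulated, carrier_hz, q, sample_rate_hz)


rng = random.Random(0)  # seeded stand-in for the random-number generator
numbers = [rng.choice((-1, 1)) for _ in range(16)]
drive = drive_signal(numbers)
```

Keeping `numbers`, the carrier frequency, and the filter settings available to the receiving side mirrors claims 6 and 7, where the controller passes drive-signal parameters to the processor.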

17. An apparatus, comprising a digital signal processor for being operatively coupled to a speaker configured to direct an excitation signal into a vocal tract and to a microphone configured to detect a vocal-tract response signal corresponding to the excitation signal, wherein said processor is configured to:
- process a segment of the response signal to determine a corresponding set of one or more formant frequencies for the vocal tract; and
  
  further process the set of formant frequencies to identify a phoneme corresponding to the segment.
- Dependent claims (18, 19):
  - 18. The apparatus of claim 17, further comprising a random-number generator, wherein:
    - the excitation signal comprises a sequence of excitation pulses that corresponds to a sequence of random numbers generated by the random-number generator;
      
      the processor correlates the segment of the response signal and a corresponding segment of the sequence of random numbers to determine a reflected impulse response of the vocal tract; and
      
      the processor determines the set of formant frequencies based on the reflected impulse response.
  - 19. The apparatus of claim 18, wherein the processor determines an impedance profile of the vocal tract based on the reflected impulse response and then determines the set of formant frequencies based on the impedance profile, wherein, for the determination of the impedance profile, the processor is configured to:
    - employ a model of the vocal tract according to which the vocal tract comprises a plurality of constant-impedance sections;
      
      decompose the reflected impulse response into components corresponding to wave reflections from impedance discontinuities between adjacent constant-impedance sections; and
      
      determine the impedance profile based on said decomposition.
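
The correlation step of claim 18 can be illustrated with a toy baseband model: when the excitation follows a known random ±1 sequence, cross-correlating the response with that sequence approximately recovers the reflected impulse response. The sketch below is a simplification under stated assumptions: it ignores pulse shaping and the carrier, models the vocal tract as a short made-up impulse response `true_h`, and uses circular convolution and correlation for brevity. It does not attempt the impedance-profile decomposition of claim 19.

```python
import random
from typing import List


def simulate_response(excitation: List[float],
                      impulse_response: List[float]) -> List[float]:
    """Circular convolution standing in for the vocal tract (toy model)."""
    n = len(excitation)
    return [sum(h * excitation[(i - k) % n]
                for k, h in enumerate(impulse_response))
            for i in range(n)]


def estimate_impulse_response(response: List[float], excitation: List[float],
                              num_taps: int) -> List[float]:
    """Circular cross-correlation of the response with the known excitation.

    For a +/-1 sequence the excitation power is 1, so dividing by the
    sequence length gives the impulse-response estimate directly.
    """
    n = len(excitation)
    return [sum(response[i] * excitation[(i - lag) % n] for i in range(n)) / n
            for lag in range(num_taps)]


rng = random.Random(1)  # seeded stand-in for the random-number generator
excitation = [rng.choice((-1.0, 1.0)) for _ in range(4096)]
true_h = [0.0, 0.5, 0.0, -0.25]  # hypothetical reflected impulse response
response = simulate_response(excitation, true_h)
estimate = estimate_impulse_response(response, excitation, num_taps=6)
```

The estimate is approximate because a finite random sequence is not perfectly uncorrelated with its own shifts; the residual error shrinks roughly with the square root of the sequence length.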

20. A method of synthesizing speech, comprising:
- directing an excitation signal generated by a speaker into a vocal tract;
  
  detecting, with a microphone, a vocal-tract response signal corresponding to the excitation signal;
  
  processing a segment of the response signal to determine a corresponding set of one or more formant frequencies for the vocal tract; and
  
  processing the set of formant frequencies to identify a phoneme corresponding to the segment.
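
One way to picture the final step, identifying a phoneme from a set of formant frequencies, is the constellation mapping of claim 13: pick the constellation point closest to the measured formants in an M-dimensional frequency space. The sketch below is hedged accordingly. The nearest-neighbor rule with Euclidean distance is an assumption (the claims only say "map"), and the three-vowel table uses rough textbook-style (F1, F2) values in Hz chosen for illustration, not values taken from the application.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical constellation: each phoneme has at least one point (F1, F2) in Hz.
CONSTELLATION: Dict[str, List[Tuple[float, ...]]] = {
    "i": [(270.0, 2290.0)],
    "a": [(730.0, 1090.0)],
    "u": [(300.0, 870.0)],
}


def identify_phoneme(formants: Sequence[float],
                     constellation: Dict[str, List[Tuple[float, ...]]]) -> str:
    """Return the phoneme whose constellation point is nearest to the formants."""
    best_phoneme, best_distance = None, math.inf
    for phoneme, points in constellation.items():
        for point in points:
            # Points with a different dimension M are skipped, since M may
            # differ between phoneme types (claim 14).
            if len(point) != len(formants):
                continue
            distance = math.dist(point, formants)
            if distance < best_distance:
                best_phoneme, best_distance = phoneme, distance
    if best_phoneme is None:
        raise ValueError("no constellation point matches the formant count")
    return best_phoneme
```

For example, `identify_phoneme((290.0, 2200.0), CONSTELLATION)` selects "i", because that point lies far closer to (270, 2290) than to the other two entries.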

Current Assignee
Alcatel-Lucent SA (Nokia Corporation)
Original Assignee
Alcatel-Lucent USA, Inc. (Nokia Corporation)
Inventors
Moeller, Lothar Benedikt; Harman, Dale D.

Application Number

US12/956,552
Publication Number

US 20120136660A1
US Class Current

704/254
CPC Class Codes

G10L 15/20   Speech recognition techniqu...

G10L 15/24   Speech recognition using no...

G10L 25/15   the extracted parameters be...
