Symbiotic automatic speech recognition and vocoder

US 6,092,039 A
Filed: 10/31/1997
Issued: 07/18/2000
Est. Priority Date: 10/31/1997
Status: Expired due to Fees



Abstract

The device and method of the invention receive a digital speech signal, which an Acoustic Processor converts into a Mel-Cepstrum Vector and Pitch. These are recalibrated and encoded. The encoded signal is transmitted over a narrow-band Channel, then decoded, split, and recalibrated. One of the split signals feeds a Statistical Processor, which produces Recognized Text; another feeds a Regenerator, which produces Regenerated Speech. The device and method according to the invention simultaneously achieve:
- very perceptive Automatic Speech Recognition and high-quality VoCoding, using Speech communicated or stored via a narrow-bandwidth Channel;
- very perceptive Automatic Speech Recognition on a Client & Server system, without the need to store or communicate wide-bandwidth Speech signals;
- very perceptive Automatic Speech Recognition with Deferred Review and Editing, without storing wide-bandwidth Speech signals;
- better feedback in an Automatic Speech Recognition system, particularly for Deferred Automatic Speech Recognition; and
- good usability for unified Automatic Speech Recognition and VoCoding.
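To see why vector-quantized features suit a narrow-band Channel, the rough sketch below computes the channel bit rate when each frame is sent as a single codebook index. It assumes a fixed-length index of ceil(log2(codebook size)) bits per frame and no further compression; the codebook size and frame rate are hypothetical parameters, not values from the patent, and the function name `vq_bit_rate` is illustrative.

```python
import math


def vq_bit_rate(codebook_size: int, frames_per_second: float) -> float:
    """Channel bit rate (bit/s) when each frame is sent as one VQ index.

    Hypothetical model: a fixed-length index of ceil(log2(codebook_size)) bits
    per frame, with no silence, duration, or other compression applied.
    """
    bits_per_frame = math.ceil(math.log2(codebook_size))
    return bits_per_frame * frames_per_second


# Hypothetical example: a 1024-entry codebook at 100 frames per second.
print(vq_bit_rate(1024, 100))
```

With those hypothetical values each frame costs 10 bits, so the channel carries 1,000 bit/s; for comparison, uncompressed 16-bit PCM sampled at 16 kHz needs 16 × 16,000 = 256,000 bit/s.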

133 Citations

13 Claims

1. A method for automatic speech recognition (ASR) and vocoding (VC), comprising the steps of:
- converting a first signal representing speech to a second signal having raw mel cepstrum vector (MCV) and a third signal having raw pitch;
  
  subtracting a calibration vector from said MCV to form a difference vector;
  
  multiplying a calibration matrix with said difference vector to produce a recalibrated MCV;
  
  recalibrating said raw pitch with a logarithmic function;
  
  concatenating said recalibrated MCV with said recalibrated pitch to form a recalibrated vector;
  
  compressing and quantizing said recalibrated vector to form a vector quantized signal; and
  
  forwarding said vector quantized signal to a remote receiver for decoding said vector quantized signal received by the remote receiver to recover said speech.
- Dependent Claims (2, 3, 4, 5, 6)
  - 2. The method according to claim 1, wherein said compressing and quantizing step includes slope compression.
  - 3. The method according to claim 1, wherein said compressing and quantizing step includes silence compression.
  - 4. The method according to claim 1, wherein said step of compressing and quantizing includes use of a silence detector and a gate.
  - 5. The method according to claim 1, wherein said step of compressing and quantizing includes duration compression.
  - 6. The method according to claim 1, wherein said step of compressing and quantizing includes variable detail compression.
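The per-frame recalibration steps of claim 1 can be sketched as follows, followed by a plain nearest-codeword vector quantizer. This is a minimal illustration under stated assumptions, not the patented implementation: frames arrive as Python lists, the "logarithmic function" is taken to be the natural log, the calibration vector and matrix are supplied by the caller, and quantization uses one codebook with squared Euclidean distance. The compression steps of claims 2 to 6 are not sketched, and the names `recalibrate_frame` and `quantize` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def recalibrate_frame(raw_mcv: Sequence[float], raw_pitch_hz: float,
                      calib_vec: Sequence[float], calib_mat: Matrix) -> Vector:
    """Recalibrate one frame: M @ (mcv - c), then append log(pitch)."""
    if len(raw_mcv) != len(calib_vec):
        raise ValueError("MCV and calibration vector lengths differ")
    if raw_pitch_hz <= 0:
        # Assumption: only voiced frames with a positive pitch are handled here.
        raise ValueError("log recalibration needs a positive pitch")
    # Subtract the calibration vector to form the difference vector.
    diff = [x - c for x, c in zip(raw_mcv, calib_vec)]
    # Multiply the calibration matrix with the difference vector.
    recal_mcv = [sum(m * d for m, d in zip(row, diff)) for row in calib_mat]
    # Recalibrate the raw pitch with a logarithmic function (natural log assumed).
    recal_pitch = math.log(raw_pitch_hz)
    # Concatenate the recalibrated MCV with the recalibrated pitch.
    return recal_mcv + [recal_pitch]


def quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def dist2(codeword: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, codeword))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


# Hypothetical two-coefficient frame with a diagonal calibration matrix.
frame = recalibrate_frame([1.0, 2.0], 100.0, [0.5, 0.5], [[2.0, 0.0], [0.0, 1.0]])
print(frame)
```

Only the resulting index would be forwarded to the receiver; the codebook itself is assumed to be shared in advance by both ends.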

7. A method of decoding vector quantized data representing speech, comprising the steps of:
- dequantizing and decompressing said vector quantized data including acoustic data substantially independent of phonemic information into a mel-cepstrum vector (MCV), a recalibrated MCV, and pitch;
  
  adding said MCV with a calibration vector;
  
  statistically processing said sum vector into text; and
  
  regenerating said calibration MCV by frequency domain transformation into speech.
- Dependent Claims (8, 9, 10, 11)
  - 8. The method according to claim 7, wherein said decompressing step includes variable detail decompression.
  - 9. The method according to claim 7, wherein said decompressing step includes duration decompression.
  - 10. The method according to claim 7, wherein said decompressing step includes silence decompression.
  - 11. The method according to claim 7, wherein said decompressing step includes slope decompression.
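The receiver-side steps of claim 7 can be sketched as the inverse of the encoder-side recalibration described in claim 1: look up the received index in a codebook, split off the pitch, undo the calibration matrix, add back the calibration vector, and invert the logarithm. The assumptions here are not stated in the claim: the last vector element holds the natural log of pitch, the receiver already holds the inverse of the encoder's calibration matrix, and decompression (claims 8 to 11), statistical recognition, and speech regeneration are left out. `dequantize` and `decode_frame` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dequantize(index: int, codebook: Sequence[Sequence[float]]) -> List[float]:
    """Look up the codeword for a received index (inverse of nearest-codeword VQ)."""
    return list(codebook[index])


def decode_frame(recal_vec: Sequence[float], calib_vec: Sequence[float],
                 inv_calib_mat: Matrix) -> Tuple[List[float], float]:
    """Split a recalibrated vector and undo the recalibration.

    Assumes the last element is log(pitch) and that inv_calib_mat is the
    inverse of the calibration matrix used by the encoder.
    """
    *recal_mcv, log_pitch = recal_vec
    # Undo the calibration matrix to recover the difference vector.
    diff = [sum(m * r for m, r in zip(row, recal_mcv)) for row in inv_calib_mat]
    # Add the calibration vector back to obtain the MCV.
    mcv = [d + c for d, c in zip(diff, calib_vec)]
    # Invert the logarithmic pitch recalibration.
    pitch_hz = math.exp(log_pitch)
    return mcv, pitch_hz


# Hypothetical frame: inverse of diag(2, 1) is diag(0.5, 1).
mcv, pitch_hz = decode_frame([1.0, 1.5, math.log(100.0)], [0.5, 0.5],
                             [[0.5, 0.0], [0.0, 1.0]])
print(mcv, pitch_hz)
```

In this sketch the recovered MCV would feed the Statistical Processor, and the MCV together with the pitch would feed the Regenerator.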

12. A program storage device having stored program instructions executable by a computer to perform method steps for automatic speech recognition (ASR) and vocoding (VC), the method steps comprising:
- converting a first signal representing speech to a second signal having raw mel cepstrum vector (MCV) and a third signal having raw pitch;
  
  subtracting a calibration vector from said MCV to form a difference vector;
  
  multiplying a calibration matrix with said difference vector to produce a recalibrated MCV;
  
  recalibrating said raw pitch with a logarithmic function;
  
  concatenating said recalibrated MCV with said recalibrated pitch to form a recalibrated vector;
  
  compressing and quantizing said recalibrated vector to form a vector quantized signal; and
  
  forwarding said vector quantized signal to a remote receiver for decoding said vector quantized signal received by the remote receiver to recover said speech.

13. A program storage device having stored program instructions executable by a computer to perform method steps for decoding vector quantized data representing speech, the method comprising the steps of:
- dequantizing and decompressing said vector quantized data into a mel-cepstrum vector (MCV), a recalibrated MCV, and pitch;
  
  adding said MCV with a calibration vector;
  
  statistically processing said sum vector into text; and
  
  regenerating said calibration MCV by frequency domain transformation into speech.


Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Zingher, Arthur Richard
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Azad, Abul K.

Application Number
US08/960,535
Time in Patent Office
991 Days
Field of Search
704/203, 704/204, 704/231-235, 704/239, 704/246, 704/221, 704/222, 704/223
US Class Current
704/221
CPC Class Codes
G10L 15/02 Feature extraction for spee...
G10L 19/00 Speech or audio signals ana...
