Audio-visual codebook dependent cepstral normalization

US 7,319,955 B2
Filed: 11/29/2002
Issued: 01/15/2008
Est. Priority Date: 11/29/2002
Status: Active Grant

Abstract

An arrangement for yielding enhanced audio features towards the provision of enhanced audio-visual features for speech recognition. Input is provided in the form of noisy audio-visual features and noisy audio features related to the noisy audio-visual features.

21 Claims

1. An apparatus for enhancing speech for speech recognition, said apparatus comprising:
- a first input medium which obtains noisy audio-visual features;
- a second input medium which obtains noisy audio features related to the noisy audio-visual features; and
- a cepstral speech function output arrangement for combining the first and second inputs to yield enhanced audio features that are re-combined with visual features to yield enhanced audio-visual features used for speech recognition.
  - 2. The apparatus according to claim 1, wherein said arrangement for yielding enhanced audio features is adapted to yield estimated clean speech features.
  - 3. The apparatus according to claim 1, wherein said arrangement for yielding enhanced audio features comprises an arrangement for determining a posterior distribution based on the noisy audio-visual features.
  - 4. The apparatus according to claim 3, wherein said arrangement for determining a posterior distribution is adapted to determine a posterior distribution based additionally on Gaussian parameters which model a probability density function related to the noisy audio-visual features.
  - 5. The apparatus according to claim 1, wherein said arrangement for yielding enhanced audio features comprises an arrangement for estimating audio compensation codewords.
  - 6. The apparatus according to claim 1, wherein said arrangement for yielding enhanced audio features comprises an arrangement for determining the difference between the noisy audio features and modified noisy audio-visual features.
  - 7. The apparatus according to claim 1, wherein said arrangement for yielding enhanced audio features comprises:
    - an arrangement for determining a posterior distribution based on the noisy audio-visual features; and
    - an arrangement for estimating audio compensation codewords.
  - 8. The apparatus according to claim 7, wherein said arrangement for yielding enhanced audio features comprises an arrangement for effecting a multiplication of the posterior distribution with the estimated audio compensation codewords.
  - 9. The apparatus according to claim 8, wherein said arrangement for yielding enhanced audio features comprises an arrangement for determining the difference between the noisy audio features and the multiplication of the posterior distribution with the estimated audio compensation codewords.
  - 10. The apparatus according to claim 1, wherein said first input medium is adapted to accept noisy audio-visual features which have resulted from the processing of normalized audio features and normalized video features.
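
Read together, claims 3 through 9 describe a posterior-weighted correction: a posterior distribution is computed from the noisy audio-visual features, multiplied with estimated audio compensation codewords, and the result is subtracted from the noisy audio features. The following is a minimal sketch of that reading in Python with NumPy, not the patented implementation. It assumes a diagonal-covariance Gaussian mixture (the claims say only "Gaussian parameters"), one compensation codeword per mixture component, and that the "multiplication" is a posterior-weighted sum over components. All function and parameter names are hypothetical.

```python
import numpy as np


def av_posteriors(y_av, means, variances, weights):
    """Posterior p(k | y_av) over K mixture components.

    Assumption: diagonal-covariance Gaussians with parameters
    means (K, D_av), variances (K, D_av) and prior weights (K,).
    """
    diff = y_av[None, :] - means
    # Per-component log-likelihood of the noisy audio-visual vector.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    # Subtract the maximum before exponentiating for numerical stability.
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()


def enhance_audio(y_a, y_av, means, variances, weights, codewords):
    """Estimate clean audio features from one noisy frame.

    y_a:       noisy audio features, shape (D_a,)
    y_av:      noisy audio-visual features, shape (D_av,)
    codewords: assumed audio compensation codewords, shape (K, D_a)
    """
    post = av_posteriors(y_av, means, variances, weights)
    # "Multiplication of the posterior distribution with the codewords",
    # interpreted here as a posterior-weighted sum over components.
    correction = post @ codewords
    # Difference between the noisy audio features and that product.
    return y_a - correction
```

Under these assumptions, a frame whose audio-visual vector lies close to one component's mean receives a correction that is almost entirely that component's codeword, while a frame lying between components receives a blend of their codewords.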

11. A method of enhancing speech for speech recognition, said method comprising the steps of:
- obtaining noisy audio-visual features;
- obtaining noisy audio features related to the noisy audio-visual features; and
- using a cepstral speech function operating on the noisy audio features and the noisy audio-visual features to yield enhanced audio features that are re-combined with visual features to yield enhanced audio-visual features used for speech recognition.
  - 12. The method according to claim 11, wherein said step of yielding enhanced audio features comprises yielding estimated clean speech features.
  - 13. The method according to claim 11, wherein said step of yielding enhanced audio features comprises determining a posterior distribution based on the noisy audio-visual features.
  - 14. The method according to claim 13, wherein said step of determining a posterior distribution comprises determining a posterior distribution based additionally on Gaussian parameters which model a probability density function related to the noisy audio-visual features.
  - 15. The method according to claim 11, wherein said step of yielding enhanced audio features comprises estimating audio compensation codewords.
  - 16. The method according to claim 11, wherein said step of yielding enhanced audio features comprises determining the difference between the noisy audio features and modified noisy audio-visual features.
  - 17. The method according to claim 11, wherein said step of yielding enhanced audio features comprises:
    - determining a posterior distribution based on the noisy audio-visual features; and
    - estimating audio compensation codewords.
  - 18. The method according to claim 17, wherein said step of yielding enhanced audio features comprises effecting a multiplication of the posterior distribution with the estimated audio compensation codewords.
  - 19. The method according to claim 18, wherein said step of yielding enhanced audio features comprises determining the difference between the noisy audio features and the multiplication of the posterior distribution with the estimated audio compensation codewords.
  - 20. The method according to claim 11, wherein said step of obtaining noisy audio-visual features comprises obtaining noisy audio-visual features which have resulted from the processing of normalized audio features and normalized video features.

21. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for enhancing speech for speech recognition, said method comprising the steps of:
- obtaining noisy audio-visual features;
- obtaining noisy audio features related to the noisy audio-visual features; and
- using a cepstral speech function operating on the noisy audio features and the noisy audio-visual features to yield enhanced audio features that are re-combined with visual features to yield enhanced audio-visual features used for speech recognition.
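
The final step of the independent claims re-combines the enhanced audio features with visual features. The claims do not say how the re-combination is performed. The sketch below assumes the simplest option, frame-by-frame concatenation of the two feature streams, and the function name `recombine_features` is hypothetical.

```python
import numpy as np


def recombine_features(enhanced_audio, visual):
    """Join enhanced audio and visual features frame by frame.

    Assumption: re-combination is plain concatenation along the
    feature axis; the claims do not specify the actual operation.

    enhanced_audio: shape (T, D_a), one row per frame
    visual:         shape (T, D_v), one row per frame
    returns:        shape (T, D_a + D_v)
    """
    enhanced_audio = np.atleast_2d(np.asarray(enhanced_audio, dtype=float))
    visual = np.atleast_2d(np.asarray(visual, dtype=float))
    if enhanced_audio.shape[0] != visual.shape[0]:
        raise ValueError("audio and visual streams must have the same number of frames")
    return np.concatenate([enhanced_audio, visual], axis=1)
```

A real system might instead apply a projection or further normalization to the joined vector. Claim 10, for example, mentions audio-visual features resulting from the processing of normalized audio and video features.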

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
International Business Machines Corporation
Inventors
Deligne, Sabine; Potamianos, Gerasimos; Neti, Chalapathy V.
Primary Examiner(s)
Hudspeth; David
Assistant Examiner(s)
ALBERTALLI, BRIAN LOUIS

Application Number
US10/307,164
Publication Number
US 20040107098A1
Time in Patent Office
1,873 Days
Field of Search
None
US Class Current
704/233
CPC Class Codes
G10L 15/20 Speech recognition techniqu...
G10L 15/24 Speech recognition using no...
