Robust parameters for noisy speech recognition

US 20030182114A1
Filed: 11/01/2002
Published: 09/25/2003
Est. Priority Date: 05/04/2000
Status: Active Grant

2 Assignments

0 Petitions

Abstract

The present invention relates to a method of automatic processing of noise-affected speech comprising at least the following steps:

capture and digitising of the speech in the form of at least one digitised signal (1),

extraction of several time-based sequences or frames (15), corresponding to said signal, by means of an extraction system (10),

decomposition of each frame (15) by means of an analysis system (20, 40) into at least two different frequency bands so as to obtain at least two first vectors of representative parameters (45) for each frame (15), one for each frequency band, and

conversion, by means of converter systems (50), of the first vectors of representative parameters (45) into second vectors of parameters relatively insensitive to noise (55), each converter system (50) being associated with one frequency band and converting the first vector of representative parameters (45) associated with said same frequency band, and

the learning of said converter systems (50) being achieved on the basis of a learning corpus which corresponds to a corpus of speech contaminated by noise (102).
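The frame extraction and frequency-band decomposition steps can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the frame length and hop, the band edges, and the choice of log-magnitude DFT values as "representative parameters" are all assumptions made for illustration only.

```python
import cmath
import math


def extract_frames(signal, frame_len, hop):
    """Split a digitised signal into overlapping time-based frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def band_vectors(frame, band_edges):
    """Return one first vector of parameters per frequency band.

    band_edges holds (low_bin, high_bin) pairs, high bin exclusive.
    Log magnitudes are a hypothetical choice of representative parameters.
    """
    spectrum = magnitude_spectrum(frame)
    return [[math.log(1e-10 + m) for m in spectrum[lo:hi]]
            for lo, hi in band_edges]


# Demo: a 64-sample sine wave cut into 16-sample frames with a hop of 8.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(64)]
frames = extract_frames(signal, frame_len=16, hop=8)
bands = [(0, 4), (4, 9)]  # two frequency bands, as the method requires at least two
first_vectors = [band_vectors(f, bands) for f in frames]
```

Each entry of `first_vectors` holds one parameter vector per band for one frame; these are the inputs that the per-band converter systems would then transform.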

19 Citations

12 Claims

1. A method of automatic processing of noise-affected speech comprising at least the following steps:
- capture and digitising of the speech in the form of at least one digitised signal (1),
- extraction of several time-based sequences or frames (15), corresponding to said signal, by means of an extraction system (10),
- decomposition of each frame (15) by means of an analysis system (20, 40) into at least two different frequency bands so as to obtain at least two first vectors of representative parameters (45) for each frame (15), one for each frequency band, and
- conversion, by means of converter systems (50), of the first vectors of representative parameters (45) into second vectors of parameters relatively insensitive to noise (55), each converter system (50) being associated with one frequency band and converting the first vector of representative parameters (45) associated with said same frequency band, and
- the learning of said converter systems (50) being achieved on the basis of a learning corpus which corresponds to a corpus of speech contaminated by noise (102).
  - 2. The method according to claim 1, characterised in that it further comprises a step of concatenation of the second vectors of representative parameters which are relatively insensitive to noise (55), associated with the different frequency bands of the same frame (15) so as to have no more than one single third vector of concatenated parameters (56) for each frame (15) which is then used as input in an automatic speech-recognition system (60).
  - 3. The method according to claim 1 or 2, characterised in that the conversion, by means of converter systems (50), is achieved by linear transformation or by non-linear transformation.
  - 4. The method according to one of claims 1 to 3, characterised in that the converter systems (50) are artificial neuronal networks.
  - 5. The method according to claim 4, characterised in that the said artificial neuronal networks are of the multi-layer perceptron type and each comprises at least one hidden layer.
  - 6. The method according to claim 5, characterised in that the learning by the said artificial neuronal networks of the multi-layer perceptron type relies on targets corresponding to the basic lexical units for each frame of the learning corpus, the output vectors of the last hidden layer or layers of the said artificial neuronal networks being used as vectors of representative parameters which are relatively insensitive to the noise.
  - 10. Use of the method according to one of claims 1 to 6 and/or of the system according to one of claims 7 to 9 for speech recognition.
  - 11. Use of the method according to one of claims 1, 3 to 6 and/or of the system according to one of claims 7 or 8 for speech coding.
  - 12. Use of the method according to one of claims 1, 3 to 6 and/or of the system according to one of claims 7 or 8 for removing noise from speech.
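Claims 4 to 6 describe converter systems that are multi-layer perceptrons trained on lexical-unit targets, with the last hidden layer's outputs serving as the noise-insensitive parameters. The following is a minimal sketch of that idea for one frequency band, under stated assumptions: the class `BandMLP`, the tanh and softmax choices, the learning rate, and the toy noisy corpus are all hypothetical and are not taken from the patent.

```python
import math
import random


class BandMLP:
    """One-hidden-layer perceptron for a single frequency band (illustrative)."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def hidden(self, x):
        """Hidden-layer activations, used here as the second vector."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def posteriors(self, h):
        """Softmax over lexical-unit classes; only needed during training."""
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def train_step(self, x, target, lr=0.1):
        """One SGD step on cross-entropy against a lexical-unit label."""
        h = self.hidden(x)
        p = self.posteriors(h)
        # Gradient of the loss with respect to the output pre-activations.
        dz = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
        # Back-propagate to the hidden layer before w2 is updated.
        dh = [sum(dz[j] * self.w2[j][i] for j in range(len(dz))) * (1 - h[i] ** 2)
              for i in range(len(h))]
        for j, g in enumerate(dz):
            self.w2[j] = [w - lr * g * hi for w, hi in zip(self.w2[j], h)]
            self.b2[j] -= lr * g
        for i, g in enumerate(dh):
            self.w1[i] = [w - lr * g * xi for w, xi in zip(self.w1[i], x)]
            self.b1[i] -= lr * g
        return -math.log(p[target])


# Toy learning corpus: band parameters for two lexical units, contaminated by noise.
rng = random.Random(1)
prototypes = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}  # hypothetical values
corpus = [([v + rng.gauss(0, 0.3) for v in prototypes[c]], c)
          for c in (0, 1) for _ in range(50)]

net = BandMLP(n_in=3, n_hidden=4, n_classes=2)
first_loss = sum(net.train_step(x, c) for x, c in corpus) / len(corpus)
for _ in range(20):
    rng.shuffle(corpus)
    last_loss = sum(net.train_step(x, c) for x, c in corpus) / len(corpus)

robust = net.hidden(corpus[0][0])  # second vector for one noisy frame
```

After training, the output layer is discarded at run time in this sketch: only `hidden()` is called, so the classifier targets shape the hidden representation without appearing in the features themselves.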

7. An automatic speech-processing system comprising at least:
- an acquisition system for obtaining at least one digitised speech signal (1),
- an extraction system (10), for extracting several time-based sequences or frames (15) corresponding to said signal (1),
- means (20, 40) for decomposing each frame (15) into at least two different frequency bands so as to obtain at least two first vectors of representative parameters (45), one vector for each frequency band, and
- several converter systems (50), each converter system (50) being associated with one frequency band and making it possible to convert the first vector of representative parameters (45) associated with this same frequency band into a second vector of parameters which are relatively insensitive to noise (55), and the learning by the said converter systems (50) being achieved on the basis of a corpus of speech corrupted by noise (102).
  - 8. The automatic speech-processing system according to claim 7, characterised in that the converter systems (50) are artificial neuronal networks, preferably of the multi-layer perceptron type.
  - 9. The automatic speech-processing system according to claim 7 or claim 8, characterised in that it further comprises means allowing the concatenation of the second vectors of representative parameters which are relatively insensitive to noise (55), associated with different frequency bands of the same frame (15) so as to have no more than one single third vector of concatenated parameters (56) for each frame (15), said third vector then being used as input into an automatic speech-recognition system (60).
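The concatenation described in claims 2 and 9 can be sketched as follows: each band's first vector passes through the converter associated with that band, and the resulting second vectors are joined into a single third vector per frame. The function name and the two stand-in converters are hypothetical; in the claimed system the converters would be trained on noise-contaminated speech.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Converter = Callable[[Sequence[float]], Vector]


def concatenate_band_outputs(first_vectors: Sequence[Sequence[float]],
                             converters: Sequence[Converter]) -> Vector:
    """Apply one converter per band and join the results into one vector."""
    if len(first_vectors) != len(converters):
        raise ValueError("need exactly one converter per frequency band")
    third: Vector = []
    for vec, convert in zip(first_vectors, converters):
        third.extend(convert(vec))
    return third


# Hypothetical stand-in converters; real ones would be trained per band.
def low_band(v):
    return [sum(v) / len(v)]


def high_band(v):
    return [max(v), min(v)]


frame_bands = [[1.0, 2.0, 3.0], [0.5, -0.5]]  # first vectors for one frame
third_vector = concatenate_band_outputs(frame_bands, [low_band, high_band])
# third_vector == [2.0, 0.5, -0.5]
```

The single vector per frame is what would then be passed to the speech-recognition system (60).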

Current Assignee: Faculte Polytechnique De Mons
Original Assignee: Faculte Polytechnique De Mons
Inventors: Dupont, Stephane
Granted Patent: US 7,212,965 B2
US Class Current: 704/233
CPC Class Codes:
G10L 15/16   using artificial neural net...
G10L 19/0212   using orthogonal transforma...
G10L 21/0208   Noise filtering
