System and method for recognizing a user voice command in noisy environment
First Claim
1. An automatic speech recognition system for recognizing a user voice command in a noisy environment, comprising:
- matching means for matching elements retrieved from speech units forming the command with templates stored in a template library;
- processing means for determining a sequence of templates that minimizes a distance between the elements and the templates, wherein the templates are posterior templates, the elements retrieved from the speech units are posterior vectors, and the posterior templates and the posterior vectors are generated with a MultiLayer Perceptron;
- calculating means for automatically selecting a subset of the posterior templates, the selection of the subset of the posterior templates including:
  (i) determining Gabriel or relative neighbors of the selected subset of the posterior templates by calculating a matrix of distances between all of the posterior templates,
  (ii) visiting each template of the subset of posterior templates,
  (iii) marking a template of the subset of the posterior templates if all of its neighbours are of a same phone class as the template; and
  (iv) deleting all marked posterior templates, wherein the remaining posterior templates constitute the selected subset of the posterior templates; and
- a dynamic time warping (DTW) decoder for matching the posterior vectors with the selected subset of posterior templates, wherein
  the DTW decoder receives input, the input comprising a sequence of posterior vectors to be recognized, a posterior template library, a dictionary and optionally a grammar, and
  the DTW decoder outputs one or more sequences of recognized words, time information and confidence measures.
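Selection steps (i) to (iv) of the claim can be illustrated with a short sketch. It is not the patented implementation: it assumes each template is a single fixed-length posterior vector with one phone label, uses Euclidean distance for the distance matrix (the claim only requires "a matrix of distances"), and covers Gabriel neighbours only. The function names are hypothetical.

```python
from math import dist


def distance_matrix(templates):
    """Step (i), first half: pairwise distances between all templates (Euclidean, an assumption)."""
    n = len(templates)
    return [[dist(templates[i], templates[j]) for j in range(n)] for i in range(n)]


def gabriel_neighbours(d):
    """Step (i), second half: a and b are Gabriel neighbours if no third
    template c lies strictly inside the sphere whose diameter is the segment a-b."""
    n = len(d)
    neighbours = [set() for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            dab2 = d[a][b] ** 2
            if all(d[a][c] ** 2 + d[b][c] ** 2 >= dab2
                   for c in range(n) if c != a and c != b):
                neighbours[a].add(b)
                neighbours[b].add(a)
    return neighbours


def select_templates(templates, labels):
    """Steps (ii)-(iv): visit each template, mark it if all of its neighbours
    share its phone class, then delete the marked ones.
    Returns the indices of the templates that remain."""
    neighbours = gabriel_neighbours(distance_matrix(templates))
    kept = []
    for i in range(len(templates)):
        # A template with no neighbours at all is kept (a choice made for this sketch).
        marked = bool(neighbours[i]) and all(labels[j] == labels[i] for j in neighbours[i])
        if not marked:
            kept.append(i)
    return kept


# Toy data: two-class posterior vectors, three per phone class.
templates = [(0.9, 0.1), (0.8, 0.2), (0.6, 0.4), (0.4, 0.6), (0.2, 0.8), (0.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(select_templates(templates, labels))  # [2, 3]
```

In the toy data only the two templates facing the class boundary survive. Templates surrounded by others of the same phone class are deleted, which shrinks the library while keeping the templates that separate the classes.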
Abstract
An automatic speech recognition system for recognizing a user voice command in a noisy environment, including: matching means for matching elements retrieved from speech units forming said command with templates in a template library; characterized by processing means including a MultiLayer Perceptron for computing posterior templates (P(O_template(q))) stored as said templates in said template library; and means for retrieving posterior vectors (P(O_test(q))) from said speech units, said posterior vectors being used as said elements. The present invention also relates to a method for recognizing a user voice command in noisy environments.
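As a rough illustration of how a MultiLayer Perceptron can turn an acoustic feature vector into a posterior vector over phone classes, consider the sketch below. The abstract does not specify the network architecture, so the single tanh hidden layer, the softmax output, and all weights and names are assumptions made for this example.

```python
from math import exp, tanh


def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def mlp_posteriors(features, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer MLP (assumed architecture).
    Returns one posterior probability per phone class."""
    hidden = [tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


# Hypothetical weights: 3 input features, 2 hidden units, 3 phone classes.
w1 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.1]
w2 = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.3]]
b2 = [0.0, 0.0, 0.0]

posterior = mlp_posteriors([0.2, 0.7, -0.1], w1, b1, w2, b2)
print(posterior)
```

Applying such a network to each frame of a training utterance would give a sequence of posterior vectors that can be stored as a template. Applying it to each frame of the incoming command would give the posterior vectors to be matched.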
30 Citations
17 Claims
1. An automatic speech recognition system for recognizing a user voice command in a noisy environment, comprising:
- matching means for matching elements retrieved from speech units forming the command with templates stored in a template library;
- processing means for determining a sequence of templates that minimizes a distance between the elements and the templates, wherein the templates are posterior templates, the elements retrieved from the speech units are posterior vectors, and the posterior templates and the posterior vectors are generated with a MultiLayer Perceptron;
- calculating means for automatically selecting a subset of the posterior templates, the selection of the subset of the posterior templates including:
  (i) determining Gabriel or relative neighbors of the selected subset of the posterior templates by calculating a matrix of distances between all of the posterior templates,
  (ii) visiting each template of the subset of posterior templates,
  (iii) marking a template of the subset of the posterior templates if all of its neighbours are of a same phone class as the template; and
  (iv) deleting all marked posterior templates, wherein the remaining posterior templates constitute the selected subset of the posterior templates; and
- a dynamic time warping (DTW) decoder for matching the posterior vectors with the selected subset of posterior templates, wherein
  the DTW decoder receives input, the input comprising a sequence of posterior vectors to be recognized, a posterior template library, a dictionary and optionally a grammar, and
  the DTW decoder outputs one or more sequences of recognized words, time information and confidence measures.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. An automatic speech recognition method of recognizing a voice command spoken by a user in a noisy environment, the method comprising:
- matching elements, by matching means, retrieved from speech units forming the command with templates stored in a template library;
- determining, by processing means, a sequence of templates that minimizes a distance between the elements and the templates, wherein the templates are posterior templates and the elements retrieved from the speech units are posterior vectors, the posterior templates and the posterior vectors being generated with at least one MultiLayer Perceptron;
- selecting a subset of the posterior templates, the selection of the subset of the posterior templates including:
  (i) determining Gabriel or relative neighbours of the selected subset of posterior templates by calculating a matrix of distances between all of the posterior templates,
  (ii) visiting each template of the subset of posterior templates,
  (iii) marking a template of the subset of posterior templates if all of its neighbours are of a same phone class as the template, and
  (iv) deleting all marked posterior templates, wherein the remaining posterior templates constitute the selected subset of the posterior templates; and
- using a dynamic time warping (DTW) decoder configured to match the posterior vectors with the selected subset of posterior templates, wherein
  the DTW decoder receives input, the input comprising a sequence of posterior vectors to be recognized, a posterior template library, a dictionary and optionally a grammar, and
  the DTW decoder outputs one or more sequences of recognized words, time information and confidence measures.
Dependent claims: 16, 17
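The DTW matching step can be sketched in a few lines. This is a simplified illustration, not the claimed decoder: it scores whole isolated words only, omits the dictionary, grammar, time information and confidence measures named in the claim, and assumes a symmetric Kullback-Leibler divergence as the local distance between posterior vectors (the claim does not name a distance). All function names are hypothetical.

```python
from math import log


def sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two posterior vectors (assumed local distance)."""
    return sum((a - b) * (log(a + eps) - log(b + eps)) for a, b in zip(p, q))


def dtw_distance(test, template, local=sym_kl):
    """Accumulated cost of the best monotonic alignment between two sequences."""
    n, m = len(test), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = local(test[i - 1], template[j - 1])
            # Allowed moves: repeat a template frame, skip ahead in the template, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(test, library):
    """library maps each word to one posterior template; returns (best word, distance)."""
    scores = {word: dtw_distance(test, template) for word, template in library.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy library with two-class posterior vectors.
library = {
    "yes": [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)],
    "no": [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1)],
}
test = [(0.85, 0.15), (0.85, 0.15), (0.8, 0.2), (0.15, 0.85)]
word, score = recognize(test, library)
print(word)  # yes
```

The test sequence has four frames and the templates have three, so the alignment has to stretch one side. Dynamic time warping absorbs this kind of difference in speaking rate.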
Specification