System and method for recognizing a user voice command in noisy environment

US 9,318,103 B2
Filed: 02/21/2013
Issued: 04/19/2016
Est. Priority Date: 08/24/2010
Status: Active Grant

Abstract

An automatic speech recognition system for recognizing a user voice command in noisy environment, including: matching means for matching elements retrieved from speech units forming said command with templates in a template library; characterized by processing means including a MultiLayer Perceptron for computing posterior templates (P(O^template(q))) stored as said templates in said template library; means for retrieving posterior vectors (P(O^test(q))) from said speech units, said posterior vectors being used as said elements. The present invention relates also to a method for recognizing a user voice command in noisy environments.
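
To make the term "posterior vector" concrete, here is a minimal sketch of how a MultiLayer Perceptron can turn one frame of acoustic features into a vector of phone-class posteriors. The single tanh hidden layer, the softmax output, and all names (`softmax`, `mlp_posteriors`, the weight arguments) are illustrative assumptions; the record above does not specify the network architecture.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_posteriors(
    features: List[float],
    w_hidden: List[List[float]],
    b_hidden: List[float],
    w_out: List[List[float]],
    b_out: List[float],
) -> List[float]:
    """Forward pass of a one-hidden-layer MLP (assumed architecture).

    Returns one posterior probability per phone class for a single frame.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return softmax(logits)


# Hypothetical toy weights: 2 input features, 2 hidden units, 3 phone classes.
posterior_vector = mlp_posteriors(
    features=[0.5, -1.0],
    w_hidden=[[1.0, 0.0], [0.0, 1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    b_out=[0.0, 0.0, 0.0],
)
```

Under this reading, a posterior template is simply the sequence of such vectors computed over the frames of a training utterance.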

17 Claims

1. An automatic speech recognition system for recognizing a user voice command in a noisy environment, comprising:
- matching means for matching elements retrieved from speech units forming the command with templates stored in a template library;
  
  processing means for determining a sequence of templates that minimizes a distance between the elements and the templates, wherein the templates are posterior templates, the elements retrieved from the speech units are posterior vectors, and the posterior templates and the posterior vectors are generated with a MultiLayer Perceptron;
  
  calculating means for automatically selecting a subset of the posterior templates, the selection of the subset of the posterior templates including:
  
  (i) determining Gabriel or relative neighbors of the selected subset of the posterior templates by calculating a matrix of distances between all of the posterior templates,
  
  (ii) visiting each template of the subset of posterior templates,
  
  (iii) marking a template of the subset of the posterior templates if all of its neighbours are of a same phone class as the template; and
  
  (iv) deleting all marked posterior templates, wherein the remaining posterior templates constitute the selected subset of the posterior templates; and
  
  a dynamic time warping (DTW) decoder for matching the posterior vectors with the selected subset of posterior templates, wherein
  
  the DTW decoder receives input, the input comprising a sequence of posterior vectors to be recognized, a posterior template library, a dictionary and optionally a grammar, and
  
  the DTW decoder outputs one or more sequences of recognized words, time information and confidence measures.
  - 2. The system of claim 1, further comprising a voice activity detector.
  - 3. The system of claim 1, wherein the MultiLayer Perceptron is multilingual.
  - 4. The system of claim 1, further comprising at least two MultiLayer Perceptrons, wherein each of the MultiLayer Perceptrons is configured for different specific languages.
  - 5. The system of claim 1, wherein the template library is a pre-existing template library generated from training templates spoken by another user.
  - 6. The system of claim 1, further comprising means for creating the template library from a pronunciation dictionary.
  - 7. The system of claim 6, wherein the matching of the DTW decoder is based on a Kullback-Leibler (KL)-divergence metric.
  - 8. The system of claim 1, further comprising means for automatically adapting the template library by activating, deactivating, adding, deleting or substituting the posterior templates.
  - 9. The system of claim 8, wherein the means for automatically adapting uses a feedback of user input on a user device.
  - 10. The system of claim 1, wherein the input of the DTW decoder comprises the grammar.
  - 11. The system of claim 1, further comprising voice activity detector means that can be selected and de-selected by the user.
  - 12. The system of claim 11, wherein the grammar is selected by the voice activity detector means.
  - 13. The system of claim 1, wherein the DTW decoder incorporates an insertion penalty, a scale factor and a filter silence.
  - 14. The system of claim 9, wherein the user device is configured to allow a user to enter the voice commands, and the system further comprises:
    - pre-processing means in the user device, adapted for pre-processing the entered voice commands;
      
      connection means for transmitting pre-processed signals, based on the pre-processed voice commands, to a central server in a bar, restaurant or hotel; and
      
      restaurant, bar or hotel management software for managing bar, restaurant or hotel orders entered by the user through the voice commands.
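
The template selection in steps (i) to (iv) of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the distance matrix is taken as already computed, the Gabriel test uses the squared-distance form that is exact for Euclidean distances, and the function names `gabriel_neighbors` and `select_templates` are hypothetical rather than taken from the patent.

```python
from typing import List, Sequence


def gabriel_neighbors(dist: Sequence[Sequence[float]]) -> List[List[int]]:
    """Step (i): derive Gabriel neighbors from a full distance matrix.

    Templates i and j are Gabriel neighbors when no third template k lies
    strictly inside the ball whose diameter is the segment from i to j,
    i.e. d(i,k)^2 + d(j,k)^2 >= d(i,j)^2 for every other k.
    """
    n = len(dist)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_ij_sq = dist[i][j] ** 2
            if all(
                dist[i][k] ** 2 + dist[j][k] ** 2 >= d_ij_sq
                for k in range(n)
                if k != i and k != j
            ):
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def select_templates(
    dist: Sequence[Sequence[float]], phone_class: Sequence[str]
) -> List[int]:
    """Steps (ii)-(iv): return the indices of the templates that are kept."""
    neighbors = gabriel_neighbors(dist)
    kept = []
    for i, nbrs in enumerate(neighbors):  # (ii) visit each template
        # (iii) mark the template if all of its neighbors share its class.
        # Assumption: a template with no neighbors at all is never marked.
        marked = bool(nbrs) and all(phone_class[j] == phone_class[i] for j in nbrs)
        if not marked:  # (iv) marked templates are deleted
            kept.append(i)
    return kept


# Toy example: six templates on a line, three per phone class.
positions = [0, 1, 2, 3, 4, 5]
labels = ["a", "a", "a", "b", "b", "b"]
dist = [[abs(p - q) for q in positions] for p in positions]
kept = select_templates(dist, labels)  # only templates 2 and 3 remain
```

In the toy example only the two templates on the class boundary survive, which is the point of the procedure: templates surrounded entirely by their own phone class are dropped from the library.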

15. An automatic speech recognition method of recognizing a voice command spoken by a user in a noisy environment, the method comprising:
- matching elements, by matching means, retrieved from speech units forming the command with templates stored in a template library;
  
  determining, by processing means, a sequence of templates that minimizes a distance between the elements and the templates, wherein the templates are posterior templates and the elements retrieved from the speech units are posterior vectors, the posterior templates and the posterior vectors being generated with at least one MultiLayer Perceptron;
  
  selecting a subset of the posterior templates, the selection of the subset of the posterior templates including:
  
  (i) determining Gabriel or relative neighbours of the selected subset of posterior templates by calculating a matrix of distances between all of the posterior templates,
  
  (ii) visiting each template of the subset of posterior templates,
  
  (iii) marking a template of the subset of posterior templates if all of its neighbours are of a same phone class as the template, and
  
  (iv) deleting all marked posterior templates, wherein the remaining posterior templates constitute the selected subset of the posterior templates; and
  
  using a dynamic time warping (DTW) decoder configured to match the posterior vectors with the selected subset of posterior templates, wherein
  
  the DTW decoder receives input, the input comprising a sequence of posterior vectors to be recognized, a posterior template library, a dictionary and optionally a grammar, and
  
  the DTW decoder outputs one or more sequences of recognized words, time information and confidence measures.
  - 16. The method of claim 15, further comprising:
    - entering the voice commands corresponding to bar, restaurant or hotel orders in a user device;
      
      pre-processing the voice commands in a user device;
      
      transmitting pre-processed signals, based on the pre-processed voice commands, to a server;
      
      converting the pre-processed signals into text orders in the server;
      
      displaying the text orders; and
      
      communicating the orders to software and/or systems used by a bar, restaurant or hotel.
  - 17. The method of claim 15, further comprising:
    - recording continuously the voice command by means of an audio acquisitioning system,
      
      selecting a voice activity detector means,
      
      de-selecting the voice activity detector means, and
      
      processing the voice command at a time before selecting the voice activity detector means and at a time after de-selecting the voice activity detector means.
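
The DTW matching of posterior vectors against posterior templates, with the Kullback-Leibler divergence of claim 7 as the local distance, might look like the sketch below. It is deliberately simplified and rests on assumptions not found in the claims: it handles isolated words only, uses the three basic step directions, takes the template frame as the reference distribution in the KL term, and omits the dictionary, grammar, insertion penalty, time information and confidence measures. All names (`kl_divergence`, `dtw_distance`, `recognize`, the toy library) are hypothetical.

```python
import math
from typing import Dict, List

Frame = List[float]  # one posterior vector (probabilities over phone classes)


def kl_divergence(p: Frame, q: Frame, eps: float = 1e-10) -> float:
    """KL(p || q) with a small eps to avoid division by zero or log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def dtw_distance(test: List[Frame], template: List[Frame]) -> float:
    """Accumulated cost of the best monotonic alignment of test to template."""
    n, m = len(test), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed direction: the template frame is the reference distribution.
            local = kl_divergence(template[j - 1], test[i - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # test frame repeats the same template frame
                cost[i][j - 1],      # template frame is skipped over
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(test: List[Frame], library: Dict[str, List[List[Frame]]]) -> str:
    """Return the word whose closest template has the lowest DTW cost."""
    return min(
        library,
        key=lambda word: min(dtw_distance(test, t) for t in library[word]),
    )


# Toy library with two phone classes and one template per word.
library = {
    "yes": [[[0.9, 0.1], [0.1, 0.9]]],
    "no": [[[0.1, 0.9], [0.9, 0.1]]],
}
utterance = [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]]
best_word = recognize(utterance, library)
```

Because DTW lets one template frame absorb several test frames, the three-frame utterance still aligns cleanly with the two-frame template for "yes".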

Current Assignee: Veovox SA
Original Assignee: Veovox SA
Inventors: Dines, John; Carmona, Jorge; Masson, Olivier; Aradilla, Guillermo
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sharma, Neeraj
Application Number: US13/773,190
Publication Number: US 20130166279A1
Time in Patent Office: 1,153 Days
Field of Search: 704/8, 704/245, 704/275, 704/253, 704/270, 704/231, 704/207, 704/239, 704/251, 704/241, 379/88.01, 379/88.02
US Class Current: 1/1
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/063   Training
G10L 15/12   using dynamic programming t...
G10L 15/16   using artificial neural net...
G10L 15/20   Speech recognition techniqu...
G10L 17/02   Preprocessing operations, e...
