Method and system for processing parallel context dependent speech recognition results from a single utterance utilizing a context database

US 9,117,453 B2
Filed: 12/30/2010
Issued: 08/25/2015
Est. Priority Date: 12/31/2009
Status: Active Grant

Abstract

A method of and system for accurately determining a caller response by processing speech-recognition results and returning that result to a directed-dialog application for further interaction with the caller. Multiple speech-recognition engines are provided that process the caller response in parallel. Returned speech-recognition results comprising confidence-score values and word-score values from each of the speech-recognition engines may be modified based on context information provided by the directed-dialog application and grammars associated with each speech-recognition engine. A context database is used to further reduce or add weight to confidence-score values and word-score values, remove phrases and/or words, and add phrases and/or words to the speech-recognition engine results. In situations where a predefined threshold-confidence-score value is not exceeded, a new dynamic grammar may be created. A set of n-best hypotheses of what the caller uttered is returned to the directed-dialog application.
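The score handling described above can be sketched as follows. This is only an illustration under stated assumptions: the `Hypothesis` record, the `engine` field, and the specific multiplier values are hypothetical and are not taken from the patent, which does not specify a data layout or how weight multipliers are derived from the context information.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hypothesis:
    text: str            # recognized phrase
    confidence: float    # phrase-level confidence-score value
    word_scores: tuple   # one word-score value per word
    engine: str          # producing engine (hypothetical field)


def apply_weight(nbest, multiplier):
    """Scale confidence and word scores by a context-derived multiplier."""
    return [
        replace(
            h,
            confidence=h.confidence * multiplier,
            word_scores=tuple(w * multiplier for w in h.word_scores),
        )
        for h in nbest
    ]


def combine(nbest_lists):
    """Merge the modified n-best lists into one list, best confidence first."""
    merged = [h for nbest in nbest_lists for h in nbest]
    return sorted(merged, key=lambda h: h.confidence, reverse=True)


# Two engines return results for the same utterance (made-up values).
digits = [Hypothesis("four one five", 0.80, (0.9, 0.8, 0.7), "digits")]
names = [Hypothesis("for one five", 0.85, (0.6, 0.9, 0.8), "names")]

# Assumed multipliers: the dialog context expects digits, so favor that engine.
weights = {"digits": 1.2, "names": 0.5}

combined = combine([
    apply_weight(digits, weights["digits"]),
    apply_weight(names, weights["names"]),
])
```

In this example the digits hypothesis ranks first after weighting even though the names engine reported the higher raw confidence, which is the effect the context-based weighting is meant to have.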

18 Claims

1. A system comprising:
- a directed-dialog-processor server having a directed-dialog-processor application executing thereon;
  
  a speech-recognition-engine server having a plurality of parallel-operable speech-recognition-engine applications executing thereon;
  
  wherein the plurality of parallel-operable speech-recognition-engine applications each provide a different speech-recognition capability;
  
  a context database;
  
  a multiple-recognition-processor server in data communication with the directed-dialog-processor server, the speech-recognition-engine server, and the context database and having a multiple-recognition-processor application executing thereon; and
  
  wherein the multiple-recognition-processor server is operable, via the multiple-recognition-processor application, to:
  
  receive context information and a forwarded caller response from the directed-dialog-processor application;
  
  select, using the context information, a set of parallel-operable speech-recognition-engine applications from the plurality of parallel-operable speech-recognition-engine applications;
  
  combine the context information with additional context information from the context database to form modified context information;
  
  forward to each speech-recognition-engine application in the selected set the modified context information, the forwarded caller response, and a request to perform speech recognition of the forwarded caller response;
  
  receive from each speech-recognition-engine application in the selected set an n-best list comprising at least one confidence-score value and at least one word-score value;
  
  wherein the at least one confidence-score value and the at least one word-score value in each n-best list are modified by a weight-multiplier value based on the context information provided by the directed-dialog-processor application, thereby creating a modified n-best list;
  
  wherein each modified n-best list is combined into a single, sorted combined n-best list; and
  
  wherein the at least one confidence-score value and the at least one word-score value of the sorted combined n-best list are modified by determining presence of phrases and words of the sorted combined n-best list in the context database.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The system of claim 1, wherein the sorted combined n-best list is re-sorted following modification of the at least one confidence-score value and the at least one word-score value.
  - 3. The system of claim 2, wherein, responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list exceeds a predefined threshold confidence-score value, the multiple-recognition-processor server sets an acceptable status indicator to a value instructing the directed-dialog-processor server to accept the entry in the n-best list with the highest confidence-score value and forwards to the directed-dialog-processor the n-best list and the acceptance status indicator.
  - 4. The system of claim 1, wherein, responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list does not exceed a predefined threshold confidence-score value, the multiple-recognition-processor server generates a dynamic grammar.
  - 5. The system of claim 4, wherein:
    - the multiple-recognition-processor server forwards, to each selected speech-recognition-engine application, the modified context information, the forwarded caller response, and a request to perform a speech-recognition of the forwarded caller response; and
      
      the modified context information comprises the dynamic grammar.
  - 6. The system of claim 1, wherein the selection of the set of parallel-operable speech-recognition-engine applications from the plurality of speech-recognition-engine applications comprises analyzing configuration files in data communication with the multiple-recognition-processor server.
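The final elements of claim 1 together with claims 2 to 4 (context-database rescoring, re-sorting, and the threshold decision) might look roughly like the sketch below. The dictionary layout, the `boost` and `penalty` factors, and the status strings are assumptions made for illustration; the claims state only that scores are modified based on presence in the context database and that a status indicator or a dynamic grammar follows from the threshold comparison.

```python
def rescore_with_context(combined, context_phrases, boost=1.1, penalty=0.9):
    """Adjust scores by presence in the context database, then re-sort."""
    context_words = {w for phrase in context_phrases for w in phrase.split()}
    rescored = []
    for entry in combined:
        # Phrase-level presence adjusts the confidence-score value.
        phrase_factor = boost if entry["text"] in context_phrases else penalty
        words = entry["text"].split()
        rescored.append({
            "text": entry["text"],
            "confidence": entry["confidence"] * phrase_factor,
            # Word-level presence adjusts each word-score value.
            "word_scores": [
                score * (boost if word in context_words else penalty)
                for word, score in zip(words, entry["word_scores"])
            ],
        })
    rescored.sort(key=lambda e: e["confidence"], reverse=True)
    return rescored


def decide(nbest, threshold):
    """Return a status for the dialog application based on the top score."""
    top = nbest[0]["confidence"] if nbest else 0.0
    if top > threshold:
        return {"status": "accept", "nbest": nbest}
    # Below threshold: the caller would build a dynamic grammar and retry.
    return {"status": "retry_with_dynamic_grammar", "nbest": nbest}


combined = [
    {"text": "boston", "confidence": 0.70, "word_scores": [0.70]},
    {"text": "austin", "confidence": 0.68, "word_scores": [0.68]},
]
context_phrases = {"austin"}  # stand-in for a context database lookup

rescored = rescore_with_context(combined, context_phrases)
result = decide(rescored, threshold=0.7)
```

Here "austin" overtakes "boston" after rescoring and clears the assumed 0.7 threshold, so the sketch reports acceptance; with a stricter threshold the same list would fall through to the dynamic-grammar path.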

7. A method comprising:
- (a) providing a processor;
  
  (b) providing a memory interoperably coupled to the processor and having computer-readable processor instructions stored thereon;
  
  (c) using the processor and the memory in combination with the computer-readable processor instructions to perform at least one of steps (d)-(i);
  
  (d) receiving context information and a forwarded caller response from a directed-dialog-processor application executing on a directed-dialog-processor server;
  
  (e) selecting, using the context information, a set of parallel-operable speech-recognition-engine applications from a plurality of parallel-operable speech-recognition-engine applications executing on a speech-recognition-engine server;
  
  wherein the plurality of parallel-operable speech-recognition-engine applications each provide a different speech-recognition capability;
  
  (f) combining the context information received in step (d) and additional context information present in a context database, thereby forming modified context information;
  
  (g) forwarding modified context information, the forwarded caller response, and a request to perform speech recognition of the forwarded caller response to each speech-recognition-engine application selected in step (e);
  
  (h) receiving from each speech-recognition-engine application of the set of parallel-operable speech-recognition-engine applications an n-best list comprising at least one confidence-score value and at least one word-score value;
  
  (i) responsive to step (h), modifying the at least one confidence-score value and the at least one word-score value in each n-best list by a weight-multiplier value based on the context information provided by the directed-dialog-processor application, thereby creating a modified n-best list;
  
  wherein each modified n-best list is combined into a single sorted combined n-best list; and
  
  wherein the at least one confidence-score value and the at least one word-score value of the sorted combined n-best list are modified by determining presence of phrases and words of the sorted combined n-best list in the context database.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The method of claim 7, comprising:
    - responsive to the modification of the at least one confidence-score value and the at least one word-score value, re-sorting the sorted combined n-best list.
  - 9. The method of claim 8, comprising:
    - responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list exceeds a predefined threshold-confidence-score value, setting the value of an acceptable status indicator to a value instructing the directed-dialog-processor server to accept the entry in the n-best list with the highest confidence-score value and forwarding to the directed-dialog-processor the n-best list and the acceptance status indicator.
  - 10. The method of claim 7, comprising:
    - responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list does not exceed a predefined threshold-confidence-score value, generating a dynamic grammar.
  - 11. The method of claim 10, comprising:
    - responsive to the generation of a dynamic grammar, forwarding, to each selected speech-recognition-engine application, the modified context information, the forwarded caller response, and a request to perform a speech-recognition of the forwarded caller response; and
      
      wherein the modified context information comprises the dynamic grammar.
  - 12. The method of claim 7, wherein step (e) is performed after analyzing configuration files in data communication with the speech-recognition-engine server.
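Steps (g) and (h) of claim 7, forwarding one caller response to every selected engine and collecting an n-best list from each, could be sketched with a thread pool as below. The engine callables and their `(audio, context)` signature are hypothetical stand-ins; the patent describes separate engine applications on a server, not in-process functions.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_in_parallel(engines, audio, context):
    """Send the same utterance and context to every selected engine at once."""
    if not engines:
        return {}
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {
            name: pool.submit(recognize, audio, context)
            for name, recognize in engines.items()
        }
        # Block until every engine has returned its n-best list.
        return {name: future.result() for name, future in futures.items()}


# Stub engines returning (text, confidence) pairs; real engines would be
# remote speech-recognition applications.
def digit_engine(audio, context):
    return [("four one five", 0.80)]


def name_engine(audio, context):
    return [("for one five", 0.60)]


results = recognize_in_parallel(
    {"digits": digit_engine, "names": name_engine},
    audio=b"\x00\x01",                      # placeholder audio bytes
    context={"expected_response": "zip_code"},
)
```

Keying the results by engine name keeps track of which engine produced which list, which is useful if a per-engine weight multiplier is applied afterwards.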

13. A computer-program product comprising a non-transitory computer-usable medium having computer-readable processor instructions embodied therein, the computer-readable processor instructions adapted to be executed to implement a method comprising:
- (a) providing a processor;
  
  (b) providing a memory interoperably coupled to the processor and having computer-readable processor instructions stored thereon;
  
  (c) using the processor and the memory in combination to perform at least one of steps (d)-(i);
  
  (d) receiving context information and a forwarded caller response from a directed-dialog-processor application executing on a directed-dialog-processor server;
  
  (e) selecting, using the context information, a set of parallel-operable speech-recognition-engine applications from a plurality of parallel-operable speech-recognition-engine applications executing on a speech-recognition-engine server;
  
  wherein the plurality of parallel-operable speech-recognition-engine applications each provide a different speech-recognition capability;
  
  (f) combining the context information received in step (d) and additional context information present in a context database, thereby forming modified context information;
  
  (g) forwarding modified context information, the forwarded caller response, and a request to perform speech recognition of the forwarded caller response to each speech-recognition-engine application selected in step (e);
  
  (h) receiving from each speech-recognition-engine application of the set of parallel-operable speech-recognition-engine applications an n-best list comprising at least one confidence-score value and at least one word-score value;
  
  (i) responsive to step (h), modifying the at least one confidence-score value and the at least one word-score value in each n-best list by a weight-multiplier value based on the context information provided by the directed-dialog-processor application, thereby creating a modified n-best list;
  
  wherein each modified n-best list is combined into a single sorted combined n-best list; and
  
  wherein the at least one confidence-score value and the at least one word-score value of the sorted combined n-best list are modified by determining presence of phrases and words of the sorted combined n-best list in the context database.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The computer-program product of claim 13, the method comprising:
    - responsive to the modification of the at least one confidence-score value and the at least one word-score value, re-sorting the sorted combined n-best list.
  - 15. The computer-program product of claim 14, the method comprising:
    - responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list exceeds a predefined threshold-confidence-score value, setting the value of an acceptable status indicator to a value instructing the directed-dialog-processor server to accept the entry in the n-best list with the highest confidence-score value and forwarding to the directed-dialog-processor the n-best list and the acceptance status indicator.
  - 16. The computer-program product of claim 13, the method comprising:
    - responsive to a determination that a confidence-score value of the at least one confidence-score value in the sorted combined n-best list does not exceed a predefined threshold-confidence-score value, generating a dynamic grammar.
  - 17. The computer-program product of claim 16, the method comprising:
    - responsive to the generation of a dynamic grammar, forwarding, to each selected speech-recognition-engine application, the modified context information, the forwarded caller response, and a request to perform a speech-recognition of the forwarded caller response; and
      
      wherein the modified context information comprises the dynamic grammar.
  - 18. The computer-program product of claim 13, wherein step (e) is performed after analyzing configuration files in data communication with the speech-recognition-engine server.
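Steps (e) and (f), selecting engines from the context information and enriching that context from the context database, might be sketched as follows. The configuration schema (`handles`), the `expected_response` key, and the database layout are all assumptions for illustration; the claims mention configuration files and a context database without defining their contents.

```python
# Hypothetical engine configuration, e.g. as loaded from configuration files.
ENGINE_CONFIG = {
    "digits": {"handles": {"zip_code", "phone_number"}},
    "names": {"handles": {"city", "person_name"}},
    "freeform": {"handles": {"city", "zip_code", "address"}},
}


def select_engines(context, config=ENGINE_CONFIG):
    """Pick every engine configured for the expected response type."""
    expected = context["expected_response"]
    return sorted(
        name for name, spec in config.items() if expected in spec["handles"]
    )


def merge_context(context, context_db):
    """Add context-database entries for this response type to the context."""
    extra = context_db.get(context["expected_response"], {})
    return {**context, **extra}


context = {"expected_response": "city", "prompt": "Which city?"}
context_db = {"city": {"recent_phrases": ["austin", "boston"]}}

selected = select_engines(context)
modified_context = merge_context(context, context_db)
```

The original context is left unchanged and a new dictionary is returned, so the same dialog context can be reused if recognition is retried with a dynamic grammar.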

Current Assignee: Volt Delta Resources LLC (NewNet Communication Technologies LLC)
Original Assignee: Volt Delta Resources LLC (NewNet Communication Technologies LLC)
Inventors: Bielby, Gregory J.
Primary Examiner(s): Kazeminezhad, Farzad
Application Number: US12/982,146
Publication Number: US 20110161077A1
Time in Patent Office: 1,699 Days
Field of Search: 704/277, 704/235
US Class Current: 1/1
CPC Class Codes:
- G06F 40/58   Use of machine translation,...
- G10L 15/22   Procedures used during a sp...
- G10L 15/26   Speech to text systems G10L...
- G10L 15/32   Multiple recognisers used i...
- G10L 17/02   Preprocessing operations, e...
- G10L 2015/228   of application context
