System and method of performing speech recognition based on a user identifier

US 7,451,081 B1
Filed: 03/13/2007
Issued: 11/11/2008
Est. Priority Date: 03/20/2001
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

Speech recognition models are dynamically re-configurable based on user information, application information, background information (such as background noise), and transducer information (such as transducer response characteristics), providing users with input modes that are alternatives to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamically re-configurable speech recognition allow speech recognition to be deployed on small devices such as mobile phones and personal digital assistants, as well as in environments such as an office, home, or vehicle, while maintaining recognition accuracy.
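
The lattice concatenation and re-scoring described in the abstract can be illustrated with a deliberately simplified sketch. It is not the patented implementation: each field's "lattice" is reduced to a short list of scored hypotheses, concatenation is modelled as enumerating every combination across fields, and the repeat-until-acceptable loop is omitted. The names `rescore_fields` and `joint_lm`, the example fields, and all scores are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, per-field log-score); illustrative only


def rescore_fields(
    field_hyps: Dict[str, List[Hypothesis]],
    joint_lm: Callable[[Dict[str, str]], float],
) -> Tuple[Optional[Dict[str, str]], float]:
    """Pick the best joint assignment of hypotheses across all data fields."""
    fields = list(field_hyps)
    best: Optional[Dict[str, str]] = None
    best_score = float("-inf")
    # Each combination of per-field hypotheses stands in for one path
    # through the single concatenated lattice.
    for combo in product(*(field_hyps[f] for f in fields)):
        assignment = {f: text for f, (text, _) in zip(fields, combo)}
        # Per-field scores plus a language-model score over the whole path.
        score = sum(s for _, s in combo) + joint_lm(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Hypothetical joint model: penalize city/state pairs that do not belong together.
KNOWN_PAIRS = {("Austin", "Texas"), ("Boston", "Massachusetts")}


def joint_lm(assignment: Dict[str, str]) -> float:
    pair = (assignment["city"], assignment["state"])
    return 0.0 if pair in KNOWN_PAIRS else -5.0


best, score = rescore_fields(
    {
        "city": [("Austin", -1.0), ("Boston", -1.2)],
        "state": [("Massachusetts", -0.5), ("Texas", -0.9)],
    },
    joint_lm,
)
```

In this toy input the top-scoring city and the top-scoring state are inconsistent with each other, so the joint model steers the result toward a consistent pair. That is the kind of inter-field relationship the concatenated lattice is meant to capture.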

20 Claims

1. A method of performing speech recognition comprising:
- receiving a voice request from a speaker;
  
  receiving a data field selection by the speaker;
  
  applying one of a plurality of language models to the received voice request for speech recognition based on the selected data field, wherein the speech recognition uses a different language model for each data field selected by the speaker;
  
  determining an identity of the speaker based, at least in part, on a user identifier;
  
  repeatedly determining parameters of a background model based on sampled information collected at a periodic time interval during the received voice request;
  
  determining parameters of a transducer model; and
  
  adapting a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. The method of claim 1, further comprising:
    - translating the voice request into an HTTP protocol request.
  - 3. The method of claim 2, further comprising:
    - forwarding information from a database based on the HTTP protocol request to a dialog server; and
      
      generating from a dialog server a response to the voice request.
  - 4. The method of claim 3, wherein the response is one of a voice response, a text response, a tactile response and/or a Braille response.
  - 5. The method of claim 1, further comprising:
    - re-scoring automatic speech recognition using the speech recognition model by:
      
      generating word lattices representative of speech utterances in the received voice request;
      
      concatenating the word lattices into a single concatenated lattice; and
      
      applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
      
      determining information in the received voice request based on the re-scored results of the speech recognition model; and
      
      adjusting the periodic time interval based, at least in part, on determined changes in the sampled information.
  - 6. The method of claim 5, further comprising:
    - generating a confidence score to determine whether the generated word lattices are acceptable.
  - 7. The method of claim 6, wherein:
    - the parameters of the background model are determined based on a first sample period;
      
      the parameters of the transducer model are determined based on a second sample period; and
      
      the confidence score is compared to a predetermined value in order to determine whether to perform the automatic speech recognition process again.
  - 8. The method of claim 6, further comprising:
    - saving at least one of the parameters of the background model and the parameters of the transducer model.
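
Claims 1 and 5 together describe sampling background information at a periodic interval and adjusting that interval when the sampled information changes. One way this could work is sketched below. The function name `adjust_interval`, the use of a noise level in decibels, the halving and doubling policy, and every threshold value are assumptions for illustration; the claims do not specify any of them.

```python
def adjust_interval(
    interval_s: float,
    prev_level_db: float,
    new_level_db: float,
    change_threshold_db: float = 3.0,
    min_s: float = 0.5,
    max_s: float = 30.0,
) -> float:
    """Return the next background-sampling interval in seconds."""
    if abs(new_level_db - prev_level_db) > change_threshold_db:
        # Background is changing: sample more often, but not below min_s.
        return max(min_s, interval_s / 2)
    # Background is stable: sample less often, but not above max_s.
    return min(max_s, interval_s * 2)
```

Under these assumptions, a caller re-estimates the background model after each sample and passes the previous and current noise levels to `adjust_interval` to schedule the next sample.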

9. A system for performing speech recognition, the system comprising:
- a module configured to receive a voice request from a speaker;
  
  a module configured to receive a data field selection by the speaker;
  
  a module configured to apply one of a plurality of language models to the received voice request for speech recognition based on the selected data field, wherein the speech recognition uses a different language model for each data field selected by the speaker;
  
  a module configured to determine an identity of the speaker based, at least in part, on a user identifier;
  
  a module configured to determine parameters of a background model repeatedly based on sampled information collected at a periodic time interval during the received voice request;
  
  a module configured to determine parameters of a transducer model; and
  
  a module configured to adapt a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model.
- View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
- - 10. The system of claim 9, further comprising:
    - a module configured to translate the voice request into an HTTP protocol request.
  - 11. The system of claim 9, further comprising:
    - a module configured to forward information from a database based on the HTTP protocol request to a dialog server; and
      
      a module configured to generate from a dialog server a response to the voice request.
  - 12. The system of claim 11, wherein the response is one of a voice response, a text response, a tactile response and/or a Braille response.
  - 13. The system of claim 9, further comprising:
    - a module configured to re-score automatic speech recognition using the speech recognition model, the module configured to re-score automatic speech recognition, further comprising:
      
      a module configured to generate word lattices representative of speech utterances in the received voice request,
      
      a module configured to concatenate the word lattices into a single concatenated lattice, and
      
      a module configured to apply at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
      
      a module configured to determine information in the received voice request based on the re-scored results of the speech recognition model; and
      
      a module configured to adjust the periodic time interval based, at least in part, on determined changes in the sampled information.
  - 14. The system of claim 13, further comprising:
    - a module configured to generate a confidence score to determine whether the generated word lattices are acceptable.
  - 15. The system of claim 14, wherein:
    - the parameters of the background model are determined based on a first sample period;
      
      the parameters of the transducer model are determined based on a second sample period; and
      
      the confidence score is compared to a predetermined value in order to determine whether to perform the automatic speech recognition process again.
  - 16. The system of claim 14, further comprising:
    - a module configured to save at least one of the parameters of the background model and the parameters of the transducer model.

17. A tangible computer readable medium storing a computer program having instructions for controlling a processor of a computer device to perform speech recognition, the steps comprising:
- receiving a voice request from a speaker;
  
  receiving a data field selection by the speaker;
  
  applying one of a plurality of language models to the received voice request for speech recognition based on the selected data field, wherein the speech recognition uses a different language model for each data field selected by the speaker;
  
  determining an identity of the speaker based, at least in part, on a user identifier;
  
  determining parameters of a background model based on sampled information collected at a periodic time interval during the received voice request;
  
  instructions for controlling the processor to determine parameters of a transducer model; and
  
  instructions for controlling the processor to adapt a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model.
- View Dependent Claims (18, 19, 20)
- - 18. The tangible computer readable medium of claim 17, the steps further comprising:
    - translating the voice request into an HTTP protocol request.
  - 19. The tangible computer readable medium of claim 17, wherein the steps further comprise:
    - forwarding information from a database based on the HTTP protocol request to a dialog server; and
      
      generating a response to the voice request.
  - 20. The tangible computer readable medium of claim 19, wherein the response is one of a voice response, a text response, a tactile response and/or a Braille response.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Gajic, Bojana; Narayanan, Shrikanth Sambasivan; Parthasarathy, Sarangarajan; Rose, Richard Cameron; Rosenberg, Aaron Edward
Primary Examiner(s): Opsasnick, Michael N
Application Number: US11/685,456
Time in Patent Office: 609 Days
Field of Search: 704/231, 704/233, 704/243, 704/244
US Class Current: 704/231
CPC Class Codes:
G10L 15/07 to the speaker
G10L 15/20 Speech recognition techniqu...
