Systems and methods for dynamic re-configurable speech recognition

US 7,209,880 B1
Filed: 03/06/2002
Issued: 04/24/2007
Est. Priority Date: 03/20/2001
Status: Active Grant

5 Assignments

0 Petitions

Abstract

Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with input modes that are alternatives to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as an office, home, or vehicle, while maintaining the accuracy of the speech recognition.
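
The lattice concatenation and re-scoring described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each per-field word lattice is simplified to an n-best list of (text, log score) pairs, and the names `concatenate`, `rescore`, `joint_lm`, and `KNOWN_PAIRS`, along with all example scores, are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Assumption: a per-field "lattice" is reduced to an n-best list of
# (hypothesis text, log score) pairs. Real word lattices are graphs.
Hypothesis = Tuple[str, float]
Path = Tuple[Dict[str, str], float]


def concatenate(lattices: Dict[str, List[Hypothesis]]) -> List[Path]:
    """Join per-field hypotheses into paths that span all fields."""
    fields = list(lattices)
    paths: List[Path] = []
    for combo in product(*(lattices[f] for f in fields)):
        words = {f: text for f, (text, _) in zip(fields, combo)}
        score = sum(s for _, s in combo)
        paths.append((words, score))
    return paths


def rescore(paths: List[Path], joint_lm: Callable[[Dict[str, str]], float]) -> Path:
    """Pick the path with the best combined recognition and language-model score."""
    return max(paths, key=lambda p: p[1] + joint_lm(p[0]))


# Hypothetical language model over field relationships: penalize
# city/state pairs that do not belong together.
KNOWN_PAIRS = {("Boston", "Massachusetts"), ("Austin", "Texas")}


def joint_lm(words: Dict[str, str]) -> float:
    return 0.0 if (words["city"], words["state"]) in KNOWN_PAIRS else -5.0


lattices = {
    "city": [("Austin", -1.0), ("Boston", -1.2)],
    "state": [("Massachusetts", -0.5), ("Texas", -2.0)],
}
best_words, best_score = rescore(concatenate(lattices), joint_lm)
print(best_words)
```

Taken field by field, the top hypotheses would be "Austin" and "Massachusetts". Scoring the concatenated paths with a model of the relationship between fields selects the consistent pair instead.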

92 Citations

22 Claims

1. A method of dynamic re-configurable speech recognition comprising:
- determining an identity of a speaker based, at least in part, on a user identifier;
  
  repeatedly determining parameters of a background model based on sampled information collected at a periodic time interval during a received voice request;
  
  determining parameters of a transducer model;
  
  adapting a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
  
  applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
  
  re-scoring automatic speech recognition using the speech recognition model comprising;
  
  generating word lattices representative of speech utterances in the received voice request, concatenating the word lattices into a single concatenated lattice, applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
  
  determining information in the received voice request based on the re-scored results of the speech recognition model; and
  
  adjusting the periodic time interval based, at least in part, on determined changes in the sampled information.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, further comprising:
    - generating a confidence score to determine whether the generated word lattices are acceptable.
  - 3. The method of claim 2, wherein:
    - the parameters of the background model are determined based on a first sample period;
      
      the parameters of the transducer model are determined based on a second sample period; and
      
      the confidence score is compared to a predetermined value in order to determine whether to perform the automatic speech recognition process again.
  - 4. The method of claim 2, further comprising:
    - saving at least one of the parameters of the background model and the parameters of the transducer model.
  - 5. The method of claim 1, further comprising:
    - repeatedly determining the parameters of the transducer model.
  - 6. The method of claim 1, wherein the user identifier comprises a calling phone number.
  - 7. The method of claim 1, wherein the user identifier is based on a plurality of rules associated with a telephone and at least one of a time of day or a day of a week.
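
The last step of claim 1, adjusting the periodic sampling interval based on changes in the sampled information, might look like the sketch below. The halving and doubling policy, the threshold, the interval bounds, and the function name `adjust_interval` are all assumptions chosen for illustration; the claim does not specify them.

```python
def adjust_interval(
    interval_s: float,
    previous_level: float,
    current_level: float,
    change_threshold: float = 3.0,
    min_interval_s: float = 0.25,
    max_interval_s: float = 8.0,
) -> float:
    """Return the next sampling interval in seconds (hypothetical policy)."""
    change = abs(current_level - previous_level)
    if change > change_threshold:
        # Background is changing quickly: sample more often.
        return max(min_interval_s, interval_s / 2)
    # Background is stable: sample less often to save computation.
    return min(max_interval_s, interval_s * 2)


interval = 1.0
levels = [40.0, 41.0, 55.0, 56.0]  # hypothetical background noise levels
history = []
for prev, cur in zip(levels, levels[1:]):
    interval = adjust_interval(interval, prev, cur)
    history.append(interval)
print(history)
```

In this sketch a large jump between successive samples shortens the interval so the background model tracks the change, while stable samples lengthen it.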

8. A system for dynamic re-configurable speech recognition comprising:
- a background model estimation circuit for repeatedly determining a background model at a periodic time interval during a voice request based, at least in part, on estimated background parameters based on collected sampled information;
  
  a transducer model estimation circuit for determining a transducer model of the voice request based, at least in part, on estimated transducer parameters;
  
  a background model adaptation circuit and a transducer model adaptation circuit for determining an adapted speech recognition model based on a speech recognition model and at least one of the background model or the transducer model;
  
  a speech recognition circuit for recognizing speech and generating a speech lattice for each of a plurality of data fields for which a user provides voice input, the speech recognition circuit being arranged to use a different language model for each of the plurality of data fields;
  
  a lattice concatenation circuit that concatenates at least two speech lattices based on speech utterances in the received voice request into a single lattice; and
  
  a controller that applies at least one language model to the single concatenated lattice to determine relationships between the lattices, wherein the controller is adapted to adjust the periodic time interval based, at least in part, on changes in the collected sampled information, and the controller is adapted to determine an identity of a speaker based, at least in part on a user identifier and to apply user-specific transformations, corresponding to the identity of the speaker, to the speech recognition model.
- Dependent claims (9, 10, 11, 12, 13, 14)
- - 9. The system of claim 8, wherein the controller generates a confidence score after applying the speech recognition model to determine whether the lattices are acceptable.
  - 10. The system of claim 9, wherein, the controller is configured to compare the confidence score to a predetermined value, and the controller is further configured to repeat automatic speech recognition of the voice request based, at least in part, on the comparing.
  - 11. The system of claim 9, wherein the controller saves at least one of the background model or the transducer model into storage, and wherein the adapted speech recognition model is based on at least one of the background model or the transducer model.
  - 12. The system of claim 8, wherein the transducer model estimation circuit is configured to repeatedly determine the transducer model at the periodic time interval.
  - 13. The system of claim 8, wherein the user identifier comprises a calling phone number.
  - 14. The system of claim 8, wherein the user identifier is based on a plurality of rules associated with a telephone and at least one of a time of day or a day of a week.
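
The confidence check in claims 9 and 10 (compare a confidence score to a predetermined value and repeat recognition based on the comparison) can be sketched as a retry loop. The recognizer signature, the threshold of 0.8, the attempt limit, and the stand-in `fake_recognizer` are hypothetical; the claims do not say how the score is computed or how many repetitions are allowed.

```python
from typing import Callable, Tuple

# Hypothetical recognizer signature: takes audio and an attempt index,
# returns (recognized text, confidence in [0, 1]).
Recognizer = Callable[[bytes, int], Tuple[str, float]]


def recognize_with_retry(
    recognizer: Recognizer,
    audio: bytes,
    predetermined_value: float = 0.8,
    max_attempts: int = 3,
) -> Tuple[str, float, int]:
    """Repeat recognition until confidence reaches the predetermined value."""
    best_text, best_conf = "", -1.0
    for attempt in range(max_attempts):
        text, conf = recognizer(audio, attempt)
        if conf > best_conf:
            best_text, best_conf = text, conf
        if conf >= predetermined_value:
            return text, conf, attempt + 1
    # No attempt was acceptable: fall back to the best result seen.
    return best_text, best_conf, max_attempts


def fake_recognizer(audio: bytes, attempt: int) -> Tuple[str, float]:
    # Stand-in for a real recognizer whose models were re-adapted
    # between attempts; the results are made up for the example.
    results = [("bostin", 0.55), ("boston", 0.91)]
    return results[min(attempt, len(results) - 1)]


result = recognize_with_retry(fake_recognizer, b"\x00\x01")
print(result)
```

The function returns the text, its confidence, and the number of attempts used, so a caller can tell an accepted result from a fallback.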

15. A computer readable storage medium comprising:
- computer-readable program code usable to program a computer to perform a method for dynamic re-configurable speech recognition, the method comprising;
  
  determining an identity of a speaker based, at least in part, on a user identifier;
  
  determining parameters of a background model at a periodic time during a voice request;
  
  determining parameters of a transducer model;
  
  adapting a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
  
  applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
  
  re-scoring automatic speech recognition using the speech recognition model, comprising;
  
  generating word lattices representative of speech utterances in the received voice request, concatenating the word lattices into a single concatenated lattice, applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
  
  determining information in the received voice request based on the rescored results of the speech recognition model; and
  
  adjusting the periodic time based, at least in part, on determined changes in sampled noise information.
- Dependent claims (16)
- - 16. The computer readable storage medium of claim 15, wherein the method further comprises:
    - repeatedly determining parameters of the transducer model at a periodic time.

17. A method of dynamic re-configurable speech recognition comprising:
- determining an identity of a speaker based, at least in part, on a user identifier;
  
  repeatedly determining parameters of a background model based, at least in part, on first sampled information collected at first periodic time intervals during a received voice request;
  
  repeatedly determining parameters of a transducer model based, at least in part, on second sampled information collected at second periodic time intervals during a received voice request;
  
  determining a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
  
  applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
  
  re-scoring automatic speech recognition using the speech recognition model, comprising;
  
  generating word lattices representative of speech utterances in the received voice request, concatenating the word lattices into a single concatenated lattice, and applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships; and
  
  determining information in the received voice request based on the rescored results of the speech recognition model.
- Dependent claims (18, 19, 20, 21, 22)
- - 18. The method of claim 17, further comprising:
    - adjusting a length of the first periodic time intervals based, at least in part, on the collected first sampled information.
  - 19. The method of claim 17, further comprising:
    - adjusting a length of the first periodic time intervals based, at least in part, on a frequency or a magnitude of determined changes in successively sampled ones of the first sampled information.
  - 20. The method of claim 17, further comprising:
    - generating a confidence score after applying the speech recognition model to determine whether the generated word lattices are acceptable;
      
      comparing the confidence score to a predetermined value; and
      
      repeating automatic speech recognition of the received voice request based, at least in part, on a result of the comparing of the confidence score with the predetermined value.
  - 21. The method of claim 17, wherein the user identifier comprises a calling phone number.
  - 22. The system of claim 17, wherein the user identifier is based on a plurality of rules associated with a telephone and at least one of a time of day or a day of a week.
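
Claims 21 and 22 tie the user identifier to a calling phone number and to rules involving the time of day or day of the week. One way such rules could be evaluated is sketched below. The `Rule` structure, the first-match policy, and the example phone number and speaker names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule mapping a phone and a time window to a speaker."""
    phone: str
    weekdays: FrozenSet[int]  # 0 = Monday ... 6 = Sunday
    start_hour: int           # inclusive
    end_hour: int             # exclusive
    speaker: str


def resolve_speaker(phone: str, when: datetime, rules: List[Rule]) -> Optional[str]:
    """Return the speaker of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if (rule.phone == phone
                and when.weekday() in rule.weekdays
                and rule.start_hour <= when.hour < rule.end_hour):
            return rule.speaker
    return None


rules = [
    Rule("555-0100", frozenset({0, 1, 2, 3, 4}), 9, 17, "alice"),  # weekday office hours
    Rule("555-0100", frozenset({5, 6}), 0, 24, "bob"),             # weekends, all day
]

weekday_call = resolve_speaker("555-0100", datetime(2002, 3, 6, 10, 0), rules)
weekend_call = resolve_speaker("555-0100", datetime(2002, 3, 9, 10, 0), rules)
print(weekday_call, weekend_call)
```

The resolved identity would then select the user-specific transformations applied to the speech recognition model; a `None` result leaves the choice of a default to the caller.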

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Gajic, Bojana; Narayanan, Shrikanth Sambasivan; Parthasarathy, Sarangarajan; Rose, Richard Cameron; Rosenberg, Aaron Edward
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Pierre, Myriam
Application Number: US10/091,689
Time in Patent Office: 1,875 Days
Field of Search: 704/234, 704/235, 704/236, 704/231, 704/200, 704/270
US Class Current: 704/231
CPC Class Codes: G10L 15/07 to the speaker; G10L 15/20 Speech recognition techniqu...
