Systems and methods for dynamic re-configurable speech recognition
Abstract
Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, providing users with input modes that are alternatives to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition allow speech recognition to be deployed on small devices such as mobile phones and personal digital assistants, as well as in environments such as an office, home, or vehicle, while maintaining the accuracy of the speech recognition.
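The lattice concatenation and language-model re-scoring summarized in the abstract might be sketched as follows. This is an illustration only, not the patented implementation: the `Arc` and `Lattice` types, the `concatenate` and `rescore` functions, the bigram table, and the `backoff` value are all hypothetical, and the sketch assumes each per-field lattice numbers its nodes from 0 (start) to `final` with every arc running from a lower to a higher node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    word: str
    score: float  # log-domain acoustic score, higher is better (assumption)


@dataclass
class Lattice:
    arcs: list
    start: int
    final: int


def concatenate(lattices):
    """Chain per-field lattices end to end into one lattice.

    Assumes each lattice has start == 0 and nodes numbered up to `final`,
    so shifting by the running offset merges one lattice's final node
    with the next lattice's start node.
    """
    arcs, offset = [], 0
    for lat in lattices:
        arcs.extend(
            Arc(a.src + offset, a.dst + offset, a.word, a.score) for a in lat.arcs
        )
        offset += lat.final
    return Lattice(arcs, 0, offset)


def rescore(lattice, bigram_logp, backoff=-5.0):
    """Best path under acoustic score plus a bigram language model.

    Viterbi search over (node, previous word) states. Unseen bigrams get
    the flat `backoff` log-probability, which is a simplification.
    """
    best = {(lattice.start, "<s>"): (0.0, [])}
    # Arcs are visited in order of source node, so every state at a node
    # is final before the arcs leaving that node are expanded.
    for arc in sorted(lattice.arcs, key=lambda a: a.src):
        for (node, prev), (score, words) in list(best.items()):
            if node != arc.src:
                continue
            cand = score + arc.score + bigram_logp.get((prev, arc.word), backoff)
            key = (arc.dst, arc.word)
            if key not in best or cand > best[key][0]:
                best[key] = (cand, words + [arc.word])
    finals = [v for (node, _), v in best.items() if node == lattice.final]
    return max(finals, key=lambda v: v[0])


# Two data fields (city, state), each with two competing hypotheses.
city = Lattice([Arc(0, 1, "boston", -1.0), Arc(0, 1, "austin", -1.2)], 0, 1)
state = Lattice([Arc(0, 1, "texas", -1.0), Arc(0, 1, "taxes", -0.9)], 0, 1)
lm = {("austin", "texas"): -0.5, ("boston", "texas"): -4.0}

total, words = rescore(concatenate([city, state]), lm)
print(words)
```

In this toy example each field, taken alone, prefers the acoustically stronger hypothesis ("boston", "taxes"). Applying the language model across the concatenated lattice lets the relationship between the two fields select "austin texas" instead.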
22 Claims
1. A method of dynamic re-configurable speech recognition comprising:
determining an identity of a speaker based, at least in part, on a user identifier;
repeatedly determining parameters of a background model based on sampled information collected at a periodic time interval during a received voice request;
determining parameters of a transducer model;
adapting a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
re-scoring automatic speech recognition using the speech recognition model comprising:
generating word lattices representative of speech utterances in the received voice request,
concatenating the word lattices into a single concatenated lattice,
applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
determining information in the received voice request based on the re-scored results of the speech recognition model; and
adjusting the periodic time interval based, at least in part, on determined changes in the sampled information.
Dependent claims: 2, 3, 4, 5, 6, 7

8. A system for dynamic re-configurable speech recognition comprising:
a background model estimation circuit for repeatedly determining a background model at a periodic time interval during a voice request based, at least in part, on estimated background parameters based on collected sampled information;
a transducer model estimation circuit for determining a transducer model of the voice request based, at least in part, on estimated transducer parameters;
a background model adaptation circuit and a transducer model adaptation circuit for determining an adapted speech recognition model based on a speech recognition model and at least one of the background model or the transducer model;
a speech recognition circuit for recognizing speech and generating a speech lattice for each of a plurality of data fields for which a user provides voice input, the speech recognition circuit being arranged to use a different language model for each of the plurality of data fields;
a lattice concatenation circuit that concatenates at least two speech lattices based on speech utterances in the received voice request into a single lattice; and
a controller that applies at least one language model to the single concatenated lattice to determine relationships between the lattices, wherein the controller is adapted to adjust the periodic time interval based, at least in part, on changes in the collected sampled information, and the controller is adapted to determine an identity of a speaker based, at least in part on a user identifier and to apply user-specific transformations, corresponding to the identity of the speaker, to the speech recognition model.
Dependent claims: 9, 10, 11, 12, 13, 14

15. A computer readable storage medium comprising:
computer-readable program code usable to program a computer to perform a method for dynamic re-configurable speech recognition, the method comprising:
determining an identity of a speaker based, at least in part, on a user identifier;
determining parameters of a background model at a periodic time during a voice request;
determining parameters of a transducer model;
adapting a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
re-scoring automatic speech recognition using the speech recognition model, comprising:
generating word lattices representative of speech utterances in the received voice request,
concatenating the word lattices into a single concatenated lattice,
applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships;
determining information in the received voice request based on the rescored results of the speech recognition model; and
adjusting the periodic time based, at least in part, on determined changes in sampled noise information.
Dependent claims: 16

17. A method of dynamic re-configurable speech recognition comprising:
determining an identity of a speaker based, at least in part, on a user identifier;
repeatedly determining parameters of a background model based, at least in part, on first sampled information collected at first periodic time intervals during a received voice request;
repeatedly determining parameters of a transducer model based, at least in part, on second sampled information collected at second periodic time intervals during a received voice request;
determining a speech recognition model based on user-specific transformations corresponding to the determined identity of the speaker and on at least one of the background model or the transducer model;
applying one of a plurality of language models to the received voice request for speech recognition based on a data field selected by the speaker;
re-scoring automatic speech recognition using the speech recognition model, comprising:
generating word lattices representative of speech utterances in the received voice request,
concatenating the word lattices into a single concatenated lattice, and
applying at least one language model to the single concatenated lattice in order to determine word lattice inter-relationships; and
determining information in the received voice request based on the rescored results of the speech recognition model.
Dependent claims: 18, 19, 20, 21, 22
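Claims 1, 8, and 15 recite adjusting the periodic sampling interval based on changes in the sampled information. One possible reading is sketched below, purely as an illustration: the function name `adjust_interval`, the use of a noise level in decibels as the sampled quantity, the halving and doubling rule, and all threshold and bound values are assumptions that do not come from the claims.

```python
def adjust_interval(
    interval_s,
    prev_level_db,
    curr_level_db,
    *,
    min_s=0.1,
    max_s=5.0,
    change_threshold_db=3.0,
):
    """Return a new sampling interval in seconds.

    Hypothetical rule: sample more often when the background level is
    changing quickly, less often when it is stable, and leave the
    interval alone in between. The result is clamped to [min_s, max_s].
    """
    change = abs(curr_level_db - prev_level_db)
    if change > change_threshold_db:
        interval_s /= 2  # environment is changing: re-estimate sooner
    elif change < change_threshold_db / 2:
        interval_s *= 2  # environment is stable: re-estimate less often
    return min(max_s, max(min_s, interval_s))


# A 10 dB jump in background level halves a 1-second interval.
print(adjust_interval(1.0, 40.0, 50.0))
```

Clamping keeps the background model from being re-estimated so often that it dominates processing on a small device, or so rarely that the model goes stale; the specific bounds would depend on the deployment.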
Specification