Method and system for generating squeezed acoustic models for specialized speech recognizer
Abstract
Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states s_i and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states s_i. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.
23 Claims
1. A computer-based method of automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the method comprising the steps of:
generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
determining at least one of resource requirements and recognition accuracy of the second speech recognizer; and
repeating the state set generation, probability density function set generation, and acoustic model parameter generation steps, with one of more limiting and less limiting selection criteria, when at least one of the resource requirements and the recognition accuracy does not achieve at least one of a resource target and an accuracy target, respectively.
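For illustration only, the repeat-until-targets-met behavior recited above might be sketched as follows. The callbacks `build` and `measure`, the integer count threshold, the fixed `step`, and the coverage-based accuracy proxy are all hypothetical assumptions made for this sketch; the claims do not prescribe them.

```python
def squeeze(build, measure, threshold, size_target, accuracy_target,
            step=5, max_rounds=20):
    """Regenerate the second recognizer until both targets are met.

    build(threshold)  -> candidate model (hypothetical callback)
    measure(model)    -> (resource_size, accuracy) (hypothetical callback)
    """
    for _ in range(max_rounds):
        model = build(threshold)
        size, accuracy = measure(model)
        if size > size_target:
            threshold += step          # more limiting selection criterion
        elif accuracy < accuracy_target:
            threshold -= step          # less limiting selection criterion
        else:
            return model, threshold    # both targets achieved
    return None, threshold             # gave up after max_rounds

# Toy data (assumed): frame counts per state from an application alignment.
counts = {"s1": 50, "s2": 30, "s3": 12, "s4": 5, "s5": 3}
total_frames = sum(counts.values())

def build(threshold):
    # Keep states seen more often than the threshold.
    return [s for s, c in counts.items() if c > threshold]

def measure(model):
    # Size = number of states; accuracy proxy = fraction of frames covered.
    return len(model), sum(counts[s] for s in model) / total_frames

model, final_threshold = squeeze(build, measure, threshold=0,
                                 size_target=3, accuracy_target=0.9)
```

In this toy run the first pass keeps all five states, which exceeds the size target, so the threshold is raised once and the second pass meets both targets. A real system would measure accuracy on held-out application data rather than use frame coverage.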
4. The method of claim 2, wherein selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits phonetical knowledge of the particular application.
5. The method of claim 4, wherein selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits application-specific training data.
6. The method of claim 5, wherein selecting the subset of states comprises associating a multitude of speech frames of the training data with the correct states of the first speech recognizer and selecting those states with a frequency of occurrence above a threshold as the subset of states.
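As a minimal sketch of the frequency-based state selection described in claim 6, assume the application-specific training frames have already been aligned to states of the first recognizer, giving one state label per frame. The function name, the use of relative rather than absolute frequency, and the sample alignment are assumptions for illustration.

```python
from collections import Counter

def select_states(aligned_states, threshold):
    """Return the states whose relative frequency exceeds the threshold.

    aligned_states: one state label per training frame (assumed to come
    from an alignment against the first recognizer).
    """
    total = len(aligned_states)
    if total == 0:
        return set()
    counts = Counter(aligned_states)
    return {s for s, c in counts.items() if c / total > threshold}

# Hypothetical alignment of ten frames to three states.
alignment = ["s1", "s1", "s2", "s1", "s3", "s2", "s1", "s1", "s2", "s1"]
selected = select_states(alignment, threshold=0.15)
```

Here `s1` and `s2` occur in 60% and 30% of the frames and are kept, while `s3` at 10% falls below the threshold and is dropped.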
7. The method of claim 5, wherein the set of probability density functions of the first speech recognizer assemble output probabilities as a weighted sum, and wherein selecting the subset of probability density functions comprises selecting those probability density functions contributing to the output probabilities with a weight above a threshold.
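The weight-based selection of claim 7 can be sketched under the assumption that each state's output probability is a mixture, that is, a weighted sum over pdfs, stored as a mapping from pdf identifier to mixture weight. The data layout and the `renormalize` helper are hypothetical; renormalizing the surviving weights is an extra step added for this sketch and is not recited in the claim.

```python
def select_pdfs(mixture_weights, states, threshold):
    """For each state, keep only the pdfs whose mixture weight exceeds threshold."""
    return {
        s: {j: w for j, w in mixture_weights[s].items() if w > threshold}
        for s in states
    }

def renormalize(weights):
    """Rescale surviving weights to sum to 1 (assumption, not in the claim)."""
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()} if total > 0 else {}

# Hypothetical mixture weights: state -> {pdf id -> weight}.
mixture_weights = {
    "s1": {0: 0.5, 1: 0.3, 2: 0.2},
    "s2": {1: 0.05, 3: 0.95},
}
kept = select_pdfs(mixture_weights, ["s1", "s2"], threshold=0.25)
pdf_subset = {j for per_state in kept.values() for j in per_state}
s1_weights = renormalize(kept["s1"])
```

The union `pdf_subset` is the reduced pdf set of the second recognizer in this toy setup; pdf 2 is dropped because it never contributes with a weight above the threshold.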
8. The method of claim 4, wherein selecting the subset of probability density functions comprises the steps of:
identifying, from the set of states of the first speech recognizer, a subset of more reliably classifiable states; and
selecting, from an output probability of a more reliably classifiable state, those probability density functions contributing to the output probability with a weight above a threshold.
9. The method of claim 8, wherein identifying a more reliably classifiable state comprises the steps of:
associating a multitude of speech frames of the training data with the correct states of the first speech recognizer; and
determining, for a certain state, a reliability value as the quotient of:
(i) the number of the speech frames for which the certain state is a correct state and for which an output probability for observation of a speech frame in the certain state is among the N highest output probabilities for observation of the speech frame in any one of the states; and
(ii) the number of the speech frames for which the certain state is a correct state; and
identifying the certain state as a more reliably classifiable state, when the reliability value is above a threshold.
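The reliability quotient of claim 9, and its use in the pdf selection of claim 8, might be sketched as below. The input layout (one pair of correct state and per-state output probabilities for each frame), all function names, and the sample numbers are assumptions made for illustration only.

```python
from collections import defaultdict
import heapq

def reliability_values(frames, n_best):
    """Quotient (i)/(ii): N-best hits for a state over frames where it is correct."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for correct, scores in frames:
        totals[correct] += 1
        # States with the N highest output probabilities for this frame.
        top = heapq.nlargest(n_best, scores, key=scores.get)
        if correct in top:
            hits[correct] += 1
    return {s: hits[s] / totals[s] for s in totals}

def reliable_states(frames, n_best, threshold):
    """States whose reliability value is above the threshold."""
    values = reliability_values(frames, n_best)
    return {s for s, r in values.items() if r > threshold}

def pdfs_of_reliable_states(mixture_weights, states, weight_threshold):
    """Claim 8 style selection: strong pdfs of the reliable states only."""
    return {j for s in states
            for j, w in mixture_weights[s].items() if w > weight_threshold}

# Hypothetical frames: (correct state, {state: output probability}).
frames = [
    ("a", {"a": 0.7, "b": 0.2, "c": 0.1}),
    ("a", {"a": 0.1, "b": 0.6, "c": 0.3}),
    ("b", {"a": 0.3, "b": 0.5, "c": 0.2}),
    ("c", {"a": 0.5, "b": 0.4, "c": 0.1}),
]
values = reliability_values(frames, n_best=1)
reliable = reliable_states(frames, n_best=1, threshold=0.6)

# Hypothetical mixture weights: state -> {pdf id -> weight}.
weights = {"a": {0: 0.9, 1: 0.1}, "b": {1: 0.6, 2: 0.4}, "c": {3: 1.0}}
pdf_subset = pdfs_of_reliable_states(weights, reliable, weight_threshold=0.5)
```

With N = 1, state `a` is ranked first in one of its two frames, `b` in its only frame, and `c` in none, so only `b` passes the 0.6 reliability threshold and contributes its dominant pdf.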
10. Apparatus for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the apparatus comprising:
at least one processor operative to:
(i) generate, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
(ii) generate, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
19. An article of manufacture for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
21. A computer-based method of automatically generating from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the method comprising the steps of:
generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
wherein selecting the subset of probability density functions comprises the steps of:
identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.
22. Apparatus for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the apparatus comprising:
at least one processor operative to:
(i) generate, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
(ii) generate, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
wherein the operation of selecting the subset of probability density functions comprises:
(i) identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
(ii) determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
(iii) identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.
23. An article of manufacture for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
wherein selecting the subset of probability density functions comprises the steps of:
identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.
Specification