Method and system for generating squeezed acoustic models for specialized speech recognizer

US 6,789,061 B1
Filed: 08/14/2000
Issued: 09/07/2004
Est. Priority Date: 08/25/1999
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states s_i and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states s_i. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.
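
Claim 7 below states that the pdfs assemble each state's output probability as a weighted sum. Written out in our own notation (these symbols are not taken from the patent, and for simplicity all states are assumed to draw on one pool of K pdfs p_k, with non-negative weights ω_{i,k} summing to one per state), the first recognizer's output probability and the squeezed recognizer's counterpart over a selected state set S' and pdf set P' could be expressed as:

```latex
\begin{align*}
p(o_t \mid s_i) &= \sum_{k=1}^{K} \omega_{i,k}\, p_k(o_t), && s_i \in S,\\
\tilde{p}(o_t \mid s_i) &= \sum_{k \in P'} \tilde{\omega}_{i,k}\, p_k(o_t), && s_i \in S' \subseteq S,\; P' \subseteq \{1, \dots, K\}.
\end{align*}
```

Here the weights with a tilde stand for the parameters after the reestimation of claim 2; if reestimation is skipped, renormalizing the surviving ω_{i,k} over P' so that they again sum to one is a simple stand-in.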

33 Citations

23 Claims

1. A computer-based method of automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the method comprising the steps of:
- generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
- generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
2. The method of claim 1, further comprising the step of generating acoustic model parameters of the second speech recognizer by reestimating acoustic model parameters of the first speech recognizer based on the set of states of the second speech recognizer and the set of probability density functions of the second speech recognizer.
3. The method of claim 2, further comprising the steps of:
4. The method of claim 2, wherein selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits phonetical knowledge of the particular application.
5. The method of claim 4, wherein selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits application-specific training data.
6. The method of claim 5, wherein selecting the subset of states comprises associating a multitude of speech frames of the training data with the correct states of the first speech recognizer and selecting those states with a frequency of occurrence above a threshold as the subset of states.
7. The method of claim 5, wherein the set of probability density functions of the first speech recognizer assemble output probabilities as a weighted sum, and wherein selecting the subset of probability density functions comprises selecting those probability density functions contributing to the output probabilities with a weight above a threshold.
8. The method of claim 4, wherein selecting the subset of probability density functions comprises the steps of:
- identifying, from the set of states of the first speech recognizer, a subset of more reliably classifiable states; and
- selecting, from an output probability of a more reliably classifiable state, those probability density functions contributing to the output probability with a weight above a threshold.
9. The method of claim 8, wherein identifying a more reliably classifiable state comprises the steps of:
- associating a multitude of speech frames of the training data with the correct states of the first speech recognizer; and
- determining, for a certain state, a reliability value as the quotient of:
  (i) the number of the speech frames for which the certain state is a correct state and for which an output probability for observation of a speech frame in the certain state is among the N highest output probabilities for observation of the speech frame in any one of the states; and
  (ii) the number of the speech frames for which the certain state is a correct state; and
- identifying the certain state as a more reliably classifiable state, when the reliability value is above a threshold.
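
Claims 6 and 7 reduce to two threshold rules over application-specific training data. The Python sketch below is a minimal illustration under stated assumptions, not the patent's implementation. It assumes that the frame-to-state alignment ("associating a multitude of speech frames ... with the correct states") is already available as one state id per frame; obtaining it, for instance by aligning the transcriptions with the first recognizer, is outside the sketch. It also assumes that each state's mixture weights are given as a dict from a pdf id to its weight. The names `select_states`, `select_pdfs`, `min_count` and `min_weight` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def select_states(alignment: Sequence[int], min_count: int) -> Set[int]:
    """Claim 6 (sketch): keep states whose frequency of occurrence exceeds a threshold.

    alignment[t] is the (assumed given) correct state of training frame t.
    Frequency is measured here as an absolute frame count.
    """
    counts = Counter(alignment)
    return {state for state, n in counts.items() if n > min_count}


def select_pdfs(weights: Dict[int, Dict[int, float]],
                states: Set[int],
                min_weight: float) -> Set[int]:
    """Claim 7 (sketch): keep pdfs that contribute with a weight above a threshold.

    weights[state][pdf] is the mixture weight of that pdf in the state's output
    probability. Only the states passed in are examined (our assumption; pass
    every state to apply the rule to the whole first recognizer).
    """
    return {pdf
            for state in states
            for pdf, w in weights.get(state, {}).items()
            if w > min_weight}
```

For example, with the alignment `[0, 0, 1, 2, 2, 2, 3]` and `min_count=1`, only states 0 and 2 (two and three frames respectively) are kept. A pdf survives if it carries enough weight in at least one examined state, so a pdf shared between states is dropped only when it is weak everywhere.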

10. Apparatus for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the apparatus comprising:
- at least one processor operative to:
  (i) generate, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
  (ii) generate, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
11. The apparatus of claim 10, wherein the at least one processor is further operative to generate acoustic model parameters of the second speech recognizer by reestimating acoustic model parameters of the first speech recognizer based on the set of states of the second speech recognizer and the set of probability density functions of the second speech recognizer.
12. The apparatus of claim 11, wherein the at least one processor is further operative to:
  (i) determine at least one of resource requirements and recognition accuracy of the second speech recognizer; and
  (ii) repeat the state set generation, probability density function set generation, and acoustic model parameter generation operations, with one of more limiting and less limiting selection criteria, when at least one of the resource requirements and the recognition accuracy does not achieve at least one of a resource target and an accuracy target, respectively.
13. The apparatus of claim 11, wherein the operation of selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits phonetical knowledge of the particular application.
14. The apparatus of claim 13, wherein the operation of selecting at least one of the subset of states and the subset of probability density functions of the first speech recognizer exploits application-specific training data.
15. The apparatus of claim 14, wherein the operation of selecting the subset of states comprises associating a multitude of speech frames of the training data with the correct states of the first speech recognizer and selecting those states with a frequency of occurrence above a threshold as the subset of states.
16. The apparatus of claim 14, wherein the set of probability density functions of the first speech recognizer assemble output probabilities as a weighted sum, and wherein selecting the subset of probability density functions comprises selecting those probability density functions contributing to the output probabilities with a weight above a threshold.
17. The apparatus of claim 13, wherein the operation of selecting the subset of probability density functions comprises:
  (i) identifying, from the set of states of the first speech recognizer, a subset of more reliably classifiable states; and
  (ii) selecting, from an output probability of a more reliably classifiable state, those probability density functions contributing to the output probability with a weight above a threshold.
18. The apparatus of claim 17, wherein the operation of identifying a more reliably classifiable state comprises:
  (i) associating a multitude of speech frames of the training data with the correct states of the first speech recognizer; and
  (ii) determining, for a certain state, a reliability value as the quotient of:
    (a) the number of the speech frames for which the certain state is a correct state and for which an output probability for observation of a speech frame in the certain state is among the N highest output probabilities for observation of the speech frame in any one of the states; and
    (b) the number of the speech frames for which the certain state is a correct state; and
  (iii) identifying the certain state as a more reliably classifiable state, when the reliability value is above a threshold.
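
The reliability value of claims 9 and 18 is the fraction of a state's correctly aligned frames on which that state ranks among the N best-scoring states. A minimal Python sketch follows, assuming the output probabilities (or log probabilities, since only the ranking matters) are given as one row of per-state scores per frame, and the alignment as one correct state id per frame. The names `reliability_values`, `reliable_states`, `n_best` and `threshold` are hypothetical. The claims do not say how ties are ranked; here equal scores go to the lower state index.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def reliability_values(scores: Sequence[Sequence[float]],
                       alignment: Sequence[int],
                       n_best: int) -> Dict[int, float]:
    """Claims 9/18 (sketch): quotient of (i) frames whose correct state is among
    the N highest-scoring states and (ii) all frames of that correct state.

    scores[t][s] is the output probability (or log probability) of frame t in state s.
    alignment[t] is the correct state of frame t.
    """
    hits: Dict[int, int] = defaultdict(int)    # numerator (i)
    totals: Dict[int, int] = defaultdict(int)  # denominator (ii)
    for frame_scores, correct in zip(scores, alignment):
        # Rank all states for this frame, best first. sorted() is stable even
        # with reverse=True, so equal scores keep ascending state order.
        ranked = sorted(range(len(frame_scores)),
                        key=frame_scores.__getitem__, reverse=True)
        totals[correct] += 1
        if correct in ranked[:n_best]:
            hits[correct] += 1
    return {state: hits[state] / totals[state] for state in totals}


def reliable_states(values: Dict[int, float], threshold: float) -> Set[int]:
    """Keep the states whose reliability value is above the threshold."""
    return {state for state, r in values.items() if r > threshold}
```

Claims 8 and 17 then keep, within the returned states only, those pdfs whose weight in the state's output probability exceeds a threshold. Restricting the weight test to reliably classifiable states means that pdfs used mainly by frequently confused states are not retained on that basis alone.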

19. An article of manufacture for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
- generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer.
20. The article of claim 19, further implementing the step of generating acoustic model parameters of the second speech recognizer by reestimating acoustic model parameters of the first speech recognizer based on the set of states of the second speech recognizer and the set of probability density functions of the second speech recognizer.
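
Claims 2, 11 and 20 reestimate the acoustic model parameters once the state and pdf sets have been reduced, but the claims do not say which parameters are updated or how. For Gaussian-mixture HMMs a common choice is Baum-Welch (EM) training, which also updates means, covariances and transition probabilities. The Python sketch below covers only the simplest piece, under our own assumptions: the mixture weights of one retained state are updated by EM, with the frame-to-state alignment held fixed and the likelihoods p_k(o_t) of the retained pdfs precomputed. `reestimate_weights` and its arguments are hypothetical.

```python
from typing import Dict, Sequence


def reestimate_weights(weights: Dict[int, float],
                       frame_likelihoods: Sequence[Dict[int, float]],
                       iterations: int = 1) -> Dict[int, float]:
    """EM update of one state's mixture weights over the retained pdfs (sketch).

    weights: pdf id -> weight, with pruned pdfs already removed.
    frame_likelihoods: for each frame aligned to this state, pdf id -> p_k(o_t).
    Assumes every frame has a non-zero likelihood under the retained mixture.
    """
    # Renormalize the surviving weights so they sum to one again.
    total = sum(weights.values())
    w = {k: v / total for k, v in weights.items()}
    if not frame_likelihoods:
        return w
    for _ in range(iterations):
        posterior_sums = {k: 0.0 for k in w}
        for lik in frame_likelihoods:
            mixture = sum(w[k] * lik[k] for k in w)
            for k in w:
                # E-step: posterior probability that pdf k produced this frame.
                posterior_sums[k] += w[k] * lik[k] / mixture
        # M-step: new weight = average posterior over the state's frames.
        w = {k: s / len(frame_likelihoods) for k, s in posterior_sums.items()}
    return w
```

Because each frame's posteriors sum to one, the updated weights remain a proper distribution after every iteration.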

21. A computer-based method of automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the method comprising the steps of:
- generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
- generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
wherein selecting the subset of probability density functions comprises the steps of:
- identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
- determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
- identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.

22. Apparatus for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, the apparatus comprising:
- at least one processor operative to:
  (i) generate, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
  (ii) generate, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
  wherein the operation of selecting the subset of probability density functions comprises:
    (i) identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
    (ii) determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
    (iii) identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.

23. An article of manufacture for automatically generating, from a first speech recognizer, a second speech recognizer, wherein the first speech recognizer includes a set of states and a set of probability density functions assembling output probabilities for an observation of a speech frame in the states, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
- generating, from the set of states of the first speech recognizer, a set of states of the second speech recognizer by selecting a subset of states of the first speech recognizer being distinctive of a particular application; and
- generating, from the set of probability density functions of the first speech recognizer, a set of probability density functions of the second speech recognizer by selecting a subset of probability density functions of the first speech recognizer being distinctive of the particular application, such that the second speech recognizer is at least one of tailored to the particular application and requires reduced resources compared to the first speech recognizer;
wherein selecting the subset of probability density functions comprises the steps of:
- identifying a more reliably classifiable state by associating a multitude of speech frames of application-specific training data with correct states of the first speech recognizer;
- determining, for a certain state, a reliability value representing a reliability measurement for the certain state; and
- identifying the certain state as a more reliably classifiable state when the reliability value is above a threshold.
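
Claim 12 describes an outer loop: measure the squeezed recognizer's resource requirements and/or recognition accuracy, and regenerate it with a more or less limiting selection criterion when a target is missed. The Python sketch below assumes a single scalar selection threshold, where a larger value is more limiting. It also assumes a hypothetical `build` callback that runs state selection, pdf selection and reestimation at that threshold and returns the number of retained pdfs and a measured accuracy. Checking the resource target before the accuracy target, and halving the step whenever the search reverses direction, are our own choices; the claims specify neither.

```python
from typing import Callable, Optional, Tuple


def tune_threshold(build: Callable[[float], Tuple[int, float]],
                   threshold: float,
                   step: float,
                   max_pdfs: int,
                   min_accuracy: float,
                   max_rounds: int = 20) -> Optional[Tuple[float, int, float]]:
    """Claim 12 (sketch): rebuild with tighter or looser selection until targets are met.

    build(threshold) -> (number of retained pdfs, recognition accuracy).
    A larger threshold is assumed to be more limiting (fewer pdfs kept).
    Returns (threshold, n_pdfs, accuracy), or None if no setting is found.
    """
    last_direction = 0
    for _ in range(max_rounds):
        n_pdfs, accuracy = build(threshold)
        if n_pdfs > max_pdfs:
            direction = 1    # resource target missed: more limiting
        elif accuracy < min_accuracy:
            direction = -1   # accuracy target missed: less limiting
        else:
            return threshold, n_pdfs, accuracy
        if last_direction and direction != last_direction:
            step /= 2        # overshot the feasible region: refine the step
        threshold += direction * step
        last_direction = direction
    return None
```

When the two targets cannot be met together, the search oscillates with a shrinking step and gives up after `max_rounds`. This bounds the number of expensive rebuild-and-evaluate passes.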

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Waast-Ricard, Claire; Fischer, Volker; Kunzmann, Siegfried
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Han, Qi
Application Number: US09/638,160
Time in Patent Office: 1,485 days
Field of Search: 704/240, 704/231, 704/247, 704/251, 704/255, 704/256
US Class Current: 704/240
CPC Class Codes: G10L 15/144 (Training of HMMs)
