Method and apparatus for phonetic context adaptation for improved speech recognition

US 6,999,925 B2
Filed: 11/13/2001
Issued: 02/14/2006
Est. Priority Date: 11/14/2000
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

The present invention provides a computerized method and apparatus for automatically generating from a first speech recognizer a second speech recognizer which can be adapted to a specific domain. The first speech recognizer can include a first acoustic model with a first decision network and corresponding first phonetic contexts. The first acoustic model can be used as a starting point for the adaptation process. A second acoustic model with a second decision network and corresponding second phonetic contexts for the second speech recognizer can be generated by re-estimating the first decision network and the corresponding first phonetic contexts based on domain-specific training data.

257 Citations

29 Claims

1. A computerized method of automatically generating from a first speech recognizer a second speech recognizer, said first speech recognizer comprising a first acoustic model with a first decision network and corresponding first phonetic contexts, and said second speech recognizer being adapted to a specific domain, said method comprising:
- based on said first acoustic model, generating a second acoustic model with a second decision network and corresponding second phonetic contexts for said second speech recognizer by re-estimating said first decision network and said corresponding first phonetic contexts based on domain-specific training data, wherein said first decision network and said second decision network utilize a phonetic decision tree to perform speech recognition operations, wherein the number of nodes in the second decision network is not fixed by the number of nodes in the first decision network, and wherein said re-estimating comprises partitioning said training data using said first decision network of said first speech recognizer.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 3. The method of claim 1, said partitioning step comprising:
    - passing feature vectors of said training data through said first decision network and extracting and classifying phonetic contexts of said training data.
  - 4. The method of claim 3, said re-estimating further comprising:
    - detecting domain-specific phonetic contexts by executing a split-and-merge methodology based on said partitioned training data for re-estimating said first decision network and said first phonetic contexts.
  - 5. The method of claim 4, wherein control parameters of said split-and-merge methodology are chosen specific to said domain.
  - 6. The method of claim 4, wherein for Hidden-Markov-Models (HMMs) associated with leaf nodes of said second decision network, said re-estimating comprises re-adjusting HMM parameters corresponding to said HMMs.
  - 7. The method of claim 6, wherein said HMMs comprise a set of states and a set of probability-density-functions (PDFs) assembling output probabilities for an observation of a speech frame in said states, and wherein said re-adjusting step is preceded by:
    - selecting from said states a subset of states being distinctive of said domain; and
    - selecting from said set of PDFs a subset of PDFs being distinctive of said domain.
  - 8. The method of claim 6, wherein said method is executed iteratively for additional training data.
  - 9. The method of claim 7, wherein said method is executed iteratively for additional training data.
  - 10. The method of claim 6, wherein said first speech recognizer is a general purpose speech recognizer, and wherein the second speech recognizer is a speaker independent speech recognizer.
  - 11. The method of claim 6, wherein said first and said second speech recognizers are speaker-dependent speech recognizers and said training data is additional speaker-dependent training data.
  - 12. The method of claim 6, wherein said first speech recognizer is a speech recognizer of at least a first language and said domain specific training data relates to a second language and said second speech recognizer is a multi-lingual speech recognizer of said second language and said at least first language.
  - 13. The method of claim 1, wherein said domain is selected from the group consisting of a language, a set of languages, a dialect, a task area, and a set of task areas.
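
The partitioning step recited in claims 1 and 3 can be pictured with a small sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Node`, `find_leaf`, `partition`, and `NASALS`, the single yes/no question, and the toy feature vectors are all hypothetical. Each training frame carries its phonetic context, is passed down the existing (first) decision tree, and is collected at the leaf it reaches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed frame layout: a (left, centre, right) phone context plus a feature vector.
Context = Tuple[str, str, str]
Frame = Tuple[Context, List[float]]


@dataclass
class Node:
    """One node of a hypothetical phonetic decision tree."""
    name: str
    question: Optional[Callable[[Context], bool]] = None  # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def find_leaf(node: Node, ctx: Context) -> Node:
    """Answer the yes/no context questions until a leaf is reached."""
    while node.question is not None:
        node = node.yes if node.question(ctx) else node.no
    return node


def partition(root: Node, frames: List[Frame]) -> Dict[str, List[Frame]]:
    """Group the training frames by the leaf of the first tree they fall into."""
    buckets: Dict[str, List[Frame]] = {}
    for ctx, vec in frames:
        buckets.setdefault(find_leaf(root, ctx).name, []).append((ctx, vec))
    return buckets


# Toy first tree with one question: is the left neighbour a nasal?
NASALS = {"m", "n", "ng"}
root = Node("root", question=lambda c: c[0] in NASALS,
            yes=Node("left-nasal"), no=Node("left-other"))

frames: List[Frame] = [
    (("n", "a", "t"), [0.1, 0.2]),
    (("s", "a", "t"), [0.9, 0.8]),
    (("m", "a", "k"), [0.2, 0.1]),
]
buckets = partition(root, frames)
```

In this sketch the per-leaf buckets are what a later re-estimation step would work on, so sparse domain-specific data is organised by the contexts the first recognizer already knows.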

2. A computerized method of automatically generating from a first speech recognizer a second speech recognizer, said first speech recognizer comprising a first acoustic model with a first decision network and corresponding first phonetic contexts, and said second speech recognizer being adapted to a specific domain, said method comprising:
- based on said first acoustic model, generating a second acoustic model with a second decision network and corresponding second phonetic contexts for said second speech recognizer by re-estimating said first decision network and said corresponding first phonetic contexts based on domain-specific training data, wherein said first decision network and said second decision network utilize a phonetic decision tree to perform speech recognition operations, wherein the number of nodes in the second decision network is not fixed by the number of nodes in the first decision network, wherein said domain-specific training data is of a limited amount, and wherein the generating step further comprises the steps of:
  - identifying at least one acoustic context from the domain-specific training data; and
  - adding a node to the second decision network for the identified context independent of other generating step operations.
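
As a rough illustration of the two steps in claim 2, the sketch below adds a node for one identified context to an existing tree, so that the node count of the second network differs from that of the first. Everything here is an assumption made for illustration: the `Node` class, the helper names `add_context_node`, `count_nodes`, and `find_leaf`, and the choice to single out an exact (left, centre, right) triphone are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Context = Tuple[str, str, str]  # assumed (left, centre, right) phone context


@dataclass
class Node:
    """One node of a hypothetical phonetic decision tree."""
    name: str
    question: Optional[Callable[[Context], bool]] = None  # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def count_nodes(node: Optional[Node]) -> int:
    """Count all nodes (internal and leaf) below and including `node`."""
    if node is None:
        return 0
    return 1 + count_nodes(node.yes) + count_nodes(node.no)


def find_leaf(node: Node, ctx: Context) -> Node:
    """Follow the yes/no questions down to a leaf."""
    while node.question is not None:
        node = node.yes if node.question(ctx) else node.no
    return node


def add_context_node(leaf: Node, ctx: Context) -> None:
    """Turn `leaf` into an internal node that singles out the context `ctx`."""
    old_name = leaf.name
    leaf.question = lambda c, target=ctx: c == target
    leaf.yes = Node(name="ctx:" + "-".join(ctx))  # new leaf for the new context
    leaf.no = Node(name=old_name)                 # all other contexts stay put
    leaf.name = old_name + "?"


tree = Node("a-any")                 # first tree: a single leaf
nodes_before = count_nodes(tree)
add_context_node(tree, ("r", "a", "x"))  # context observed in the domain data
nodes_after = count_nodes(tree)
```

The rest of the tree is left untouched in this sketch, which is one way to read "independent of other generating step operations" when only a limited amount of domain data is available.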

14. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to automatically generate from a first speech recognizer a second speech recognizer, said first speech recognizer comprising a first acoustic model with a first decision network and corresponding first phonetic contexts, and said second speech recognizer being adapted to a specific domain, said machine-readable storage causing the machine to perform the steps of:
- based on said first acoustic model, generating a second acoustic model with a second decision network and corresponding second phonetic contexts for said second speech recognizer by re-estimating said first decision network and said corresponding first phonetic contexts based on domain-specific training data, wherein said first decision network and said second decision network utilize a phonetic decision tree to perform speech recognition operations, wherein the number of nodes in the second decision network is not fixed by the number of nodes in the first decision network, and wherein said re-estimating comprises partitioning said training data using said first decision network of said first speech recognizer.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
  - 16. The machine-readable storage of claim 14, said partitioning step comprising:
    - passing feature vectors of said training data through said first decision network and extracting and classifying phonetic contexts of said training data.
  - 17. The machine-readable storage of claim 16, said re-estimating further comprising:
    - detecting domain-specific phonetic contexts by executing a split-and-merge methodology based on said partitioned training data for re-estimating said first decision network and said first phonetic contexts.
  - 18. The machine-readable storage of claim 17, wherein control parameters of said split-and-merge methodology are chosen specific to said domain.
  - 19. The machine-readable storage of claim 17, wherein for Hidden-Markov-Models (HMMs) associated with leaf nodes of said second decision network, said re-estimating comprises re-adjusting HMM parameters corresponding to said HMMs.
  - 20. The machine-readable storage of claim 19, wherein said HMMs comprise a set of states and a set of probability-density-functions (PDFs) assembling output probabilities for an observation of a speech frame in said states, and wherein said re-adjusting step is preceded by:
    - selecting from said states a subset of states being distinctive of said domain; and
    - selecting from said set of PDFs a subset of PDFs being distinctive of said domain.
  - 21. The machine-readable storage of claim 19, wherein said method is executed iteratively for additional training data.
  - 22. The machine-readable storage of claim 20, wherein said method is executed iteratively for additional training data.
  - 23. The machine-readable storage of claim 19, wherein said first speech recognizer is a general purpose speech recognizer, and wherein the second speech recognizer is a speaker independent speech recognizer.
  - 24. The machine-readable storage of claim 19, wherein said first and said second speech recognizers are speaker-dependent speech recognizers and said training data is additional speaker-dependent training data.
  - 25. The machine-readable storage of claim 19, wherein said first speech recognizer is a speech recognizer of at least a first language and said domain specific training data relates to a second language and said second speech recognizer is a multi-lingual speech recognizer of said second language and said at least first language.
  - 26. The machine-readable storage of claim 14, wherein said domain is selected from the group consisting of a language, a set of languages, a dialect, a task area, and a set of task areas.
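
Claims 4, 5, 17, and 18 recite a split-and-merge methodology with domain-specific control parameters but do not fix a criterion in the text above. The sketch below assumes one common choice, a log-likelihood gain under a single Gaussian per node, with a one-dimensional feature for brevity. The names `log_likelihood`, `split_gain`, `decide`, `VAR_FLOOR`, and the threshold value are hypothetical and stand in for whatever control parameters an actual system would use.

```python
import math
from typing import Callable, List, Tuple

Context = Tuple[str, str, str]   # assumed (left, centre, right) phone context
Sample = Tuple[Context, float]   # context plus a 1-D feature value

VAR_FLOOR = 1e-4  # assumed variance floor to keep the logarithm finite


def log_likelihood(values: List[float]) -> float:
    """Log-likelihood of `values` under their own maximum-likelihood Gaussian."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, VAR_FLOOR)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def split_gain(samples: List[Sample], question: Callable[[Context], bool]) -> float:
    """Likelihood gained by splitting the samples with a yes/no context question."""
    yes = [x for c, x in samples if question(c)]
    no = [x for c, x in samples if not question(c)]
    if not yes or not no:
        return 0.0  # the question does not separate the data at all
    parent = [x for _, x in samples]
    return log_likelihood(yes) + log_likelihood(no) - log_likelihood(parent)


def decide(samples: List[Sample], question: Callable[[Context], bool],
           threshold: float) -> str:
    """Split when the gain exceeds the (domain-specific) threshold, else merge."""
    return "split" if split_gain(samples, question) > threshold else "merge"


samples: List[Sample] = [
    (("n", "a", "t"), 0.0), (("m", "a", "k"), 0.2), (("n", "a", "k"), 0.1),
    (("s", "a", "t"), 5.0), (("p", "a", "k"), 5.1), (("s", "a", "k"), 4.9),
]
left_is_nasal = lambda c: c[0] in {"m", "n", "ng"}
right_is_t = lambda c: c[2] == "t"

informative = decide(samples, left_is_nasal, threshold=1.0)
uninformative = decide(samples, right_is_t, threshold=1.0)
```

Raising or lowering `threshold` changes how readily new contexts are created, which is the kind of knob one would tune per domain in this sketch.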

15. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to automatically generate from a first speech recognizer a second speech recognizer, said first speech recognizer comprising a first acoustic model with a first decision network and corresponding first phonetic contexts, and said second speech recognizer being adapted to a specific domain, said machine-readable storage causing the machine to perform the steps of:
- based on said first acoustic model, generating a second acoustic model with a second decision network and corresponding second phonetic contexts for said second speech recognizer by re-estimating said first decision network and said corresponding first phonetic contexts based on domain-specific training data, wherein said first decision network and said second decision network utilize a phonetic decision tree to perform speech recognition operations, wherein the number of nodes in the second decision network is not fixed by the number of nodes in the first decision network, wherein said domain-specific training data is of a limited amount, and wherein the generating step further comprises the steps of:
  - identifying at least one acoustic context from the domain-specific training data; and
  - adding a node to the second decision network for the identified context independent of other generating step operations.

27. A computerized method of generating a second speech recognizer comprising the steps of:
- identifying a first speech recognizer of a first domain comprising a first acoustic model with a first decision network and corresponding first phonetic contexts;
- receiving domain-specific training data of a second domain; and
- based on the first speech recognizer and the domain-specific training data, generating a second acoustic model of said first domain and said second domain comprising a second acoustic model with a second decision network and corresponding second phonetic contexts, wherein the first domain comprises at least a first language, wherein the second domain comprises at least a second language, and wherein the second speech recognizer is a multi-lingual speech recognizer.
- Dependent claims (28, 29)
  - 28. The computerized method of claim 27, wherein the first domain is a general purpose domain, and wherein the second domain comprises at least one dialect.
  - 29. The computerized method of claim 27, wherein the first domain is a general purpose domain, and wherein the second domain comprises at least one task area.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Fischer, Volker; Kunzmann, Siegfried; Tyrrell, A. Jon; Janke, Eric-W.
Primary Examiner(s): Young, W. R.
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US10/007,990
Publication Number: US 20020087314A1
Time in Patent Office: 1,554 Days
Field of Search: 704/244, 704/257, 704/10, 704/8
US Class Current: 704/243
CPC Class Codes: G10L 15/07 to the speaker