System and methods for acoustic and language modeling for automatic speech recognition with large vocabularies

US 20050143972A1
Filed: 02/24/2005
Published: 06/30/2005
Est. Priority Date: 03/17/1999
Status: Active Grant

First Claim

1. A method for providing speech recognition, the method comprising the steps of:

partitioning a language vocabulary V of word forms into subsets of word forms based on frequencies of occurrence of the respective word forms;

in at least one of said subsets, splitting word forms having frequencies less than a threshold to thereby generate word form components; and

generating a language component vocabulary VC comprising the word forms and the word form components.
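
The three steps of the claim can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the counts, the two-way partition, both thresholds, the `ENDINGS` list and the `+` marker on endings are hypothetical, and the stem-plus-ending rule is just one possible way to split a word form (loosely modeled on the fixed list of allowable endings mentioned in dependent claim 7), not the procedure the patent prescribes.

```python
from collections import Counter

# Hypothetical list of allowable endings; the empty ending means "no split".
ENDINGS = ("ing", "ed", "s", "")

def split_word_form(word):
    """Split a word form into [stem, ending] using the first matching ending."""
    for ending in ENDINGS:
        if ending and word.endswith(ending) and len(word) > len(ending) + 2:
            return [word[: -len(ending)], "+" + ending]
    return [word]  # empty ending: the word form is kept whole

def build_component_vocabulary(counts, subset_boundary, split_threshold):
    # Step 1: partition V into subsets by frequency of occurrence.
    frequent = {w for w, c in counts.items() if c >= subset_boundary}
    rare = set(counts) - frequent
    # Step 2: in the low-frequency subset, split word forms below the threshold.
    word_to_components = {}
    for w in rare:
        if counts[w] < split_threshold:
            word_to_components[w] = split_word_form(w)
    # Step 3: VC holds the unsplit word forms plus the generated components.
    vc = set(frequent)
    for w in rare:
        vc.update(word_to_components.get(w, [w]))
    return vc, word_to_components

# Toy counts, invented for this example.
counts = Counter({"the": 5000, "walk": 900, "walking": 3, "walked": 4, "zebra": 2})
vc, table = build_component_vocabulary(counts, subset_boundary=100, split_threshold=10)
```

In this toy run the rare forms "walking" and "walked" share the stem "walk", so VC ends up smaller than V while still covering every word form.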

Abstract

A method for generating a language component vocabulary VC for a speech recognition system having a language vocabulary V of a plurality of word forms is disclosed. The method includes: partitioning the language vocabulary V into subsets of word forms based on frequencies of occurrence of the respective word forms; and, in at least one of the subsets, splitting word forms having frequencies less than a threshold to thereby generate word form components. Also disclosed is a method for use in speech recognition including: splitting an acoustic vocabulary comprising baseforms into baseform components and storing the baseform components; and performing sound-to-spelling mapping on the baseform components so as to generate a baseform-components-to-word-parts table for use in subsequent decoding of speech. A method for decoding a speech utterance using language model components and acoustic components includes the steps of: generating from the utterance a stack of baseform component paths; concatenating baseform components in a path to generate concatenated baseforms, when the concatenated baseform components correspond to a baseform found in an acoustic vocabulary; mapping the concatenated baseforms into words; and computing language model (LM) scores associated with the words using a language model and performing further decoding of the utterance based thereupon.
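
The decoding steps in the abstract can be sketched roughly as follows. This is a hedged illustration, not the patented decoder: the phone strings, the greedy longest-match joining, the toy bigram table and the function names `concatenate_path` and `lm_score` are all assumptions made for the example.

```python
def concatenate_path(path, acoustic_vocab, baseform_to_word):
    """Join adjacent baseform components whenever they form a known baseform."""
    words = []
    i = 0
    while i < len(path):
        match = None
        # Prefer the longest run of components that forms a known baseform.
        for j in range(len(path), i, -1):
            candidate = " ".join(path[i:j])
            if candidate in acoustic_vocab:
                match = (candidate, j)
                break
        if match is None:
            return None  # this path cannot be mapped into words
        baseform, i = match
        words.append(baseform_to_word[baseform])
    return words

def lm_score(words, bigram_logprob, floor=-10.0):
    """Sum toy bigram log-probabilities over a word sequence."""
    history = "<s>"
    total = 0.0
    for w in words:
        total += bigram_logprob.get((history, w), floor)
        history = w
    return total

# Hypothetical data: baseforms are phone strings, components are their pieces.
baseform_to_word = {"W AO K": "walk", "D AO G": "dog"}
acoustic_vocab = set(baseform_to_word)
bigram = {("<s>", "walk"): -1.0, ("walk", "dog"): -2.0}

# A "stack" of two candidate component paths for one utterance.
stack = [["W AO", "K", "D AO", "G"], ["W AO", "K", "D AO"]]
scored = []
for path in stack:
    words = concatenate_path(path, acoustic_vocab, baseform_to_word)
    if words is not None:
        scored.append((lm_score(words, bigram), words))
best_score, best_words = max(scored)
```

The second path is dropped here because its last component does not complete any baseform; the claims below describe a separate fallback that maps such components into word parts instead.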

250 Citations

37 Claims

1. A method for providing speech recognition, the method comprising the steps of:
- partitioning a language vocabulary V of word forms into subsets of word forms based on frequencies of occurrence of the respective word forms;
  
  in at least one of said subsets, splitting word forms having frequencies less than a threshold to thereby generate word form components; and
  
  generating a language component vocabulary VC comprising the word forms and the word form components.
- Dependent claims (6, 7, 9, 11, 12, 13, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 35, 36):
  - 6. The method of claim 1, wherein said splitting is performed subject to a constraint in which a word that contains a given string of letters is prevented from being split within the string if the string of letters corresponds to one phoneme.
  - 7. The method of claim 1, wherein said splitting is performed using a fixed vocabulary and a fixed list of allowable endings, with each word from the fixed vocabulary being split into at least a stem and an ending that is an element of the fixed set of endings, so as to substantially minimize the total number of all stems that are required to split every word in the fixed vocabulary, wherein the fixed set of allowable endings includes an empty ending.
  - 9. The method of claim 1, further comprising generating and storing a word form to corresponding word form components table.
  - 11. The method of claim 1, further comprising the steps of:
    - generating a map of said word forms to said word form components, said map further including each of a plurality of non-split words as being associated with itself;
      
      filtering a textual corpus using the map to generate a textual component corpus containing the non-split word forms and the word form components of the map;
      
      accumulating the word form components and the non-split word forms generated by said filtering step in an n-gram language model; and
      
      determining counts of n-tuple sets of word form components and word forms to estimate n-gram probabilities for the n-gram language model.
  - 12. The method of claim 11, wherein said filtering step maps every word in the corpus into an n-tuple word form component.
  - 13. The method of claim 1, further comprising the steps of:
    - mapping the language vocabulary V into an acoustic vocabulary comprising baseforms;
      
      splitting the acoustic vocabulary into baseform components and storing said baseform components; and
      
      performing sound to spelling mapping on said baseform components so as to generate a baseform components to word parts table for use in subsequent decoding of speech.
  - 19. The method of claim 13, further comprising performing spelling to sound mapping which includes applying a predetermined set of rules to each word in a word string of a textual corpus, with pronunciations of words being obtained from a word to baseform table, wherein baseforms stored in said word to baseform table are collected in said acoustic vocabulary.
  - 20. The method of claim 19, further comprising making entries in said baseform components to word parts table by applying spelling to sound mapping to strings of components, said strings of components being obtained by filtering words of said textual corpus.
  - 21. The method of claim 19, further comprising applying said predetermined set of rules to a language model vocabulary so as to produce new word/baseform pairs in said word to baseform table.
  - 22. The method of claim 19 wherein said sound to spelling mapping is performed via an inversion of said set of rules.
  - 23. The method of claim 22 wherein said sound to spelling mapping produces said baseform components to word parts table by utilizing data from said word to baseforms table, the acoustic vocabulary and the stored baseform components.
  - 24. The method of claim 13, wherein said splitting of the acoustic vocabulary is performed subject to a constraint in which a word that contains a given string of letters is prevented from being split within the string if the string of letters corresponds to one phoneme.
  - 26. The method of claim 13, further comprising decoding a speech utterance using the language model components and acoustic components, wherein decoding comprises:
    - (a) generating from said utterance a stack of baseform component paths;
      
      (b) concatenating baseform components in a path to generate concatenated baseforms, when the concatenated baseform components correspond to a baseform found in the acoustic vocabulary;
      
      (c) mapping said concatenated baseforms into words;
      
      (d) computing language model (LM) scores associated with said words using a language model, and performing further decoding of said utterance based thereupon.
  - 27. The method of claim 26 wherein said step (d) includes:
    - mapping said words into a string of sub-words;
      
      computing said LM scores for strings of said sub-words; and
      
      attaching said LM scores to words that produced the corresponding strings of sub-words and performing said further decoding based thereupon.
  - 28. The method of claim 26, wherein said step (a) includes the sub-steps of producing, from said utterance, a set of baseform component strings, and generating said stack of baseform component paths from said strings.
  - 29. The method of claim 26, further comprising the steps of:
    - (e) mapping the baseform components of said path into words parts, when concatenated baseform components thereof do not form a baseform found in the acoustic vocabulary;
      
      (f) generating a LM score for an n-tuple of said word parts;
      
      (g) designating a concatenated word form as a valid word, if the LM score for the n-tuple of word parts exceeds a specific threshold, and adding the valid word to a word stack for further decoding.
  - 35. The method of claim 26, further comprising splitting said words via linguistic splitting based on morphemes.
  - 36. The method of claim 26, further comprising splitting said words via linguistic splitting based on any one of spelling, phones and morphemes.
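
The corpus filtering and n-gram estimation recited in claim 11 can be sketched as below. This is a minimal illustration under assumptions: n is fixed at 2, the probabilities are plain unsmoothed relative frequencies, and the map, the two-sentence corpus and the function names are invented for the example.

```python
from collections import Counter

def filter_corpus(sentences, word_to_components):
    """Rewrite each sentence as a sequence of components and non-split word forms."""
    filtered = []
    for sentence in sentences:
        tokens = []
        for word in sentence.split():
            # A non-split word is associated with itself.
            tokens.extend(word_to_components.get(word, [word]))
        filtered.append(tokens)
    return filtered

def bigram_probabilities(filtered):
    """Estimate P(cur | prev) from counts of adjacent token pairs."""
    histories, bigrams = Counter(), Counter()
    for tokens in filtered:
        padded = ["<s>"] + tokens
        for prev, cur in zip(padded, padded[1:]):
            histories[prev] += 1        # how often prev appears as a history
            bigrams[(prev, cur)] += 1   # how often the pair appears
    return {pair: n / histories[pair[0]] for pair, n in bigrams.items()}

# Hypothetical map and corpus.
word_map = {"walking": ["walk", "+ing"], "walked": ["walk", "+ed"]}
corpus = ["the dog walked", "the dog walking"]

component_corpus = filter_corpus(corpus, word_map)
probs = bigram_probabilities(component_corpus)
```

Because both inflected forms are filtered down to the shared stem "walk", the pair ("dog", "walk") is counted twice even though each full word form occurs only once in the corpus.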

2-5. (canceled)

8. (canceled)

10. (canceled)

14-18. (canceled)

25. (canceled)

30-34. (canceled)

37-47. (canceled)


Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Kanevsky, Dimitri; Gopalakrishnan, Ponani; Monkowski, Michael Daniel; Sedivy, Jan

Granted Patent: US 7,801,727 B2
US Class Current: 704/10
CPC Class Codes:
- G06F 40/237   Lexical tools
- G10L 15/183   using context dependencies,...
- G10L 15/197   Probabilistic grammars, e.g...
- Y10S 707/99942   Manipulating data structure...
