System and methods for acoustic and language modeling for automatic speech recognition with large vocabularies

US 7,801,727 B2
Filed: 02/24/2005
Issued: 09/21/2010
Est. Priority Date: 03/17/1999
Status: Expired due to Fees

Abstract

A method for generating a language component vocabulary VC for a speech recognition system having a language vocabulary V of a plurality of word forms is disclosed. The method includes: partitioning the language vocabulary V into subsets of word forms based on frequencies of occurrence of the respective word forms; and in at least one of the subsets, splitting word forms having frequencies less than a threshold to thereby generate word form components. Also disclosed is a method for use in speech recognition including: splitting an acoustic vocabulary comprising baseforms into baseform components and storing the baseform components; and, performing sound to spelling mapping on the baseform components so as to generate a baseform components to word parts table for use in subsequent decoding of speech. A method for decoding a speech utterance using language model components and acoustic components, includes the steps of: generating from the utterance a stack of baseform component paths; concatenating baseform components in a path to generate concatenated baseforms, when the concatenated baseform components correspond to a baseform found in an acoustic vocabulary; mapping the concatenated baseforms into words; computing language model (LM) scores associated with the words using a language model, and performing further decoding of the utterance based thereupon.
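
As a rough illustration of the first method in the abstract, the sketch below counts word forms in a toy corpus, splits those whose frequency falls below a threshold, and builds both the component vocabulary VC and a word-to-components map. The suffix list, the `+` marker on endings, and the names `split_word` and `build_component_vocabulary` are assumptions made for this example; the source does not prescribe this particular splitting rule.

```python
from collections import Counter

# Hypothetical suffix list; the source does not fix a particular set.
ENDINGS = ("ing", "ed", "s")

def split_word(word):
    """Split a word into [stem, +ending] using the first matching suffix (illustrative rule)."""
    for ending in ENDINGS:
        # Require a stem of at least two characters.
        if word.endswith(ending) and len(word) > len(ending) + 1:
            return [word[:-len(ending)], "+" + ending]
    return [word]

def build_component_vocabulary(corpus_tokens, vocabulary, threshold):
    """Return (VC, map) where VC holds all word forms of V plus the generated components."""
    counts = Counter(corpus_tokens)
    component_map = {}
    vc = set(vocabulary)                  # VC always contains the original word forms
    for word in vocabulary:
        # Only word forms rarer than the threshold are candidates for splitting.
        parts = split_word(word) if counts[word] < threshold else [word]
        component_map[word] = parts       # non-split words map to themselves
        vc.update(parts)
    return vc, component_map
```

With the corpus `walk walk walk walked talks walk` and a threshold of 2, the frequent form `walk` stays whole while `walked` and `talks` are split, and components such as `+ed` are not full words.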

20 Claims

1. A method of analyzing a language for providing speech recognition, the method comprising steps of:
- determining a threshold frequency of occurrence, within a corpus, of word forms in a vocabulary V for the language, by using at least one processor;
  
  in response to determining that a subset of the word forms has a frequency of occurrence in the corpus less than the threshold frequency, splitting at least some of the word forms in the subset to generate word form components, at least some of the word form components not being full words;
  
  generating a language component vocabulary VC comprising the word forms in the vocabulary V and the word form components; and
  
  generating and storing information indicating a correspondence between the word forms in the vocabulary V and corresponding word form components.
- - 2. The method of claim 1, wherein said splitting is performed subject to a constraint in which a word that contains a given string of letters is prevented from being split within the string if the string of letters corresponds to one phoneme.
  - 3. The method of claim 1, wherein said splitting is performed using a fixed vocabulary and a fixed list of allowable endings, with each word from the fixed vocabulary being split into at least a stem and an ending that is an element of the fixed set of endings, so as to substantially minimize the total number of all stems that are required to split every word in the fixed vocabulary, wherein the fixed set of allowable endings includes an empty ending.
  - 4. The method of claim 1, wherein the information indicating a correspondence between word forms and corresponding word form components provides a map of the word forms to the word form components, the map further including a plurality of non-split words each being associated with itself, and wherein the method further comprises steps of:
    - filtering a textual corpus using the map to generate a textual component corpus containing the non-split word forms and the word form components of the map;
      
      accumulating the word form components and the non-split word forms generated by said filtering step in an n-gram language model; and
      
      determining counts of n-tuple sets of word form components and word forms to estimate n-gram probabilities for the n-gram language model.
  - 5. The method of claim 4, wherein said filtering step maps every word in the corpus into an n-tuple word form component.
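
The filtering and counting steps of claim 4 can be sketched as follows. This is a minimal example that assumes a map of the kind described in claim 1 (split words map to component lists, non-split words map to themselves) and uses plain relative-frequency estimates with no smoothing. The function names and the toy data are hypothetical.

```python
from collections import Counter

def filter_corpus(tokens, component_map):
    """Replace each token by its components; tokens missing from the map pass through unchanged."""
    out = []
    for tok in tokens:
        out.extend(component_map.get(tok, [tok]))
    return out

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bigram_probability(tokens, prev, cur):
    """Relative-frequency estimate count(prev, cur) / count(prev); no smoothing."""
    bigrams = ngram_counts(tokens, 2)
    unigrams = ngram_counts(tokens, 1)
    if unigrams[(prev,)] == 0:
        return 0.0
    return bigrams[(prev, cur)] / unigrams[(prev,)]
```

For example, filtering `["walk", "walked", "walked"]` with a map that splits `walked` into `walk` and `+ed` yields a component corpus in which `+ed` follows `walk` in two of the three occurrences of `walk`.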

6. A method for providing speech recognition, the method comprising steps of:
- determining a threshold frequency of occurrence, within a corpus, of word forms in a language vocabulary V, by using at least one processor;
  
  in response to determining that a subset of the word forms has a frequency of occurrence less than the threshold frequency, splitting at least a portion of the word forms in the subset to generate word form components, at least some of the word form components not being full words;
  
  generating a language component vocabulary VC comprising the word forms in the language vocabulary V and the word form components;
  
  mapping the language vocabulary V into an acoustic vocabulary comprising baseforms;
  
  splitting the acoustic vocabulary into baseform components and storing said baseform components; and
  
  performing sound to spelling mapping on said baseform components so as to generate information indicating a correspondence between baseform components and word parts for use in subsequent decoding of speech.
- - 7. The method of claim 6, further comprising performing spelling to sound mapping which includes applying a predetermined set of rules to each word in a word string of a textual corpus, with pronunciations of words being obtained from the information indicating a correspondence between baseform components and word parts, wherein baseforms reflected in the information indicating a correspondence between baseform components and word parts are collected in said acoustic vocabulary.
  - 8. The method of claim 7, wherein the information indicating a correspondence between baseform components and word parts applies spelling to sound mapping to strings of components, said strings of components being obtained by filtering words of said textual corpus.
  - 9. The method of claim 7, further comprising applying said predetermined set of rules to a language model vocabulary so as to produce new word/baseform pairs in said information indicating a correspondence between baseform components and word parts.
  - 10. The method of claim 6, further comprising decoding a speech utterance using the language model components and acoustic components, wherein decoding comprises:
    - (a) generating from said utterance a stack of baseform component paths;
      
      (b) concatenating baseform components in a path to generate concatenated baseforms, when the concatenated baseform components correspond to a baseform found in the acoustic vocabulary;
      
      (c) mapping said concatenated baseforms into words;
      
      (d) computing language model (LM) scores associated with said words using a language model, and performing further decoding of said utterance based thereupon.
  - 11. The method of claim 10, further comprising splitting said words via linguistic splitting based on any one of spelling, phones and morphemes.
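
Steps (b) through (d) of claim 10 can be illustrated with a small sketch. It assumes a single path of baseform components written as space-separated phone strings, a hypothetical `baseform_to_word` table standing in for the acoustic vocabulary, and a bigram table with a floor value for unseen pairs. None of these data structures or names come from the source, and a real decoder would work over a stack of competing paths.

```python
import math

def words_from_path(path, baseform_to_word):
    """Group consecutive baseform components so that each group's concatenation is a
    known baseform; return the corresponding words, or None if no grouping exists."""
    if not path:
        return []
    for end in range(len(path), 0, -1):       # try longer concatenations first
        candidate = " ".join(path[:end])
        if candidate in baseform_to_word:     # concatenate only if found in the vocabulary
            rest = words_from_path(path[end:], baseform_to_word)
            if rest is not None:
                return [baseform_to_word[candidate]] + rest
    return None

def lm_log_score(words, bigram_prob, floor=1e-6):
    """Sum of log bigram probabilities; `floor` is a hypothetical value for unseen pairs."""
    score = 0.0
    for prev, cur in zip(["<s>"] + words, words):
        score += math.log(bigram_prob.get((prev, cur), floor))
    return score
```

Here the components `W AO K` and `T` are joined only because their concatenation `W AO K T` is a baseform in the table; the resulting word sequence is then scored with the language model.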

12. A system for analyzing a language for providing speech recognition, the system comprising:
- at least one processor programmed to:
  
  determine a threshold frequency of occurrence, within a corpus, of word forms in a language vocabulary V;
  
  in response to determining that a subset of the word forms has a frequency of occurrence less than the threshold frequency, split at least some of the word forms in the subset to generate word form components, at least some of the word form components not being full words;
  
  generate a language component vocabulary VC comprising the word forms in the language vocabulary V and the word form components; and
  
  generate information indicating a correspondence between the word forms in the language vocabulary V and corresponding word form components.
- - 13. The system of claim 12, wherein the at least one processor is programmed to split word forms subject to a constraint in which a word form that contains a given string of letters is prevented from being split within the string if the string of letters corresponds to one phoneme.
  - 14. The system of claim 12, wherein the at least one processor is programmed to split word forms using a fixed vocabulary and a fixed list of allowable endings, with each word from the fixed vocabulary being split into at least a stem and an ending that is an element of the fixed set of endings, so as to substantially minimize the total number of all stems that are required to split every word in the fixed vocabulary, wherein the fixed set of allowable endings includes an empty ending.
  - 15. The system of claim 12, wherein said information indicating a correspondence between the word forms and corresponding word form components provides a map of the word forms to the word form components, the map further including a plurality of non-split words each being associated with itself, and wherein the at least one processor is further programmed to:
    - filter a textual corpus using the map to generate a textual component corpus containing the non-split word forms and the word form components of the map;
      
      accumulate the word form components and the non-split word forms generated by said filtering step in an n-gram language model; and
      
      determine counts of n-tuple sets of word form components and word forms to estimate n-gram probabilities for the n-gram language model.
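
Claims 3 and 14 describe splitting each word into a stem and an ending drawn from a fixed list that includes the empty ending, so as to substantially minimize the total number of stems. The source does not state how the minimization is carried out. The sketch below swaps in a greedy set-cover heuristic as one plausible approach: it repeatedly picks the stem that covers the most words not yet split. The function name, the tie-breaking rule, and the example endings are assumptions, and words are assumed to be non-empty.

```python
def split_into_stems(vocabulary, endings):
    """Greedy heuristic for choosing few stems. Returns {word: (stem, ending)}."""
    if "" not in endings:
        raise ValueError("the list of endings must include the empty ending")
    candidates = {}                           # stem -> {word: ending}
    for word in vocabulary:
        for ending in endings:
            if word.endswith(ending) and len(word) > len(ending):
                stem = word[:len(word) - len(ending)]
                candidates.setdefault(stem, {})[word] = ending
    remaining = set(vocabulary)
    splits = {}
    while remaining:
        # Pick the stem covering the most unsplit words; ties go to the longer
        # stem, then to alphabetical order, so the result is deterministic.
        best = max(candidates,
                   key=lambda s: (len(remaining.intersection(candidates[s])), len(s), s))
        for word in remaining.intersection(candidates[best]):
            splits[word] = (best, candidates[best][word])
        remaining.difference_update(candidates[best])
    return splits
```

Because the empty ending is always allowed, every word has at least one candidate stem (itself), so the loop always makes progress. A greedy choice is not guaranteed to find the true minimum number of stems.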

16. A system for providing speech recognition, comprising:
- at least one processor programmed to:
  
  determine a threshold frequency of occurrence, within a corpus, of word forms in a language vocabulary V;
  
  in response to determining that a subset of the word forms has a frequency of occurrence less than the threshold frequency, split at least some of the word forms in the subset to generate word form components, at least some of the word form components not being full words;
  
  generate a language component vocabulary VC comprising the word forms in the language vocabulary V and the word form components;
  
  map the language vocabulary V into an acoustic vocabulary comprising baseforms;
  
  split the acoustic vocabulary into baseform components and store said baseform components; and
  
  perform sound to spelling mapping on said baseform components so as to generate information indicating a correspondence between baseform components and word parts for use in subsequent decoding of speech.
- - 17. The system of claim 16, wherein the at least one processor is programmed to perform spelling to sound mapping including applying a predetermined set of rules to each word in a word string of a textual corpus, with pronunciations of words being obtained from the information indicating a correspondence between baseform components and word parts, wherein baseforms reflected in said information indicating a correspondence between baseform components and word parts are collected in said acoustic vocabulary.
  - 18. The system of claim 17, wherein the information indicating a correspondence between baseform components and word parts applies spelling to sound mapping to strings of components, said strings of components being obtained by filtering words of said textual corpus.
  - 19. The system of claim 16, wherein the at least one processor is programmed to decode a speech utterance using the language model components and acoustic components by:
    - (a) generating from said utterance a stack of baseform component paths;
      
      (b) concatenating baseform components in a path to generate concatenated baseforms, when the concatenated baseform components correspond to a baseform found in the acoustic vocabulary;
      
      (c) mapping said concatenated baseforms into words;
      
      (d) computing language model (LM) scores associated with said words using a language model; and
      
      (e) performing further decoding of said utterance based thereupon.
  - 20. The system of claim 19, wherein the at least one processor is programmed to split said words via linguistic splitting based on any of spelling, phones and morphemes.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Gopalakrishnan, Ponani, Sedivy, Jan, Kanevsky, Dimitri, Monkowski, Michael Daniel
Primary Examiner(s): Hudspeth; David R
Assistant Examiner(s): Spooner; Lamont M
Application Number: US11/064,643
Publication Number: US 20050143972A1
Time in Patent Office: 2,035 Days
Field of Search: 704/231, 704/251, 704/254, 704/257, 704/221, 704/1, 704/9, 704/10
US Class Current: 704/251
CPC Class Codes:
G06F 40/237   Lexical tools
G10L 15/183   using context dependencies,...
G10L 15/197   Probabilistic grammars, e.g...
Y10S 707/99942   Manipulating data structure...
