Sub-lexical language models with word level pronunciation lexicons

US 9,292,489 B1
Filed: 04/03/2013
Issued: 03/22/2016
Est. Priority Date: 01/16/2013
Status: Active Grant


2 Assignments


0 Petitions


Abstract

An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.
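Below is a minimal Python sketch of the kind of transduction the abstract describes. It is not the patented implementation, and it relies on several assumptions absent from the source. The segmentation table, the `+` marker for word-internal units, and the helpers `build_mapping_transducer` and `transduce` are all hypothetical. The transducer is a plain list of `(src, dst, input, output)` arcs rather than an object from a WFST library, and it maps each word to exactly one sequence of sub-lexical units.

```python
EPS = "<eps>"  # epsilon: the arc consumes no input symbol

# Hypothetical output of a word decomposition system: one segmentation per
# word. The "+" prefix marking word-internal units is an assumed convention.
SEGMENTATIONS = {
    "evlerde": ["ev", "+ler", "+de"],
    "kitap": ["kitap"],
}


def build_mapping_transducer(segmentations):
    """Return the arcs (src, dst, input, output) of a word-to-unit transducer.

    State 0 is the start and the only final state. For each word, the first
    arc reads the word and writes its first unit; later arcs read epsilon and
    write the remaining units; the last arc of every word returns to state 0,
    so any sequence of known words can be transduced.
    """
    arcs = []
    next_state = 1
    for word, units in segmentations.items():
        src = 0
        for i, unit in enumerate(units):
            is_last = i == len(units) - 1
            dst = 0 if is_last else next_state
            if not is_last:
                next_state += 1
            arcs.append((src, dst, word if i == 0 else EPS, unit))
            src = dst
    return arcs


def transduce(arcs, words):
    """Rewrite a word sequence as a unit sequence by following the arcs."""
    output = []
    for word in words:
        # Arc that reads this word (raises StopIteration if the word is unknown).
        arc = next(a for a in arcs if a[0] == 0 and a[2] == word)
        output.append(arc[3])
        # Follow the word's chain of epsilon-input arcs back to state 0.
        while arc[1] != 0:
            state = arc[1]
            arc = next(a for a in arcs if a[0] == state and a[2] == EPS)
            output.append(arc[3])
    return output


arcs = build_mapping_transducer(SEGMENTATIONS)
print(transduce(arcs, ["evlerde", "kitap"]))  # ['ev', '+ler', '+de', 'kitap']
```

Swapping the input and output labels on every arc gives the reverse relation, which maps unit sequences back to words.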

190 Citations

23 Claims

1. A method performed by a data processing apparatus, the method comprising:
- accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
  
  segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
  
  training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
  
  constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
  
  constructing a word level language model by:
  
  obtaining a result of composing the mapping transducer with the sub-lexical language model, and performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
  
  constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
  
  receiving an audio stream from a user; and
  
  recognizing the audio stream, using the speech decoding network.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, wherein recognizing the audio stream comprises:
    - identifying a user command from at least a portion of the audio stream, using the speech decoding network; and
      
      performing the user command.
  - 3. The method of claim 1, wherein the natural language is an agglutinative or morphologically rich language.
  - 4. The method of claim 1, further comprising:
    - detecting ambiguous outputs from the word decomposition system; and
      
      obtaining a single segmentation, using a disambiguation mechanism, for each of the ambiguous outputs.
  - 5. The method of claim 1, wherein the n-gram language model is represented as a deterministic weighted finite-state automaton.
  - 6. The method of claim 1, wherein the mapping transducer maps each word to one segmentation.
  - 7. The method of claim 1, wherein the mapping transducer maps each word to one or more segmentations.
  - 8. The method of claim 1, wherein the speech decoding network is defined as:
    - $C \circ L_w \circ \mathrm{Proj}(T_w \circ G_m)$, wherein $C$ represents the context dependency model, $L_w$ represents the word level pronunciation lexicon, $T_w$ represents the mapping transducer, $G_m$ represents the sub-lexical language model, and $\mathrm{Proj}$ represents performing the projection on the result of the composition of the mapping transducer with the sub-lexical language model.
  - 9. The method of claim 1, wherein segmenting the word level training text corpus into sub-lexical units using the word decomposition system comprises segmenting the word level training text corpus into sub-lexical units using a linguistic word decomposition system or a statistical word decomposition system.
  - 10. The method of claim 1, wherein the mapping transducer is associated with a finite-state machine, wherein an initial state of the finite-state machine represents a word and transitions between states of the finite-state machine represent sub-lexical units.
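Claims 1 and 8 obtain the word level language model as Proj(T_w ∘ G_m). The Python sketch below builds no automata. It only computes, by brute-force enumeration, the weight that such a projected composition would assign to a word sequence. It assumes the tropical semiring (costs add along a path and the cheapest path wins), an unweighted T_w, and projection onto the input (word) side. The segmentations, the bigram costs and the helper names are hypothetical. `evlerde` is deliberately given two segmentations to illustrate claim 7.

```python
import itertools
import math

# Hypothetical decomposition output. A word may have several segmentations
# (as in claim 7); claim 6 is the special case of one segmentation per word.
SEGMENTATIONS = {
    "kitap": [["kitap"]],
    "evlerde": [["ev", "+ler", "+de"], ["ev", "+lerde"]],
}

# Hypothetical bigram costs (-log probabilities) standing in for G_m.
# Unseen bigrams cost infinity here; a real model would back off instead.
BIGRAM_COST = {
    ("<s>", "kitap"): 1.5,
    ("<s>", "ev"): 1.0,
    ("kitap", "ev"): 1.0,
    ("ev", "+ler"): 0.5,
    ("+ler", "+de"): 0.75,
    ("+de", "</s>"): 0.25,
    ("ev", "+lerde"): 2.0,
    ("+lerde", "</s>"): 0.5,
}


def sublexical_cost(units):
    """Cost of a unit sequence under the toy bigram G_m."""
    seq = ["<s>"] + units + ["</s>"]
    return sum(BIGRAM_COST.get(pair, math.inf) for pair in zip(seq, seq[1:]))


def word_level_cost(words):
    """Weight of a word sequence in Proj(T_w o G_m), tropical semiring.

    Each path through T_w picks one segmentation per word; composing with G_m
    scores the resulting unit sequence. Projecting onto the input (word) side
    makes all those paths carry the same word string, and the tropical
    semiring combines them with min.
    """
    best = math.inf
    for combo in itertools.product(*(SEGMENTATIONS[w] for w in words)):
        units = [unit for segmentation in combo for unit in segmentation]
        best = min(best, sublexical_cost(units))
    return best


print(word_level_cost(["evlerde"]))           # 2.5
print(word_level_cost(["kitap", "evlerde"]))  # 4.0
```

Enumeration grows exponentially with the number of ambiguous words, so this sketch only makes the semantics concrete. A WFST toolkit performs the composition and projection once and yields a word level model, which can then be composed with C and L_w.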

11. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
  
  segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
  
  training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
  
  constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
  
  constructing a word level language model by:
  
  obtaining a result of composing the mapping transducer with the sub-lexical language model, and performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
  
  constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
  
  receiving an audio stream from a user; and
  
  recognizing the audio stream, using the speech decoding network.
- Dependent Claims (12, 13, 14, 15)
- - 12. The system of claim 11, the operations further comprising:
    - detecting ambiguous outputs from the word decomposition system; and
      
      obtaining a single segmentation, using a disambiguation mechanism, for each of the ambiguous outputs.
  - 13. The system of claim 11, wherein the n-gram language model is represented as a deterministic weighted finite-state automaton.
  - 14. The system of claim 11, wherein the mapping transducer maps each word to one segmentation.
  - 15. The system of claim 11, wherein the mapping transducer maps each word to one or more segmentations.
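The segmentation and training steps of claim 1, together with the deterministic automaton representation in claim 13, can be illustrated with a toy bigram. In this Python sketch the decomposition table and the helper names are hypothetical. The model order is fixed at n = 2 and there is no smoothing, so unseen bigrams are simply missing arcs. Practical n-gram automata normally also carry backoff arcs, and this sketch leaves them out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word decomposition output, used here as a lookup table.
DECOMPOSE = {
    "evlerde": ["ev", "+ler", "+de"],
    "evde": ["ev", "+de"],
    "kitap": ["kitap"],
}


def segment_corpus(word_corpus):
    """Replace every word of every sentence with its sub-lexical units."""
    return [[u for w in sentence for u in DECOMPOSE[w]] for sentence in word_corpus]


def train_bigram_wfsa(unit_corpus):
    """Train an unsmoothed bigram over units and return it as a deterministic WFSA.

    A state is the previous unit ("<s>" is the start state). The arc keyed by
    (state, unit) goes to the state named `unit` and costs -log P(unit | state);
    the "</s>" arc acts as a final weight. Keying arcs by (state, label)
    guarantees at most one arc per label leaving each state.
    """
    counts = defaultdict(Counter)
    for sentence in unit_corpus:
        seq = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    arcs = {}
    for prev, successors in counts.items():
        total = sum(successors.values())
        for cur, count in successors.items():
            arcs[(prev, cur)] = -math.log(count / total)
    return arcs


def sequence_cost(arcs, units):
    """Follow the automaton; return math.inf if a needed arc is missing."""
    seq = ["<s>"] + units + ["</s>"]
    cost = 0.0
    for prev, cur in zip(seq, seq[1:]):
        if (prev, cur) not in arcs:
            return math.inf
        cost += arcs[(prev, cur)]
    return cost


units = segment_corpus([["evlerde"], ["evde"], ["kitap"]])
g_m = train_bigram_wfsa(units)
# P(ev | <s>) = 2/3, P(+de | ev) = 1/2, P(</s> | +de) = 1, so the cost is about log 3.
print(sequence_cost(g_m, ["ev", "+de"]))
```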

16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- accessing a word level pronunciation lexicon and a word level training text corpus for a natural language;
  
  segmenting, using a word decomposition system, the word level training text corpus into sub-lexical units;
  
  training an n-gram language model over the sub-lexical units to produce a sub-lexical language model;
  
  constructing, using the word decomposition system, a word to sub-lexical unit mapping transducer;
  
  constructing a word level language model by:
  
  obtaining a result of composing the mapping transducer with the sub-lexical language model, and performing a projection on the result of the composition of the mapping transducer and the sub-lexical language model;
  
  constructing a speech decoding network at least by composing a context dependency model with the word level pronunciation lexicon and with the word level language model;
  
  receiving an audio stream from a user; and
  
  recognizing the audio stream, using the speech decoding network.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23)
- - 17. The medium of claim 16, wherein recognizing the audio stream comprises:
    - identifying a user command from at least a portion of the audio stream, using the speech decoding network; and
      
      performing the user command.
  - 18. The medium of claim 16, wherein the natural language is an agglutinative or morphologically rich language.
  - 19. The medium of claim 16, the operations further comprising:
    - detecting ambiguous outputs from the word decomposition system; and
      
      obtaining a single segmentation, using a disambiguation mechanism, for each of the ambiguous outputs.
  - 20. The medium of claim 16, wherein the n-gram language model is represented as a deterministic weighted finite-state automaton.
  - 21. The medium of claim 16, wherein the mapping transducer maps each word to one segmentation.
  - 22. The medium of claim 16, wherein the mapping transducer maps each word to one or more segmentations.
  - 23. The medium of claim 16, wherein constructing the speech decoding network further comprises composing a context dependency model with the word level pronunciation lexicon and the word level language model.
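Claims 4, 12 and 19 call for detecting ambiguous outputs of the word decomposition system and reducing each one to a single segmentation. The claims do not name the disambiguation mechanism. The Python sketch below assumes one possible mechanism: it chooses the segmentation with the lowest unigram cost under given unit counts. The counts, the candidate words and the helper names are invented for illustration.

```python
import math
from collections import Counter


def unigram_cost(segmentation, unit_counts):
    """Sum of -log relative frequencies of the units; inf if any is unseen."""
    total = sum(unit_counts.values())
    cost = 0.0
    for unit in segmentation:
        count = unit_counts[unit]
        if count == 0:
            return math.inf
        cost -= math.log(count / total)
    return cost


def disambiguate(candidates, unit_counts):
    """Detect ambiguous words and keep one segmentation for every word.

    A word is ambiguous when the decomposition system returned more than one
    segmentation for it. The selection rule (lowest unigram cost) is one
    hypothetical disambiguation mechanism among many.
    """
    ambiguous = sorted(w for w, segs in candidates.items() if len(segs) > 1)
    resolved = {
        word: min(segs, key=lambda seg: unigram_cost(seg, unit_counts))
        for word, segs in candidates.items()
    }
    return resolved, ambiguous


# Hypothetical decomposition results and unit counts.
CANDIDATES = {
    "evlerde": [["ev", "+ler", "+de"], ["ev", "+lerde"]],
    "yazar": [["yaz", "+ar"], ["ya", "+zar"]],
    "kitap": [["kitap"]],
}
UNIT_COUNTS = Counter({"ev": 8, "+ler": 6, "+de": 9, "+lerde": 1,
                       "yaz": 3, "+ar": 4, "ya": 2, "+zar": 1, "kitap": 5})

resolved, ambiguous = disambiguate(CANDIDATES, UNIT_COUNTS)
print(ambiguous)            # ['evlerde', 'yazar']
print(resolved["evlerde"])  # ['ev', '+ler', '+de']
```

Building the mapping transducer from `resolved` gives a one-segmentation T_w, as in claims 6, 14 and 21. Keeping all candidates instead corresponds to claims 7, 15 and 22.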


Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Sak, Hasim; Saraclar, Murat
Primary Examiner(s)
He, Jialong
Assistant Examiner(s)
Wang, Yi-Sheng

Application Number
US13/855,893
Time in Patent Office
1,084 Days
US Class Current
1/1
CPC Class Codes
- G06F 40/289: Phrasal analysis, e.g. fini...
- G06F 40/30: Semantic analysis
- G10L 13/08: Text analysis or generation...
- G10L 13/10: Prosody rules derived from ...
- G10L 15/00: Speech recognition G10L17/0...
- G10L 15/183: using context dependencies,...
- G10L 15/197: Probabilistic grammars, e.g...
- G10L 15/26: Speech to text systems G10L...
