Small footprint language and vocabulary independent word recognizer using registration by word spelling

US 6,684,185 B1
Filed: 09/04/1998
Issued: 01/27/2004
Est. Priority Date: 09/04/1998
Status: Expired due to Term

First Claim

1. A speech recognizer having a lexicon updatable by spelled word input, comprising:

a phoneticizer for generating a first phonetic transcription of said spelled word input using probabilistic rules and a second phonetic transcription of said spelled word input using probabilistic rules;

a hybrid unit generator receptive of said first phonetic transcription and said second phonetic transcription for generating a first hybrid unit representation of said spelled word input and a second hybrid unit representation of said spelled word input;

a transcription selector that selects one of said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription; and

a word template constructor that generates for said word a sequence of symbols indicative of said selected hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represents sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.

Abstract

A phoneticizer converts spelled words or names into one or an n-best number of phonetic transcriptions. The n-best transcriptions may be generated from a single transcription using a confusion matrix. These n-best transcriptions are then transformed into hybrid units. Preferably only the most frequently encountered units are stored as syllables, with the remainder being stored as smaller units such as demi-syllables or phonemes. Voice input is then used to rescore the n-best transcriptions and these are stored preferably as speaker-independent, similarity-based hybrid units concatenated into a string representing the spelled word.
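The frequency-based split between larger and smaller units described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the phoneme labels, the toy corpus, and the function names `build_syllable_units` and `to_hybrid_units` are assumptions made for illustration, and demi-syllables are left out, so infrequent syllables fall back directly to phonemes.

```python
from collections import Counter

def build_syllable_units(syllabified_words, max_syllable_units):
    """Return the most frequent syllables, which will be kept as whole units.

    Each word is a sequence of syllables; each syllable is a tuple of
    phoneme labels (hypothetical labels, not taken from the patent).
    """
    counts = Counter(syl for word in syllabified_words for syl in word)
    return {syl for syl, _ in counts.most_common(max_syllable_units)}

def to_hybrid_units(word, syllable_units):
    """Map one syllabified word to a string of hybrid units."""
    units = []
    for syl in word:
        if syl in syllable_units:
            # Frequent syllable: store it as a single larger unit.
            units.append("-".join(syl))
        else:
            # Infrequent syllable: fall back to its individual phonemes.
            units.extend(syl)
    return units

# Toy corpus (assumed): "city", "pity", "kitty" share the syllable t-iy.
corpus = [
    (("s", "ih"), ("t", "iy")),
    (("p", "ih"), ("t", "iy")),
    (("k", "ih"), ("t", "iy")),
]
syllable_units = build_syllable_units(corpus, max_syllable_units=1)
city_units = to_hybrid_units(corpus[0], syllable_units)
print(city_units)  # ['s', 'ih', 't-iy']
```

Capping the number of syllable-sized units is what keeps the unit dictionary small: only sounds that recur often earn a dedicated larger unit.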

26 Claims

1. A speech recognizer having a lexicon updatable by spelled word input, comprising:
- a phoneticizer for generating a first phonetic transcription of said spelled word input using probabilistic rules and a second phonetic transcription of said spelled word input using probabilistic rules;
  
  a hybrid unit generator receptive of said first phonetic transcription and said second phonetic transcription for generating a first hybrid unit representation of said spelled word input and a second hybrid unit representation of said spelled word input;
  
  a transcription selector that selects one of said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription; and
  
  a word template constructor that generates for said word a sequence of symbols indicative of said selected hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represents sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
2. The speech recognizer of claim 1 wherein said phoneticizer includes a set of decision trees that identify different phoneme transcriptions corresponding to letters of an alphabet.
3. The speech recognizer of claim 1 further comprising a multiple phonetic transcription generator that converts first phonetic transcription and said second phonetic transcription into an n-best plurality of phonetic transcriptions.
4. The speech recognizer of claim 1 wherein said phoneticizer generates an n-best plurality of phonetic transcriptions.
5. The speech recognizer of claim 1 wherein said hybrid unit generator generates a plurality of hybrid unit representations of said spelled word.
6. The speech recognizer of claim 5 further comprising scoring processor for applying a score to each of said plurality of hybrid unit representations and for selecting at least one of said plurality of hybrid unit representations to be provided to said word template constructor based on said score.
7. The speech recognizer of claim 6 wherein said scoring processor includes a set of decision trees that apply different scores to different phoneme transcriptions.
8. The speech recognizer of claim 1 wherein said template constructor includes a dictionary containing similarity-based representation of said hybrid units.
9. The speech recognizer of claim 1 wherein said phoneticizer includes a memory for storing spelling-to-pronunciation data comprising:
10. The speech recognizer of claim 1 further wherein said hybrid units are represented as similarity parameters.
11. The speech recognizer of claim 1 wherein said hybrid units are represented as phone similarity parameters based on an average similarity derived from a plurality of training examples.
12. The speech recognizer of claim 1 further comprising hybrid unit duration modification rules for expanding or compressing duration of selected hybrid units based on length of said spelled word.
13. The speech recognizer of claim 1 further comprising pattern matching mechanism for comparing a voiced input to said lexicon, said pattern matching mechanism having weighted mechanism for increasing the importance of selected portions of said hybrid units during pattern matching.
14. The speech recognizer of claim 1, wherein said transcription selector selects one of said first hybrid unit representation and said second hybrid unit representation based on voiced pronunciation by said user of the word corresponding to the spelled word input such that the voiced pronunciation of the word is sufficient to identify said selected hybrid unit representation.
15. The speech recognizer of claim 1, wherein said transcription selector has a rescoring mechanism that assigns new probability scores to said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription.
16. The speech recognizer of claim 15, wherein said rescoring selector employs mixed decision trees comprising questions based on letters and questions based on phonemes.
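Claim 3 recites a multiple phonetic transcription generator, and the abstract notes that n-best transcriptions may be generated from a single transcription using a confusion matrix. The sketch below shows one way such an expansion could work, under assumptions: the confusion probabilities, the phoneme labels, and the function name `n_best_transcriptions` are all hypothetical, and the exhaustive enumeration is only practical for short words.

```python
import heapq
from itertools import product

def n_best_transcriptions(phonemes, confusion, n):
    """Expand one transcription into its n most probable variants.

    confusion maps a phoneme to {alternative: probability}; the phoneme
    itself should appear among its own alternatives. Phonemes missing
    from the matrix are kept unchanged with probability 1.0.
    """
    # Candidate (phoneme, probability) pairs for each position.
    options = [list(confusion.get(p, {p: 1.0}).items()) for p in phonemes]
    scored = []
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p  # positions are treated as independent (assumption)
        scored.append((prob, tuple(ph for ph, _ in combo)))
    # Highest-probability variants first.
    return heapq.nlargest(n, scored)

# Hypothetical confusion matrix: "ae" may be heard as "eh", "t" as "d".
confusion = {
    "ae": {"ae": 0.7, "eh": 0.3},
    "t": {"t": 0.9, "d": 0.1},
}
best = n_best_transcriptions(["k", "ae", "t"], confusion, n=2)
for prob, transcription in best:
    print(transcription, round(prob, 2))
```

With these assumed probabilities the original transcription ranks first and the variant with the substituted vowel ranks second.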

17. A speech recognizer having a lexicon user updateable by spelled word input, comprising:
- a phoneticizer for generating a first phonetic transcription of said spelled word input using probabilistic rules and a second phonetic transcription of said spelled word input using probabilistic rules, and generating stress level indicators for different phonemes;
  
  a hybrid unit generator receptive of said first phonetic transcription and said second phonetic transcription for generating a first hybrid unit representation and a second hybrid unit representation of said spelled word input based on a syllabic transcription of said first phonetic transcription and second phonetic transcription using said stress level indicators;
  
  a transcription selector that selects one of said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription; and
  
  a word template constructor that generates for said spelled word a sequence of symbols indicative of said selected hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represent sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
18. The speech recognizer of claim 17 wherein said phoneticizer includes a set of decision trees that identify different phoneme transcriptions corresponding to letters of an alphabet.
19. The speech recognizer of claim 17 wherein hybrid unit generator generates a plurality of hybrid unit representations of said spelled word.
20. The speech recognizer of claim 19 further comprising scoring processor for applying a score to each of said plurality of hybrid unit representations and for selecting at least one of said plurality of hybrid unit representations to be provided to said word template constructor based on said score.
21. The speech recognizer of claim 20 wherein said scoring processor includes a set of decision trees that apply different scores to different phoneme transcriptions.
22. The speech recognizer of claim 17 wherein said phoneticizer includes a memory for storing spelling-to-pronunciation data comprising:
23. The speech recognizer of claim 17, wherein said transcription selector selects one of said first hybrid unit representation and said second hybrid unit representation based on voiced pronunciation by said user of the word corresponding to the spelled word input such that the voiced pronunciation of the word is sufficient to identify said selected hybrid unit representation.
24. The speech recognizer of claim 23, wherein said transcription selector has a rescoring mechanism that assigns new probability scores to said first hybrid unit representation and said second hybrid unit representation based on rules regarding phonetic transcription.
25. The speech recognizer of claim 24, wherein said rescoring selector employs mixed decision trees comprising questions based on letters and questions based on phonemes.

26. A speech recognizer having a lexicon user updateable by spelled word input, comprising:
- a phoneticizer for generating a first phonetic transcription of said spelled word input;
  
  a hybrid unit generator receptive of said phonetic transcription for generating at least one hybrid unit representation of said spelled word input based on said phonetic transcription; and
  
  a word template constructor that generates for said spelled word a sequence of symbols indicative of said hybrid unit representation for storing in said lexicon, wherein said hybrid unit generator has a dictionary of hybrid units selected to ensure that a class of larger sound units represent sounds in the lexicon that are more frequently used, and to ensure that a class of smaller sound units represent sounds in the lexicon that are less frequently used in comparison to the sounds that are more frequently used.
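The three elements of claim 26 (phoneticizer, hybrid unit generator, word template constructor) can be pictured as a short registration pipeline. The sketch below is illustrative only and makes several assumptions: the one-letter-to-one-phoneme table stands in for a real phoneticizer, the greedy longest-match segmentation is a swapped-in technique rather than the method of the patent, and every name and table entry is hypothetical.

```python
# Hypothetical letter-to-phoneme table standing in for the phoneticizer.
# Letters missing from the table raise KeyError in this toy version.
LETTER_TO_PHONEME = {"c": "k", "a": "ae", "t": "t", "s": "s"}

# Hypothetical dictionary of larger units (tuples of phonemes) assumed
# to cover frequently used sounds.
LARGER_UNITS = {("ae", "t")}

def phoneticize(spelling):
    """Convert a spelled word to a phoneme list (toy one-to-one mapping)."""
    return [LETTER_TO_PHONEME[ch] for ch in spelling.lower()]

def hybrid_units(phonemes, larger_units, max_len=3):
    """Greedy longest-match segmentation into larger units and phonemes."""
    units, i = [], 0
    while i < len(phonemes):
        # Try the longest multi-phoneme chunk first, down to length 2.
        for size in range(min(max_len, len(phonemes) - i), 1, -1):
            chunk = tuple(phonemes[i:i + size])
            if chunk in larger_units:
                units.append("-".join(chunk))
                i += size
                break
        else:
            # No larger unit matched: emit a single phoneme.
            units.append(phonemes[i])
            i += 1
    return units

def register_word(lexicon, spelling):
    """Store the word's sequence of unit symbols in the lexicon."""
    lexicon[spelling] = hybrid_units(phoneticize(spelling), LARGER_UNITS)
    return lexicon[spelling]

lexicon = {}
print(register_word(lexicon, "cats"))  # ['k', 'ae-t', 's']
```

The lexicon here stores only symbol sequences; the similarity-based unit representations mentioned in the dependent claims are outside the scope of this sketch.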

Current Assignee: Panasonic Intellectual Property Corporation of America (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Applebaum, Ted; Kuhn, Roland; Junqua, Jean-Claude
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Armstrong, Angela A
Application Number: US09/148,579
Time in Patent Office: 1,971 Days
Field of Search: 704/243, 704/244, 704/251, 704/254, 704/255, 704/241, 704/235
US Class Current: 704/243
CPC Class Codes:
G10L 15/063   Training
G10L 15/26   Speech to text systems G10L...
G10L 2015/088   Word spotting
