SYSTEM AND METHOD FOR DECODING SPEECH

US 20140067394A1
Filed: 08/28/2012
Published: 03/06/2014
Est. Priority Date: 08/28/2012
Status: Abandoned Application

Abstract

The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to a given language. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging, or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.

7 Claims

1. A computer software product that includes a computer readable medium readable by a processor, the computer readable medium having stored thereon a set of instructions for performing decoding of speech, the instructions comprising:
- (a) a first set of instructions which, when loaded into main memory and executed by the processor, causes the processor to establish a pronunciation dictionary for a particular language and store the pronunciation dictionary in computer readable memory, the pronunciation dictionary including a plurality of words, each of the words being divided into phonemes of the language, each of the phonemes being represented by a single character;
  
  (b) a second set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train an acoustic model for the language, the acoustic model including hidden Markov models corresponding to the phonemes of the language;
  
  (c) a third set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained acoustic model in the computer readable memory;
  
  (d) a fourth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train a language model for the language, the language model being an N-gram language model containing probabilities of particular word sequences from a transcription corpus;
  
  (e) a fifth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained language model in the computer readable memory;
  
  (f) a sixth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to receive at least one spoken word in the language and generate a digital speech signal corresponding to the at least one spoken word;
  
  (g) a seventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to perform phoneme recognition on the speech signal to generate a set of spoken phonemes of the at least one word, the set of spoken phonemes being recorded in the computer readable memory, wherein each of the spoken phonemes is represented by a single character;
  
  (h) an eighth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to perform sequence alignment between the spoken phonemes of the at least one word and a set of reference phonemes of the pronunciation dictionary corresponding to the at least one word;
  
  (i) a ninth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to compare the spoken phonemes of the at least one word and the set of reference phonemes of the pronunciation dictionary corresponding to the at least one word to identify a set of unique variants; and
  
  (j) a tenth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to update the pronunciation dictionary and the language model by adding the set of unique variants thereto and recording the updated pronunciation dictionary and the language model in the computer readable memory.

2. The computer software product as recited in claim 1, further comprising an eleventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to remove duplicate unique variants from the set of unique variants prior to the tenth set of instructions.

3. The computer software product as recited in claim 2, further comprising a twelfth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to generate orthographic forms for each said unique variant in the set of unique variants.
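
The within-word steps of claims 1 and 2 (align spoken phonemes against reference phonemes, collect unique variants, drop duplicates, update the dictionary) might be sketched as follows. This is only an illustration under stated assumptions: the claims do not name an alignment algorithm, so a plain edit-distance alignment is used here as a stand-in, and the function names, the gap symbol, and the toy words and phoneme strings are all hypothetical.

```python
def align(reference: str, spoken: str, gap: str = "-") -> tuple[str, str]:
    """Edit-distance alignment of two strings of single-character phonemes.

    Assumption: unit costs for substitution, insertion and deletion.
    """
    n, m = len(reference), len(spoken)
    # cost[i][j] = edit distance between reference[:i] and spoken[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (reference[i - 1] != spoken[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one best alignment.
    ref_out, spk_out = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != spoken[j - 1])):
            ref_out.append(reference[i - 1])
            spk_out.append(spoken[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ref_out.append(reference[i - 1])  # phoneme dropped by the speaker
            spk_out.append(gap)
            i -= 1
        else:
            ref_out.append(gap)               # phoneme inserted by the speaker
            spk_out.append(spoken[j - 1])
            j -= 1
    return "".join(reversed(ref_out)), "".join(reversed(spk_out))


def unique_variants(pron_dict: dict[str, list[str]],
                    observations: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (word, spoken phonemes) pairs not yet in the dictionary, without duplicates."""
    seen: set[tuple[str, str]] = set()
    variants = []
    for word, spoken in observations:
        if spoken in pron_dict.get(word, []):
            continue  # matches a reference pronunciation, so not a variant
        if (word, spoken) not in seen:
            seen.add((word, spoken))
            variants.append((word, spoken))
    return variants


def update_dictionary(pron_dict: dict[str, list[str]],
                      variants: list[tuple[str, str]]) -> None:
    """Add each variant pronunciation to its word's dictionary entry."""
    for word, spoken in variants:
        pron_dict.setdefault(word, []).append(spoken)


# Toy data (invented): one word, one reference pronunciation, three observations.
pron = {"water": ["wotR"]}
observed = [("water", "wodR"), ("water", "wotR"), ("water", "wodR")]
found = unique_variants(pron, observed)
update_dictionary(pron, found)
```

Representing each phoneme by a single character, as claim 1 requires, is what lets ordinary string alignment operate directly on pronunciations.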

4. A computer software product that includes a computer readable medium readable by a processor, the computer readable medium having stored thereon a set of instructions for performing decoding of speech, the instructions comprising:
- (a) a first set of instructions which, when loaded into main memory and executed by the processor, causes the processor to establish a pronunciation dictionary for a particular language and store the pronunciation dictionary in computer readable memory, said pronunciation dictionary including a plurality of words each divided into phonemes of the language;
  
  (b) a second set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train an acoustic model for the language, the acoustic model including hidden Markov models corresponding to the phonemes of the language;
  
  (c) a third set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained acoustic model in the computer readable memory;
  
  (d) a fourth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train a language model for the language, the language model being an N-gram language model containing probabilities of particular word sequences from a transcription corpus;
  
  (e) a fifth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained language model in the computer readable memory;
  
  (f) a sixth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to receive at least one sentence including a plurality of words in the language and generate a digital speech signal corresponding to the at least one sentence;
  
  (g) a seventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to perform phoneme recognition on the speech signal to generate a set of spoken phonemes of each of the words of the at least one sentence, said set of spoken phonemes being recorded in the computer readable memory;
  
  (h) an eighth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to compare the spoken phonemes of the words of the at least one sentence and the set of reference phonemes of the pronunciation dictionary corresponding to the at least one word to form a transcription of the at least one sentence, the transcription being recorded in the computer readable memory;
  
  (i) a ninth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to analyze each pair of adjacent words of the at least one sentence to identify a phonological rule selected from the group consisting of merging and changing;
  
  (j) a tenth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to, upon identification of the phonological rule in at least one pair of the adjacent words, form at least one compound word to replace the at least one pair of words in the transcription; and
  
  (k) an eleventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to update the pronunciation dictionary and the language model by adding the at least one compound word thereto and recording the updated pronunciation dictionary and the language model in the computer readable memory.

5. The computer software product as recited in claim 4, further comprising a twelfth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to replace the at least one compound word in the transcription with the original pair of adjacent words corresponding thereto, following the eleventh set of instructions.
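
The cross-word steps of claims 4 and 5 can be illustrated with a small sketch: scan adjacent word pairs, form a compound word where a phonological rule applies, retrain an N-gram model on the rewritten transcription, and later expand compounds back to the original pair. Everything specific here is an assumption made for illustration: the rule `toy_rule` (a made-up "changing" rule turning a final n into m before b), the underscore joiner, the bigram (N = 2) maximum-likelihood estimate, and the example words are not taken from the application.

```python
from collections import Counter
from typing import Callable, Optional


def toy_rule(left: str, right: str) -> Optional[str]:
    """Hypothetical 'changing' rule: final 'n' becomes 'm' before initial 'b'."""
    if left.endswith("n") and right.startswith("b"):
        return left[:-1] + "m" + right
    return None


def form_compounds(words: list[str], pron: dict[str, str],
                   rule: Callable[[str, str], Optional[str]],
                   joiner: str = "_") -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Replace rule-matching adjacent pairs with compound words.

    Adds each compound's pronunciation to `pron` and returns the rewritten
    transcription plus a map from compound back to its original pair.
    """
    out: list[str] = []
    originals: dict[str, tuple[str, str]] = {}
    i = 0
    while i < len(words):
        merged = rule(pron[words[i]], pron[words[i + 1]]) if i + 1 < len(words) else None
        if merged is not None:
            compound = words[i] + joiner + words[i + 1]
            pron[compound] = merged
            originals[compound] = (words[i], words[i + 1])
            out.append(compound)
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out, originals


def bigram_probs(sentences: list[list[str]]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood bigram probabilities with sentence boundary markers."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}


def expand(words: list[str], originals: dict[str, tuple[str, str]]) -> list[str]:
    """Claim 5 step: put the original word pair back in place of each compound."""
    out: list[str] = []
    for word in words:
        out.extend(originals.get(word, (word,)))
    return out


# Toy data (invented words and single-character phoneme strings).
pron = {"ten": "ten", "boys": "boyz", "ran": "ran"}
rewritten, originals = form_compounds(["ten", "boys", "ran"], pron, toy_rule)
model = bigram_probs([rewritten])
restored = expand(rewritten, originals)
```

Treating the pair as a single vocabulary item lets the decoder match the altered pronunciation directly, while the expansion step keeps the final transcription in ordinary words.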

6. A computer software product that includes a computer readable medium readable by a processor, the computer readable medium having stored thereon a set of instructions for performing decoding of speech, the instructions comprising:
- (a) a first set of instructions which, when loaded into main memory and executed by the processor, causes the processor to establish a pronunciation dictionary for a particular language and store the pronunciation dictionary in computer readable memory, said pronunciation dictionary including a plurality of words each divided into phonemes of the language;
  
  (b) a second set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train an acoustic model for the language, the acoustic model including hidden Markov models corresponding to the phonemes of the language;
  
  (c) a third set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained acoustic model in the computer readable memory;
  
  (d) a fourth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to train a language model for the language, the language model being an N-gram language model containing probabilities of particular word sequences from a transcription corpus;
  
  (e) a fifth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to store the trained language model in the computer readable memory;
  
  (f) a sixth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to receive at least one sentence including a plurality of words in the language and generate a digital speech signal corresponding to the at least one sentence;
  
  (g) a seventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to perform phoneme recognition on the speech signal to generate a set of spoken phonemes of each of the words of the at least one sentence, said set of spoken phonemes being recorded in the computer readable memory;
  
  (h) an eighth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to compare the spoken phonemes of the words of the at least one sentence and the set of reference phonemes of the pronunciation dictionary corresponding to the at least one word to form a transcription of the at least one sentence, the transcription being recorded in the computer readable memory;
  
  (i) a ninth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to apply a part-of-speech tagger to the transcription and analyze each pair of adjacent tagged words of the at least one sentence to identify tagged words selected from the group consisting of adjective-noun words and word-preposition words;
  
  (j) a tenth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to, upon identification of the tagged words in at least one pair of the adjacent tagged words, form at least one compound word to replace the at least one pair of tagged words in the transcription; and
  
  (k) an eleventh set of instructions which, when loaded into main memory and executed by the processor, causes the processor to update the pronunciation dictionary and the language model by adding the at least one compound word thereto and recording the updated pronunciation dictionary and the language model in the computer readable memory.

7. The computer software product as recited in claim 6, further comprising a twelfth set of instructions which, when loaded into main memory and executed by the processor, causes the processor to replace the at least one compound word in the transcription with the original pair of adjacent tagged words corresponding thereto, following the eleventh set of instructions.
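
The tag-driven pairing in claim 6 might look like the sketch below, which takes a transcription that has already been part-of-speech tagged and joins qualifying adjacent pairs into compound words. The tag names (`ADJ`, `NOUN`, `PREP`, `VERB`), the underscore joiner, and the reading of "word-preposition" as "any word followed by a preposition" are assumptions for illustration; the claim does not specify a tagger or a tag set.

```python
def is_compound_pair(left_tag: str, right_tag: str) -> bool:
    """Assumed reading of claim 6: adjective+noun, or any word followed by a preposition."""
    return (left_tag == "ADJ" and right_tag == "NOUN") or right_tag == "PREP"


def compound_tagged(tagged: list[tuple[str, str]],
                    joiner: str = "_") -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Join qualifying adjacent tagged words into compounds.

    Returns the rewritten transcription and a map from each compound back to
    its original word pair, so the pair can be restored later (claim 7).
    """
    out: list[str] = []
    originals: dict[str, tuple[str, str]] = {}
    i = 0
    while i < len(tagged):
        if i + 1 < len(tagged) and is_compound_pair(tagged[i][1], tagged[i + 1][1]):
            left, right = tagged[i][0], tagged[i + 1][0]
            compound = left + joiner + right
            originals[compound] = (left, right)
            out.append(compound)
            i += 2  # both words are consumed by the compound
        else:
            out.append(tagged[i][0])
            i += 1
    return out, originals


# Toy tagged transcription (invented example, hypothetical tag set).
sentence = [("big", "ADJ"), ("house", "NOUN"), ("stands", "VERB"),
            ("on", "PREP"), ("hills", "NOUN")]
rewritten, originals = compound_tagged(sentence)
```

The compounds returned here would then be added to the pronunciation dictionary and language model in the same way as rule-derived compounds.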

Current Assignee: King Fahd University of Petroleum & Minerals (Government of Saudi Arabia), King Abdul AZIZ City For Science and Technology
Original Assignee: King Fahd University of Petroleum & Minerals (Government of Saudi Arabia), King Abdul AZIZ City For Science and Technology
Inventors: ELSHAFEI, MOUSTAFA; AL-MUHTASEB, HUSNI; ABUZEINA, DIA EDDIN M.; AL-KHATIB, WASFI G.
Application Number: US13/597,162
Publication Number: US 20140067394A1
US Class Current: 704/244
CPC Class Codes:
G10L 15/144   Training of HMMs
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/197   Probabilistic grammars, e.g...
