METHOD FOR BUILDING LANGUAGE MODEL, SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS

US 20150112679A1
Filed: 09/29/2014
Published: 04/23/2015
Est. Priority Date: 10/18/2013
Status: Active Grant

First Claim

1. A method for building a language model, adapted to an electronic apparatus, the method comprising:

receiving a plurality of candidate sentences; and

obtaining a plurality of phonetic spellings matching each of the words in each of the candidate sentences and a plurality of word probabilities according to a text corpus, so as to obtain a candidate sentence table corresponding to the candidate sentences.

1 Assignment

0 Petitions

Abstract

A method for building a language model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model: each phonetic spelling is matched against a candidate sentence table to obtain a word probability of that phonetic spelling matching a word in a sentence of the table, and the text sequence probabilities are calculated from the word probabilities of the phonetic spellings. The text sequence corresponding to the largest of the text sequence probabilities is selected as the recognition result of the speech signal.
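
The decoding flow summarized in the abstract can be sketched as a small Python toy. This is not VIA's implementation: the dictionary layouts, the pinyin-style spellings, and the choice to combine word probabilities by multiplication are all assumptions (the abstract only says the word probabilities are "calculated" to obtain the text sequence probabilities). The acoustic model step is skipped; the input is assumed to be phones already grouped into syllables.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layouts (assumptions, not taken from the patent):
# - the syllable acoustic lexicon maps a tuple of phones to one phonetic spelling;
# - the candidate sentence table maps each candidate sentence to one dict per word,
#   giving the word probability for each phonetic spelling that can produce that word.
Lexicon = Dict[Tuple[str, ...], str]
SentenceTable = Dict[str, List[Dict[str, float]]]


def to_spellings(syllables: List[Tuple[str, ...]], lexicon: Lexicon) -> List[str]:
    """Look up the phonetic spelling for each syllable-sized group of phones."""
    return [lexicon[phones] for phones in syllables]


def sequence_probability(spellings: List[str], word_entries: List[Dict[str, float]]) -> float:
    """Combine per-word probabilities into a text sequence probability.

    Multiplying them is an assumption; the abstract only says they are "calculated".
    """
    if len(spellings) != len(word_entries):
        return 0.0
    return math.prod(entry.get(s, 0.0) for s, entry in zip(spellings, word_entries))


def recognize(syllables: List[Tuple[str, ...]], lexicon: Lexicon,
              table: SentenceTable) -> Tuple[Optional[str], float]:
    """Return the candidate sentence with the largest text sequence probability."""
    spellings = to_spellings(syllables, lexicon)
    best_sentence, best_prob = None, 0.0
    for sentence, entries in table.items():
        prob = sequence_probability(spellings, entries)
        if prob > best_prob:
            best_sentence, best_prob = sentence, prob
    return best_sentence, best_prob


lexicon = {("n", "i3"): "ni3", ("h", "ao3"): "hao3"}
table = {
    "你好": [{"ni3": 0.9}, {"hao3": 0.8}],
    "泥好": [{"ni2": 0.7, "ni3": 0.1}, {"hao3": 0.8}],
}
sentence, prob = recognize([("n", "i3"), ("h", "ao3")], lexicon, table)
print(sentence, round(prob, 2))  # prints: 你好 0.72
```

For long sentences, summing log probabilities instead of multiplying raw probabilities would avoid floating-point underflow while selecting the same sentence.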

37 Citations

28 Claims

1. A method for building a language model, adapted to an electronic apparatus, the method comprising:
- receiving a plurality of candidate sentences; and
  
  obtaining a plurality of phonetic spellings matching each of the words in each of the candidate sentences and a plurality of word probabilities according to a text corpus, so as to obtain a candidate sentence table corresponding to the candidate sentences.
- Dependent Claims (2, 3)
- - 2. The method for building the language model of claim 1, further comprising:
    - obtaining the text corpus through training with a plurality of speech signals based on different languages, dialects or different pronunciation habits.
  - 3. The method for building the language model of claim 2, wherein the step of obtaining the text corpus through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic spellings matching pronunciations of each of the words according to the corresponding words in the speech signals; and
      
      obtaining the word probabilities of each of the words corresponding to each of the phonetic spellings in the text corpus by training according to each of the words and the phonetic spellings.
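
Claims 1 to 3 leave the estimation method open. One plausible reading, sketched below under stated assumptions, estimates each word probability as a relative frequency P(word | phonetic spelling) over (word, spelling) pairs gathered from transcribed speech of different dialects or speakers, then attaches those probabilities to every word of each candidate sentence. The function names, the pair format, and the table layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

WordProbs = Dict[Tuple[str, str], float]  # (word, spelling) -> P(word | spelling)


def train_word_probabilities(pairs: Iterable[Tuple[str, str]]) -> WordProbs:
    """Estimate P(word | spelling) by relative frequency over (word, spelling) pairs,
    e.g. pairs collected from speech signals of several dialects or speakers."""
    pair_counts = Counter(pairs)
    spelling_counts = Counter()
    for (_, spelling), n in pair_counts.items():
        spelling_counts[spelling] += n
    return {(w, s): n / spelling_counts[s] for (w, s), n in pair_counts.items()}


def build_candidate_sentence_table(sentences: List[List[str]],
                                   word_probs: WordProbs) -> Dict[str, List[Dict[str, float]]]:
    """For every word of every candidate sentence, record each phonetic spelling
    seen with that word together with its word probability."""
    table = {}
    for words in sentences:
        entries = [{s: p for (w, s), p in word_probs.items() if w == word} for word in words]
        table["".join(words)] = entries  # joining without spaces suits Chinese text
    return table


# Toy corpus: one speaker pronounces 泥 as "ni3" instead of the standard "ni2".
corpus_pairs = [("你", "ni3"), ("你", "ni3"), ("泥", "ni3"), ("泥", "ni2"), ("好", "hao3")]
word_probs = train_word_probabilities(corpus_pairs)
table = build_candidate_sentence_table([["你", "好"], ["泥", "好"]], word_probs)
print(table["泥好"])
```

Because "ni3" was heard for both 你 and 泥, its probability mass is split between them (2/3 and 1/3), which is how a pronunciation habit shows up in the table.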

4. A speech recognition method, adapted to an electronic apparatus, comprising:
- obtaining a phonetic transcription sequence of a speech signal according to an acoustic model, and the phonetic transcription sequence including a plurality of phones;
  
  obtaining a plurality of phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and a syllable acoustic lexicon;
  
  obtaining a plurality of text sequences and a plurality of text sequence probabilities from a language model according to the phonetic spellings, and matching each of the phonetic spellings with a candidate sentence table, so as to obtain a word probability of each of the phonetic spellings corresponding to each of the words in the candidate sentences; and
  
  calculating the word probabilities of the phonetic spellings, so as to obtain the text sequence probabilities, wherein the candidate sentences corresponding to the text sequence probabilities are the text sequences; and
  
  selecting the text sequence corresponding to a largest one among the text sequence probabilities as a recognition result of the speech signal.
- Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 5. The speech recognition method of claim 4, further comprising:
    - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.
  - 6. The speech recognition method of claim 5, wherein the step of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic transcription sequences matching pronunciations in the speech signals; and
      
      obtaining data of the phones corresponding to the phonetic transcription sequences in the acoustic model by training according to the speech signals and the phonetic transcription sequences.
  - 7. The speech recognition method of claim 4, wherein the step of obtaining the phonetic transcription sequence of the speech signal according to the acoustic model comprises:
    - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
      
      calculating a phonetic transcription matching probability of the phonetic transcription sequences matching the phones according to the selected training data and each of the phones of the speech signal; and
      
      selecting the phonetic transcription sequence corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcription sequence of the speech signal.
  - 8. The speech recognition method of claim 4, wherein the step of obtaining the phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and the syllable acoustic lexicon comprises:
    - obtaining an intonation information corresponding to each of the phonetic spellings according to a tone of the phonetic transcription sequence.
  - 9. The speech recognition method of claim 4, wherein the step of obtaining the phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and the syllable acoustic lexicon further comprises:
    - obtaining the phonetic spellings matching the phonetic transcription sequence and obtaining a phonetic spelling matching probability of the phonetic transcription sequence matching each of the phonetic spellings according to the phonetic transcription sequence and the syllable acoustic lexicon; and
      
      selecting the phonetic spelling corresponding to a largest one among the phonetic spelling matching probabilities to be used as the phonetic spelling matching each of the phonetic transcription sequences.
  - 10. The speech recognition method of claim 9, further comprising:
    - selecting the text sequence corresponding to the largest one among associated probabilities including the phonetic spelling matching probabilities and the text sequence probabilities, to be used as the recognition result of the speech signal.
  - 11. The speech recognition method of claim 4, further comprising:
    - receiving a plurality of candidate sentences; and
      
      obtaining a plurality of phonetic spellings matching each of the words in each of the candidate sentences and a plurality of word probabilities according to a text corpus, so as to obtain the candidate sentence table corresponding to the candidate sentences.
  - 12. The speech recognition method of claim 11, further comprising:
    - obtaining the text corpus through training with a plurality of speech signals based on the speech signals of different languages, dialects or different pronunciation habits.
  - 13. The speech recognition method of claim 12, wherein the step of obtaining the text corpus through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic spellings matching pronunciations of each of the words according to the corresponding words in the speech signals; and
      
      obtaining the word probabilities of each of the words corresponding to each of the phonetic spellings in the text corpus by training according to each of the words and the phonetic spellings.
  - 14. The speech recognition method of claim 12, wherein the step of obtaining the text sequences and the text sequence probabilities from the language model according to the phonetic spellings comprises:
    - selecting the candidate sentence table according to a predetermined setting, wherein the candidate sentence table is corresponding to the text corpus obtained by training with one of the speech signals based on different languages, dialects or different pronunciation habits.
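
Claims 9, 10 and 14 add two refinements: the syllable acoustic lexicon yields several candidate spellings, each with a phonetic spelling matching probability, and the candidate sentence table is chosen by a predetermined setting (for example, a dialect). A minimal sketch follows, assuming the "associated probability" of claim 10 is the product of the spelling matching probabilities and the text sequence probability; the claims do not say how the two are combined, and the setting names and data are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

SentenceTable = Dict[str, List[Dict[str, float]]]


def sequence_probability(spellings: List[str], entries: List[Dict[str, float]]) -> float:
    """Text sequence probability as a product of word probabilities (an assumption)."""
    if len(spellings) != len(entries):
        return 0.0
    return math.prod(entry.get(s, 0.0) for s, entry in zip(spellings, entries))


def recognize(spelling_candidates: List[Dict[str, float]],
              tables: Dict[str, SentenceTable],
              setting: str) -> Tuple[Optional[str], float]:
    """Pick the sentence maximizing spelling matching probability x text sequence probability.

    spelling_candidates holds, per syllable, each candidate phonetic spelling and its
    phonetic spelling matching probability from the syllable acoustic lexicon.
    """
    table = tables[setting]  # candidate sentence table chosen by a predetermined setting
    best_sentence, best_prob = None, 0.0
    # Enumerate every spelling combination: exponential, acceptable only for a toy.
    for combo in itertools.product(*(c.items() for c in spelling_candidates)):
        spellings = [s for s, _ in combo]
        match_prob = math.prod(p for _, p in combo)
        for sentence, entries in table.items():
            prob = match_prob * sequence_probability(spellings, entries)
            if prob > best_prob:
                best_sentence, best_prob = sentence, prob
    return best_sentence, best_prob


tables = {
    "standard": {"你好": [{"ni3": 2 / 3}, {"hao3": 1.0}],
                 "泥好": [{"ni3": 1 / 3, "ni2": 1.0}, {"hao3": 1.0}]},
    "dialect_a": {"泥好": [{"ni2": 1.0}, {"hao3": 1.0}]},
}
candidates = [{"ni3": 0.7, "ni2": 0.3}, {"hao3": 0.9, "hao4": 0.1}]
print(recognize(candidates, tables, "standard"))
print(recognize(candidates, tables, "dialect_a"))
```

With the "standard" table the highest associated probability (about 0.42) belongs to 你好; with "dialect_a" only 泥好 is available and it wins through the less likely "ni2" spelling (about 0.27). A real decoder would prune this search, for example with a beam, rather than enumerate every combination.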

15. An electronic apparatus, comprising:
- a storage unit, storing a plurality of program code segments; and
  
  a processing unit, coupled to the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
  
  receiving a plurality of candidate sentences; and
  
  obtaining a plurality of phonetic spellings matching each of the words in each of the candidate sentences and a plurality of word probabilities according to a text corpus, so as to obtain a candidate sentence table corresponding to the candidate sentences.
- Dependent Claims (16, 17)
- - 16. The electronic apparatus of claim 15, further comprising:
    - an input unit, receiving a plurality of speech signals, and the commands further comprising:
      
      obtaining the text corpus through training with a plurality of speech signals based on the speech signals of different languages, dialects or different pronunciation habits.
  - 17. The electronic apparatus of claim 16, wherein the command of obtaining the text corpus through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic spellings matching pronunciations of each of the words according to the corresponding words in the speech signals; and
      
      obtaining the word probabilities of each of the words corresponding to each of the phonetic spellings in the text corpus by training according to each of the words and the phonetic spellings.

18. An electronic apparatus, comprising:
- an input unit, receiving a speech signal;
  
  a storage unit, storing a plurality of program code segments; and
  
  a processing unit, coupled to the input unit and the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
  
  obtaining a phonetic transcription sequence of the speech signal according to an acoustic model, and the phonetic transcription sequence including a plurality of phones;
  
  obtaining a plurality of phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and a syllable acoustic lexicon;
  
  obtaining a plurality of text sequences and a plurality of text sequence probabilities from a language model according to the phonetic spellings, and matching each of the phonetic spellings with a candidate sentence table, so as to obtain a word probability of each of the phonetic spellings corresponding to each of the words in the candidate sentences; and
  
  calculating the word probabilities of the phonetic spellings, so as to obtain the text sequence probabilities, wherein the candidate sentences corresponding to the text sequence probabilities are the text sequences; and
  
  selecting the text sequence corresponding to a largest one among the text sequence probabilities as a recognition result of the speech signal.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
- - 19. The electronic apparatus of claim 18, wherein the commands further comprise:
    - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.
  - 20. The electronic apparatus of claim 19, wherein the command of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic transcription sequences matching pronunciations in the speech signals; and
      
      obtaining data of the phones corresponding to the phonetic transcription sequences in the acoustic model by training according to the speech signals and the phonetic transcription sequences.
  - 21. The electronic apparatus of claim 18, wherein the command of obtaining the phonetic transcription sequences of the speech signal according to the acoustic model comprises:
    - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
      
      calculating a phonetic transcription matching probability of the phonetic transcription sequences matching the phones according to the selected training data and each of the phones of the speech signal; and
      
      selecting the phonetic transcription sequence corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcription sequence of the speech signal.
  - 22. The electronic apparatus of claim 18, wherein the command of obtaining the phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and the syllable acoustic lexicon comprises:
    - obtaining an intonation information corresponding to each of the phonetic spellings according to a tone of the phonetic transcription sequence.
  - 23. The electronic apparatus of claim 18, wherein the command of obtaining the phonetic spellings matching the phonetic transcription sequence according to the phonetic transcription sequence and the syllable acoustic lexicon further comprises:
    - obtaining the phonetic spellings matching the phonetic transcription sequence and obtaining a phonetic spelling matching probability of the phonetic transcription sequence matching each of the phonetic spellings according to the phonetic transcription sequence and the syllable acoustic lexicon; and
      
      selecting the phonetic spelling corresponding to a largest one among the phonetic spelling matching probabilities to be used as the phonetic spelling matching each of the phonetic transcription sequences.
  - 24. The electronic apparatus of claim 23, wherein the commands further comprise:
    - selecting the text sequence corresponding to the largest one among associated probabilities including the phonetic spelling matching probabilities and the text sequence probabilities, to be used as the recognition result of the speech signal.
  - 25. The electronic apparatus of claim 18, wherein the commands further comprise:
    - receiving a plurality of candidate sentences; and
      
      obtaining a plurality of phonetic spellings matching each of the words in each of the candidate sentences and a plurality of word probabilities according to a text corpus, so as to obtain the candidate sentence table corresponding to the candidate sentences.
  - 26. The electronic apparatus of claim 25, wherein the commands further comprise:
    - obtaining the text corpus through training with a plurality of speech signals based on the speech signals of different languages, dialects or different pronunciation habits.
  - 27. The electronic apparatus of claim 26, wherein the command of obtaining the text corpus through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic spellings matching pronunciations of each of the words according to the corresponding words in the speech signals; and
      
      obtaining the word probabilities of each of the words corresponding to each of the phonetic spellings in the text corpus by training according to each of the words and the phonetic spellings.
  - 28. The electronic apparatus of claim 26, wherein the command of obtaining the text sequences and the text sequence probabilities from the language model according to the phonetic spellings comprises:
    - selecting the candidate sentence table according to a predetermined setting, wherein the candidate sentence table is corresponding to the text corpus obtained by training with one of the speech signals based on different languages, dialects or different pronunciation habits.

Current Assignee
VIA Technologies Incorporated (VIA Technologies)
Original Assignee
VIA Technologies Incorporated (VIA Technologies)
Inventors
Zhang, Guo-Feng

Granted Patent

US 9,711,138 B2
US Class Current

704/243
CPC Class Codes

G10L 15/063   Training

G10L 15/14   using statistical models, e...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0633   using lexical or orthograph...