SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS

US 20150112675A1
Filed: 09/19/2014
Published: 04/23/2015
Est. Priority Date: 10/18/2013
Status: Active Grant

Abstract

A speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. A plurality of phonetic transcriptions of a speech signal is obtained according to an acoustic model. A phonetic spelling and intonation information matched to the phonetic transcriptions are obtained according to a phonetic transcription sequence and a syllable acoustic lexicon of the invention. According to the phonetic spellings and the intonation information, a plurality of phonetic spelling sequences and a plurality of phonetic spelling sequence probabilities are obtained from a language model. The phonetic spelling sequence corresponding to a largest one among the phonetic spelling sequence probabilities is selected as a recognition result of the speech signal.
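
The final selection step of the abstract can be illustrated with a minimal sketch. It assumes the lexicon and the language model have already produced their probabilities, and that the "associated probability" is the product of the two (computed here as a sum of logs). The record does not state how the probabilities are combined, so that choice, the names `SPELLING_MATCH`, `LANGUAGE_MODEL` and `recognize`, and all numbers are hypothetical.

```python
import math

# Hypothetical syllable-lexicon output:
# candidate syllable sequence (with tone digits) -> phonetic spelling matching probability.
SPELLING_MATCH = {
    ("shi4", "shi2"): 0.6,
    ("si4", "shi2"): 0.3,
    ("shi4", "si2"): 0.1,
}

# Hypothetical language-model output:
# syllable sequence -> {text sequence: probability of that text sequence}.
LANGUAGE_MODEL = {
    ("shi4", "shi2"): {"事实": 0.5, "适时": 0.25},
    ("si4", "shi2"): {"四十": 0.75},
    ("shi4", "si2"): {},
}


def recognize(spelling_match, language_model):
    """Return (text, log_score) for the largest associated probability.

    Assumption: the associated probability is the product of the matching
    probability and the language-model probability.
    """
    best_text, best_score = None, -math.inf
    for syllables, p_match in spelling_match.items():
        for text, p_text in language_model.get(syllables, {}).items():
            if p_match <= 0 or p_text <= 0:
                continue  # log is undefined for zero; such candidates cannot win
            score = math.log(p_match) + math.log(p_text)
            if score > best_score:
                best_text, best_score = text, score
    return best_text, best_score


text, score = recognize(SPELLING_MATCH, LANGUAGE_MODEL)
```

With these toy numbers the candidate "事实" wins, since 0.6 × 0.5 exceeds both 0.6 × 0.25 and 0.3 × 0.75.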

20 Claims

1. A speech recognition method, adapted to an electronic apparatus, comprising:
- obtaining a phonetic transcription sequence of a speech signal according to an acoustic model;
  
  obtaining a plurality of possible syllable sequences and a plurality of corresponding phonetic spelling matching probabilities according to the phonetic transcription sequence and a syllable acoustic lexicon;
  
  obtaining, from a language model, a probability of a plurality of text sequences appeared in the language model; and
  
  selecting the text sequence corresponding to a largest one among a plurality of associated probabilities to be used as a recognition result of the speech signal.
- - 2. The speech recognition method of claim 1, further comprising:
    - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.
  - 3. The speech recognition method of claim 2, wherein the step of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
    - receiving the phonetic transcription sequences matching pronunciations in the speech signals; and
      
      obtaining data of a plurality of phones corresponding to the phonetic transcription sequences in the acoustic model by training according to the speech signals and the phonetic transcription sequences.
  - 4. The speech recognition method of claim 3, wherein the step of obtaining the phonetic transcription sequence of the speech signal according to the acoustic model comprises:
    - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
      
      calculating a phonetic transcription matching probability of each of the phonetic transcription sequences matching the phones according to the selected training data and each of the phones of the speech signal; and
      
      selecting the phonetic transcription sequence corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcription sequence of the speech signal.
  - 5. The speech recognition method of claim 1, wherein the step of obtaining the possible syllable sequences and the corresponding phonetic spelling matching probabilities according to the phonetic transcription sequence and the syllable acoustic lexicon comprises:
    - obtaining an intonation information corresponding to each of the syllable sequences according to a tone of the phonetic transcription sequence.
  - 6. The speech recognition method of claim 5, wherein the step of obtaining the possible syllable sequences and the corresponding phonetic spelling matching probabilities according to the phonetic transcription sequence and the syllable acoustic lexicon further comprises:
    - obtaining the syllable sequences matching the phonetic transcription sequence and obtaining the phonetic spelling matching probabilities of the phonetic transcription sequence matching each of the syllable sequences according to the phonetic transcription sequence and the syllable acoustic lexicon; and
      
      selecting the syllable sequence corresponding to a largest one among the phonetic spelling matching probabilities and the intonation information to be used as the syllable sequence and the intonation information matching the phonetic transcription sequence.
  - 7. The speech recognition method of claim 1, wherein the step of selecting the text sequence corresponding to the largest one among the associated probabilities to be used as the recognition result of the speech signal comprises:
    - selecting the text sequence corresponding to the largest one among the associated probabilities including the phonetic spelling matching probabilities and the probability of the text sequences appeared in the language model, to be used as the recognition result of the speech signal.
  - 8. The speech recognition method of claim 1, further comprising:
    - obtaining the language model through training with a plurality of corpus data based on different languages, dialects or different pronunciation habits.
  - 9. The speech recognition method of claim 8, wherein the step of obtaining the language model through training with the corpus data based on different languages, dialects or different pronunciation habits comprises:
    - obtaining the text sequences from the corpus data; and
      
      training according to the syllable sequences of the text sequences.
  - 10. The speech recognition method of claim 1, wherein the step of obtaining, from the language model, the probability of the text sequences appeared in the language model comprises:
    - selecting a training data from the corpus data according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits.
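
Claim 4 can be read as three steps: pick one training result according to a predetermined setting, score each candidate phonetic transcription sequence against the phones of the speech signal, and keep the best-scoring candidate. The sketch below is one possible reading. The table layout, the setting names, the function `transcribe`, and the use of a per-phone product as the matching probability are all assumptions made for illustration and are not taken from the claims.

```python
# Hypothetical acoustic model: one training result per language, dialect or
# pronunciation habit. Each result maps an observed phone to
# {phonetic transcription symbol: probability}.
ACOUSTIC_MODEL = {
    "standard": {
        "phone_a": {"zh": 0.9, "z": 0.1},
        "phone_b": {"ang": 0.8, "an": 0.2},
    },
    "southern": {
        "phone_a": {"zh": 0.4, "z": 0.6},
        "phone_b": {"ang": 0.3, "an": 0.7},
    },
}


def transcribe(phones, candidates, acoustic_model, setting):
    """Return (best transcription sequence, its matching probability)."""
    # Step 1: select the training data according to the predetermined setting.
    training_data = acoustic_model[setting]

    # Step 2: matching probability of one candidate (assumed: product over phones).
    def match_probability(transcription):
        p = 1.0
        for phone, symbol in zip(phones, transcription):
            p *= training_data[phone].get(symbol, 0.0)
        return p

    scored = {candidate: match_probability(candidate) for candidate in candidates}

    # Step 3: keep the candidate with the largest matching probability.
    best = max(scored, key=scored.get)
    return best, scored[best]


phones = ["phone_a", "phone_b"]
candidates = [("zh", "ang"), ("z", "an"), ("z", "ang")]
standard_best, standard_p = transcribe(phones, candidates, ACOUSTIC_MODEL, "standard")
southern_best, southern_p = transcribe(phones, candidates, ACOUSTIC_MODEL, "southern")
```

The same phones yield `("zh", "ang")` under the "standard" setting and `("z", "an")` under the "southern" setting, which is the effect the predetermined setting is meant to have.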

11. An electronic apparatus, comprising:
- an input unit, receiving a speech signal;
  
  a storage unit, storing a plurality of program code segments; and
  
  a processing unit, coupled to the input unit and the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
  
  obtaining a phonetic transcription sequence of the speech signal according to an acoustic model;
  
  obtaining a plurality of syllable sequences and a plurality of corresponding phonetic spelling matching probabilities according to the phonetic transcription sequence and a syllable acoustic lexicon;
  
  obtaining, from a language model, a probability of a plurality of text sequences appeared in the language model; and
  
  selecting the text sequence corresponding to a largest one among a plurality of associated probabilities to be used as a recognition result of the speech signal.
- - 12. The electronic apparatus of claim 11, wherein the commands further comprise:
    - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.
  - 13. The electronic apparatus of claim 11, wherein the commands comprise:
    - receiving the phonetic transcription sequences matching pronunciations in the speech signals; and
      
      obtaining data of a plurality of phones corresponding to the phonetic transcription sequences in the acoustic model by training according to the speech signals and the phonetic transcription sequences.
  - 14. The electronic apparatus of claim 13, wherein the commands comprise:
    - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
      
      calculating a phonetic transcription matching probability of each of the phonetic transcription sequences matching the phones according to the selected training data and each of the phones of the speech signal; and
      
      selecting the phonetic transcription sequence corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcription sequence of the speech signal.
  - 15. The electronic apparatus of claim 11, wherein the commands comprise:
    - obtaining an intonation information corresponding to each of the syllable sequences according to a tone of the phonetic transcription sequence.
  - 16. The electronic apparatus of claim 15, wherein the commands further comprise:
    - obtaining the syllable sequences matching the phonetic transcription sequence and obtaining the phonetic spelling matching probabilities of the phonetic transcription sequence matching each of the syllable sequences according to the phonetic transcription sequence and the syllable acoustic lexicon; and
      
      selecting the syllable sequence corresponding to a largest one among the phonetic spelling matching probabilities and the intonation information to be used as the syllable sequence and the intonation information matching the phonetic transcription sequence.
  - 17. The electronic apparatus of claim 11, wherein the commands further comprise:
    - selecting the text sequence corresponding to the largest one among the associated probabilities including the phonetic spelling matching probabilities and the probability of the text sequences appeared in the language model, to be used as the recognition result of the speech signal.
  - 18. The electronic apparatus of claim 11, wherein the commands further comprise:
    - obtaining the language model through training with a plurality of corpus data based on different languages, dialects or different pronunciation habits.
  - 19. The electronic apparatus of claim 18, wherein the commands further comprise:
    - obtaining the text sequences from the corpus data; and
      
      training according to the syllable sequences of the text sequences.
  - 20. The electronic apparatus of claim 11, wherein the commands further comprise:
    - selecting a training data from the corpus data according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits.
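
Claims 18 to 20 describe training the language model from corpus data, according to the syllable sequences of the text sequences, with the corpus chosen by a predetermined setting. A rough sketch follows, assuming the model simply stores the relative frequency of each text sequence in the selected corpus, grouped by syllable sequence. The claims name no estimator, so the relative-frequency estimate, the `CORPUS` layout and the function `train_language_model` are hypothetical.

```python
from collections import Counter

# Hypothetical corpus data keyed by a language/dialect setting. Each entry
# pairs a text sequence with its syllable sequence (tone digits included).
CORPUS = {
    "standard": [
        ("事实", ("shi4", "shi2")),
        ("事实", ("shi4", "shi2")),
        ("适时", ("shi4", "shi2")),
        ("四十", ("si4", "shi2")),
    ],
}


def train_language_model(corpus_data):
    """Map syllable sequence -> {text sequence: relative frequency in the corpus}."""
    total = len(corpus_data)
    pair_counts = Counter(corpus_data)  # counts each (text, syllables) pair
    model = {}
    for (text, syllables), count in pair_counts.items():
        model.setdefault(syllables, {})[text] = count / total
    return model


# Select the corpus according to the setting, then train on it.
model = train_language_model(CORPUS["standard"])
```

Here `model[("shi4", "shi2")]` holds two homophonous text sequences with different frequencies, which is what lets a later selection step prefer the more common one.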

Current Assignee: VIA Technologies Incorporated (VIA Technologies)
Original Assignee: VIA Technologies Incorporated (VIA Technologies)
Inventors: Zhang, Guo-Feng; Zhu, Yi-Fei

Granted Patent: US 9,613,621 B2
Field of Search
US Class Current: 704/235

CPC Class Codes
- G10L 13/08: Text analysis or generation...
- G10L 15/063: Training
- G10L 15/187: Phonemic context, e.g. pron...
- G10L 25/33: using fuzzy logic
