METHOD FOR BUILDING ACOUSTIC MODEL, SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS

US 20150112674A1
Filed: 09/19/2014
Published: 04/23/2015
Est. Priority Date: 10/18/2013
Status: Abandoned Application

1 Assignment

0 Petitions

Abstract

A method for building an acoustic model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. A plurality of phonetic transcriptions of a speech signal are obtained from an acoustic model. A plurality of vocabularies matching the phonetic transcriptions are obtained according to each phonetic transcription and a syllable acoustic lexicon, wherein the syllable acoustic lexicon includes the vocabularies corresponding to the phonetic transcriptions, and each vocabulary having at least one phonetic transcription includes a code corresponding to each of its phonetic transcriptions. A plurality of strings and a plurality of string probabilities are obtained from a language model according to the code of each of the vocabularies.

32 Claims

1. A method for building an acoustic model, adapted to an electronic apparatus, the method comprising:
   - receiving a plurality of speech signals;
   - receiving a plurality of phonetic transcriptions matching pronunciations in the speech signals; and
   - obtaining data of a plurality of phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

2. The method for building the acoustic model of claim 1, wherein the speech signals are speech inputs of a plurality of dialects or a plurality of pronunciation habits.
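Claims 1 and 2 leave the training procedure itself open. As a rough illustration only, the sketch below assumes that each speech signal has already been converted into feature frames and that its phonetic transcription has already been aligned to those frames, one phone label per frame (neither step is described in the claims). It then "trains" the acoustic model by fitting one diagonal-covariance Gaussian per phone. Practical systems usually rely on HMM/GMM or neural acoustic models with iterative alignment; `PhoneModel`, `train_acoustic_model` and the input format are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PhoneModel:
    """One diagonal-covariance Gaussian per phone (a simplifying assumption)."""
    mean: list[float]
    var: list[float]

    def log_likelihood(self, frame: list[float]) -> float:
        """Log density of one feature frame under this phone's Gaussian."""
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(frame, self.mean, self.var)
        )


def train_acoustic_model(
    utterances: list[tuple[list[list[float]], list[str]]],
    var_floor: float = 1e-3,
) -> dict[str, PhoneModel]:
    """Pool frames by phone label and fit a Gaussian to each phone.

    Each utterance is (frames, labels), where labels[i] is the phone of the
    phonetic transcription aligned to frames[i]. The alignment is assumed to
    be given (hypothetical input format; it is not computed here).
    """
    pooled: dict[str, list[list[float]]] = defaultdict(list)
    for frames, labels in utterances:
        if len(frames) != len(labels):
            raise ValueError("every frame needs exactly one phone label")
        for frame, phone in zip(frames, labels):
            pooled[phone].append(frame)

    model: dict[str, PhoneModel] = {}
    for phone, frames in pooled.items():
        n, dim = len(frames), len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # The variance floor keeps a phone whose frames are identical from getting var == 0.
        var = [
            max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]
        model[phone] = PhoneModel(mean, var)
    return model


# Two tiny utterances with 1-D features, e.g. from speakers of different dialects.
am = train_acoustic_model([
    ([[1.0], [1.5], [4.0]], ["a", "a", "n"]),
    ([[0.5], [6.0]], ["a", "n"]),
])
print(am["a"].mean, am["n"].mean)  # [1.0] [5.0]
```

Coverage of dialects or pronunciation habits (claim 2) would come only from the variety of speakers in `utterances`; the sketch itself does nothing dialect-specific.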

3. A speech recognition method, adapted to an electronic apparatus, comprising:
   - obtaining a plurality of phonetic transcriptions of a speech signal according to an acoustic model, and the phonetic transcriptions including a plurality of phones;
   - obtaining a plurality of vocabularies matching the phonetic transcriptions and obtaining a fuzzy sound probability of the phonetic transcription matching each of the vocabularies according to each of the phonetic transcriptions and a syllable acoustic lexicon; and
   - selecting the vocabulary corresponding to a largest one among the fuzzy sound probabilities to be used as the vocabularies matching the speech signal.

4. The speech recognition method of claim 3, further comprising:
   - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.

5. The speech recognition method of claim 4, wherein the step of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
   - receiving the phonetic transcriptions matching pronunciations in the speech signals; and
   - obtaining data of the phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

6. The speech recognition method of claim 3, wherein the step of obtaining the phonetic transcriptions of the speech signal according to the acoustic model comprises:
   - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
   - calculating a phonetic transcription matching probability of each of the phonetic transcriptions matching the phones according to the selected training data and each of the phones of the speech signal; and
   - selecting each of the phonetic transcriptions corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcriptions of the speech signal.

7. The speech recognition method of claim 3, wherein the step of obtaining the fuzzy sound probabilities of the phonetic transcription matching each of the vocabularies according to each of the phonetic transcriptions and the syllable acoustic lexicon comprises:
   - selecting a pronunciation statistical data from the syllable acoustic lexicon according to a predetermined setting, wherein the pronunciation statistical data is one of different languages, dialects or different pronunciation habits; and
   - obtaining the phonetic transcriptions from the speech signals, and matching the phonetic transcriptions with the pronunciation statistical data, so as to obtain the fuzzy sound probabilities of each of the phonetic transcriptions matching each of the vocabularies.
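Claims 3, 6 and 7 describe two table lookups, each chosen by a predetermined setting (language, dialect or pronunciation habit) and each followed by taking the largest probability. The sketch below models both tables as nested dictionaries. The table layout, the setting names (including a hypothetical `n_l_merged` setting for speakers who do not distinguish initial n and l), the pinyin syllables and every probability are made-up illustrations, not data from the application; a real recognizer would compute the per-segment matching probabilities from acoustic features instead of looking them up.

```python
# Sketch of claims 3, 6 and 7; all tables and numbers are hypothetical.

# acoustic_data[setting][segment_id] -> {phonetic transcription: matching probability}
ACOUSTIC_DATA: dict[str, dict[str, dict[str, float]]] = {
    "standard": {"seg0": {"lan2": 0.8, "nan2": 0.2}},
    "n_l_merged": {"seg0": {"lan2": 0.6, "nan2": 0.4}},
}

# lexicon[setting][phonetic transcription] -> {vocabulary: fuzzy sound probability}
SYLLABLE_LEXICON: dict[str, dict[str, dict[str, float]]] = {
    "standard": {
        "lan2": {"藍": 0.95, "南": 0.05},
        "nan2": {"南": 0.95, "藍": 0.05},
    },
    # A speaker who merges n and l may say "lan2" when meaning 南 ("south").
    "n_l_merged": {
        "lan2": {"藍": 0.45, "南": 0.55},
        "nan2": {"南": 0.9, "藍": 0.1},
    },
}


def argmax(probs: dict[str, float]) -> str:
    """Return the key with the largest probability (first one wins on ties)."""
    return max(probs, key=probs.__getitem__)


def phonetic_transcriptions(segments: list[str], setting: str) -> list[str]:
    """Claim 6: select training data by setting, then keep the most probable
    phonetic transcription for each segment."""
    training_data = ACOUSTIC_DATA[setting]
    return [argmax(training_data[seg]) for seg in segments]


def matching_vocabularies(transcriptions: list[str], setting: str) -> list[tuple[str, float]]:
    """Claims 3 and 7: select pronunciation statistics by setting, then keep the
    vocabulary with the largest fuzzy sound probability for each transcription."""
    stats = SYLLABLE_LEXICON[setting]
    result = []
    for t in transcriptions:
        fuzzy = stats[t]
        best = argmax(fuzzy)
        result.append((best, fuzzy[best]))
    return result


for setting in ("standard", "n_l_merged"):
    trans = phonetic_transcriptions(["seg0"], setting)
    print(setting, trans, matching_vocabularies(trans, setting))
# standard ['lan2'] [('藍', 0.95)]
# n_l_merged ['lan2'] [('南', 0.55)]
```

The same phonetic transcription resolves to different vocabularies depending on which pronunciation statistics the setting selects, which is the effect the fuzzy sound probabilities are meant to capture.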

8. A speech recognition method, adapted to an electronic apparatus, comprising:
   - obtaining a plurality of phonetic transcriptions of the speech signal according to an acoustic model, and the phonetic transcriptions including a plurality of phones;
   - obtaining a plurality of vocabularies matching the phonetic transcriptions according to each of the phonetic transcriptions and a syllable acoustic lexicon, wherein the syllable acoustic lexicon comprises the vocabularies corresponding to the phonetic transcriptions, and the vocabulary having at least one phonetic transcription comprises each of codes corresponding to each of the phonetic transcriptions;
   - obtaining a plurality of strings and a plurality of string probabilities from a language model according to the code of each of the vocabularies; and
   - selecting the string corresponding to a largest one among the string probabilities as a recognition result of the speech signal.

9. The speech recognition method of claim 8, further comprising:
   - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.

10. The speech recognition method of claim 9, wherein the step of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
   - receiving the phonetic transcriptions matching pronunciations in the speech signals; and
   - obtaining data of the phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

11. The speech recognition method of claim 8, wherein the step of obtaining the phonetic transcriptions of the speech signal according to the acoustic model comprises:
   - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
   - calculating a phonetic transcription matching probability of each of the phonetic transcriptions matching the phones according to the selected training data and each of the phones of the speech signal; and
   - selecting each of the phonetic transcriptions corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcriptions of the speech signal.

12. The speech recognition method of claim 8, wherein the step of obtaining the vocabularies matching the phonetic transcription according to each of the phonetic transcriptions and the syllable acoustic lexicon comprises:
   - selecting a pronunciation statistical data from the syllable acoustic lexicon according to a predetermined setting, wherein the pronunciation statistical data is one of different languages, dialects or different pronunciation habits; and
   - obtaining the phonetic transcriptions from the speech signals, and matching the phonetic transcriptions with the pronunciation statistical data, so as to obtain a fuzzy sound probability of each of the phonetic transcriptions matching each of the vocabularies.

13. The speech recognition method of claim 12, further comprising:
   - selecting the string corresponding to a largest one among associated probabilities including the fuzzy sound probabilities and the string probabilities as a recognition result of the speech signal.

14. The speech recognition method of claim 8, further comprising:
   - obtaining the language model through training with a plurality of corpus data based on different languages, dialects or different pronunciation habits.

15. The speech recognition method of claim 14, wherein the step of obtaining the language model through training with the corpus data based on different languages, dialects or different pronunciation habits comprises:
   - obtaining the strings from the corpus data; and
   - training the corresponding codes respectively according to the strings and the vocabularies of the strings, so as to obtain the string probabilities of the codes matching each of the strings.

16. The speech recognition method of claim 14, wherein the step of obtaining the strings and the string probabilities from the language model according to the code of each of the vocabularies comprises:
   - selecting a training data from the corpus data according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits.
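Claims 8 and 13 to 16 add per-pronunciation codes and a language model. One possible reading, sketched here under stated assumptions, gives each (vocabulary, pronunciation) pair its own code, so a polyphonic character such as 長 (chang2 "long", zhang3 "to grow") would carry a different code per reading. It trains an add-one-smoothed bigram model over code sequences taken from a code-annotated corpus chosen by setting, then scores each candidate string by adding fuzzy sound log-probabilities to the string log-probability. The claims do not say how the "associated probabilities" of claim 13 are combined; the product used here is an assumption, as are the code scheme, corpus and lexicon entries, and all function names.

```python
import itertools
import math
from collections import Counter

START = "<s>"  # sentence-start symbol used as the first bigram context

# Hypothetical corpora, already annotated with one code per (vocabulary, pronunciation):
#   C1 = 長 read zhang3, C2 = 大 read da4, C3 = 掌 read zhang3
CORPORA = {"standard": [["C1", "C2"], ["C1", "C2"], ["C3"]]}

# Hypothetical lexicon: transcription -> [(vocabulary, code, fuzzy sound probability)]
LEXICON = {
    "zhang3": [("長", "C1", 0.5), ("掌", "C3", 0.5)],
    "da4": [("大", "C2", 1.0)],
}


def train_bigram_lm(coded_strings):
    """Claim 15 sketch: count code contexts and code bigrams in code-annotated strings."""
    contexts, bigrams, codes = Counter(), Counter(), set()
    for string in coded_strings:
        prev = START
        for code in string:
            contexts[prev] += 1
            bigrams[(prev, code)] += 1
            codes.add(code)
            prev = code
    return contexts, bigrams, len(codes)


def string_log_prob(codes, lm):
    """Add-one smoothed bigram log probability of a code sequence."""
    contexts, bigrams, v = lm
    logp, prev = 0.0, START
    for code in codes:
        logp += math.log((bigrams[(prev, code)] + 1) / (contexts[prev] + v))
        prev = code
    return logp


def decode(transcriptions, lexicon, lm):
    """Claims 8 and 13 sketch: score every candidate string by its fuzzy sound
    log-probabilities plus its string log-probability; return the best one."""
    best, best_score = None, -math.inf
    for choice in itertools.product(*(lexicon[t] for t in transcriptions)):
        fuzzy = sum(math.log(p) for _, _, p in choice)
        score = fuzzy + string_log_prob([c for _, c, _ in choice], lm)
        if score > best_score:
            best, best_score = "".join(w for w, _, _ in choice), score
    return best, best_score


lm = train_bigram_lm(CORPORA["standard"])  # claim 16: corpus selected by setting
print(decode(["zhang3", "da4"], LEXICON, lm)[0])  # 長大
```

Dropping the `fuzzy` term from `score` gives the plain claim 8 selection by string probability alone. Enumerating every combination with `itertools.product` is only practical for short inputs; a beam or Viterbi search would replace it at scale.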

17. An electronic apparatus, comprising:
   - an input unit, receiving a plurality of speech signals;
   - a storage unit, storing a plurality of program code segments; and
   - a processing unit, coupled to the input unit and the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
     - receiving a plurality of phonetic transcriptions matching pronunciations in the speech signals; and
     - obtaining data of a plurality of phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

18. The electronic apparatus of claim 17, wherein the speech signals are speech inputs of a plurality of dialects or a plurality of pronunciation habits.

19. An electronic apparatus, comprising:
   - an input unit, receiving a speech signal;
   - a storage unit, storing a plurality of program code segments; and
   - a processing unit, coupled to the input unit and the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
     - obtaining a plurality of phonetic transcriptions of the speech signal according to an acoustic model, and the phonetic transcriptions including a plurality of phones;
     - obtaining a plurality of vocabularies matching the phonetic transcriptions and obtaining a fuzzy sound probability of the phonetic transcription matching each of the vocabularies according to each of the phonetic transcriptions and a syllable acoustic lexicon; and
     - selecting the vocabulary corresponding to a largest one among the fuzzy sound probabilities to be used as the vocabularies matching the speech signal.

20. The electronic apparatus of claim 19, wherein the commands further comprise:
   - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.

21. The electronic apparatus of claim 20, wherein the command of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
   - receiving the phonetic transcriptions matching pronunciations in the speech signals; and
   - obtaining data of the phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

22. The electronic apparatus of claim 19, wherein the command of obtaining the phonetic transcriptions of the speech signal according to the acoustic model comprises:
   - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
   - calculating a phonetic transcription matching probability of each of the phonetic transcriptions matching the phones according to the selected training data and each of the phones of the speech signal; and
   - selecting each of the phonetic transcriptions corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcriptions of the speech signal.

23. The electronic apparatus of claim 19, wherein the command of obtaining the fuzzy sound probabilities of the phonetic transcription matching each of the vocabularies according to each of the phonetic transcriptions and the syllable acoustic lexicon comprises:
   - selecting a pronunciation statistical data from the syllable acoustic lexicon according to a predetermined setting, wherein the pronunciation statistical data is one of different languages, dialects or different pronunciation habits; and
   - obtaining the phonetic transcriptions from the speech signals, and matching the phonetic transcriptions with the pronunciation statistical data, so as to obtain the fuzzy sound probabilities of each of the phonetic transcriptions matching each of the vocabularies.

24. An electronic apparatus, comprising:
   - an input unit, receiving a speech signal;
   - a storage unit, storing a plurality of program code segments; and
   - a processing unit, coupled to the input unit and the storage unit, the processing unit executing a plurality of commands through the program code segments, and the commands comprising:
     - obtaining a plurality of phonetic transcriptions of the speech signal according to an acoustic model, and the phonetic transcriptions including a plurality of phones;
     - obtaining a plurality of vocabularies matching the phonetic transcriptions according to each of the phonetic transcriptions and a syllable acoustic lexicon, wherein the syllable acoustic lexicon comprises the vocabularies corresponding to the phonetic transcriptions, and the vocabulary having at least one phonetic transcription comprises each of codes corresponding to each of the phonetic transcriptions;
     - obtaining a plurality of strings and a plurality of string probabilities from a language model according to the code of each of the vocabularies; and
     - selecting the string corresponding to a largest one among the string probabilities as a recognition result of the speech signal.

25. The electronic apparatus of claim 24, wherein the commands further comprise:
   - obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits.

26. The electronic apparatus of claim 25, wherein the command of obtaining the acoustic model through training with the speech signals based on different languages, dialects or different pronunciation habits comprises:
   - receiving the phonetic transcriptions matching pronunciations in the speech signals; and
   - obtaining data of the phones corresponding to the phonetic transcriptions in the acoustic model by training according to the speech signals and the phonetic transcriptions.

27. The electronic apparatus of claim 24, wherein the command of obtaining the phonetic transcriptions of the speech signal according to the acoustic model comprises:
   - selecting a training data from the acoustic model according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits;
   - calculating a phonetic transcription matching probability of each of the phonetic transcriptions matching the phones according to the selected training data and each of the phones of the speech signal; and
   - selecting each of the phonetic transcriptions corresponding to a largest one among the phonetic transcription matching probabilities to be used as the phonetic transcriptions of the speech signal.

28. The speech recognition method of claim 24, wherein the step of obtaining the vocabularies matching the phonetic transcription according to each of the phonetic transcriptions and the syllable acoustic lexicon comprises:
   - selecting a pronunciation statistical data from the syllable acoustic lexicon according to a predetermined setting, wherein the pronunciation statistical data is one of different languages, dialects or different pronunciation habits; and
   - obtaining the phonetic transcriptions from the speech signals, and matching the phonetic transcriptions with the pronunciation statistical data, so as to obtain a fuzzy sound probability of each of the phonetic transcriptions matching each of the vocabularies.

29. The electronic apparatus of claim 28, wherein the commands further comprise:
   - selecting the string corresponding to a largest one among associated probabilities including the fuzzy sound probabilities and the string probabilities as a recognition result of the speech signal.

30. The electronic apparatus of claim 24, wherein the commands further comprise:
   - obtaining the language model through training with a plurality of corpus data based on different languages, dialects or different pronunciation habits.

31. The electronic apparatus of claim 30, wherein the command of obtaining the language model through training with the corpus data based on different languages, dialects or different pronunciation habits comprises:
   - obtaining the strings from the corpus data; and
   - training the corresponding codes respectively according to the strings and the vocabularies of the strings, so as to obtain the string probabilities of the codes matching each of the strings.

32. The electronic apparatus of claim 30, wherein the command of obtaining the strings and the string probabilities from the language model according to the code of each of the vocabularies comprises:
   - selecting a training data from the corpus data according to a predetermined setting, wherein the training data is one of training results of different languages, dialects or different pronunciation habits.

Current Assignee
VIA Technologies Incorporated (VIA Technologies)
Original Assignee
VIA Technologies Incorporated (VIA Technologies)
Inventors
Zhang, Guo-Feng; Zhu, Yi-Fei

Application Number

US14/490,676
Publication Number

US 20150112674A1
US Class Current

704/235
CPC Class Codes

G10L 15/063   Training

G10L 2015/0633   using lexical or orthograph...

G10L 25/33   using fuzzy logic
