Domain-based dialog speech recognition method and apparatus

US 20050182628A1
Filed: 02/17/2005
Published: 08/18/2005
Est. Priority Date: 02/18/2004
Status: Abandoned Application

Abstract

A domain-based speech recognition method and apparatus. The method includes: performing speech recognition by using a first language model and generating a first recognition result including a plurality of first recognition sentences; selecting a plurality of candidate domains by using, as a domain keyword, a word included in each of the first recognition sentences and having a confidence score equal to or higher than a predetermined threshold; performing speech recognition with the first recognition result by using an acoustic model specific to each of the candidate domains and a second language model, and generating a plurality of second recognition sentences; and selecting one or more final recognition sentences from the first recognition sentences and the second recognition sentences. According to this method and apparatus, the effect that a domain extraction error caused by misrecognition of a word has on selection of the final recognition result can be minimized.
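The final selection step described in the abstract can be illustrated with a small Python sketch. It shows why keeping the first-pass sentences in the pool limits the damage of a wrong domain choice. The `Hypothesis` class, the `select_final` function, the single `score` field, and the example sentences are all hypothetical; the abstract does not state how the final sentences are ranked, so ranking by one sentence-level score is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    text: str
    score: float  # assumed sentence-level confidence, higher is better
    source: str   # "first" for the general pass, else a domain name


def select_final(first_pass, second_pass, n_final=1):
    """Pool both passes and keep the n_final highest-scoring sentences."""
    pooled = list(first_pass) + list(second_pass)
    # sorted() is stable, so on ties first-pass sentences stay ahead.
    return sorted(pooled, key=lambda h: h.score, reverse=True)[:n_final]


first = [
    Hypothesis("play some jazz music", 0.71, "first"),
    Hypothesis("pay some jazz music", 0.40, "first"),
]
# Suppose a misrecognized keyword led to the wrong domain in the second pass.
second = [Hypothesis("play sum jazz music", 0.55, "weather")]

best = select_final(first, second)
print(best[0].text, best[0].source)
```

Because the pool contains both passes, a poor second-pass result from a wrongly extracted domain does not displace a better first-pass sentence.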

322 Citations

25 Claims

1. A domain-based dialog speech recognition method comprising:
- performing speech recognition by using a first language model and generating a first recognition result including a plurality of first recognition sentences;
  
  selecting a plurality of candidate domains, by using a word included in each of the first recognition sentences and having a confidence score equal to or higher than a predetermined threshold, as a domain keyword;
  
  performing the speech recognition with the first recognition result, by using an acoustic model specific to each of the candidate domains and a second language model and generating a plurality of second recognition sentences; and
  
  selecting one or more final recognition sentences from the first recognition sentences and the second recognition sentences.
- Dependent claims (2, 3, 4, 5, 6, 24, 25):
  - 2. The method of claim 1, wherein a global language model is applied as the first language model.
  - 3. The method of claim 1, wherein in the initial stage, a global language is applied as the first language model, and according to a situation of dialog, one of a plurality of generalized language models is selectively applied.
  - 4. The method of claim 1, wherein in selecting the plurality of candidate domains, a classification score of each of the candidate domains is calculated by using keywords each keyword having the confidence score equal to or greater than the predetermined threshold in the plurality of the first recognition sentences, and selecting as the candidate domains, the candidate domains having a classification score equal to or greater than a predetermined threshold.
  - 5. The method of claim 1, wherein in selecting the plurality of candidate domains, if there is no keyword having the confidence score equal to or greater than the predetermined threshold in the plurality of the first recognition sentences, the entire plurality of candidate domains are selected as the candidate domains.
  - 6. The method of claim 1, wherein in generating the plurality of second recognition sentences, speech recognition is performed with any one of word lattices and a word graph among the first recognition result.
  - 24. The method of claim 1, wherein by generating a plurality of high-level recognition sentences including a highest level recognition sentence as result of a first speech recognition process, propagation of errors in a first recognition result is minimized.
  - 25. The method of claim 1, wherein the plurality of candidate domains are extracted based on the words determined in the first and second recognition sentences, a second speech recognition is performed using a language model specific to each of the candidate domains, and a final recognition result is generated from the first and second speech recognition results.
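The candidate domain selection of claims 1, 4 and 5 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `select_candidate_domains`, the data layout, the threshold values, and the keyword table are all assumptions chosen for the example.

```python
def select_candidate_domains(sentences, keyword_scores, all_domains,
                             word_threshold=0.6, domain_threshold=1.0):
    """Pick candidate domains from first-pass recognition sentences.

    sentences:      list of sentences, each a list of (word, confidence)
    keyword_scores: word -> {domain: classification score} (assumed table)
    """
    totals = {}
    found_keyword = False
    for sentence in sentences:
        for word, confidence in sentence:
            # Only sufficiently confident words may act as domain keywords.
            if confidence < word_threshold or word not in keyword_scores:
                continue
            found_keyword = True
            for domain, score in keyword_scores[word].items():
                totals[domain] = totals.get(domain, 0.0) + score
    if not found_keyword:
        # Claim 5: with no usable keyword, every domain stays a candidate.
        return list(all_domains)
    # Claim 4: keep domains whose summed score reaches the threshold.
    return [d for d in all_domains if totals.get(d, 0.0) >= domain_threshold]


keyword_scores = {
    "forecast": {"weather": 0.9},
    "rain": {"weather": 0.8, "music": 0.1},
    "jazz": {"music": 1.2},
}
all_domains = ["weather", "music", "navigation"]
sentences = [
    [("rain", 0.9), ("forecast", 0.7), ("today", 0.95)],
    [("jazz", 0.3), ("forecast", 0.8)],  # "jazz" is below the word threshold
]
print(select_candidate_domains(sentences, keyword_scores, all_domains))
```

In this sketch the low-confidence word "jazz" never contributes to the music domain, so only the weather domain reaches the threshold. The claims do not say what happens when keywords exist but no domain reaches the threshold; the sketch simply returns an empty list in that case.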

7. A computer-readable recording medium having embodied thereon a computer program sequence for a domain-based dialog speech recognition method comprising:
- performing speech recognition by using a first language model and generating a first recognition result including a plurality of first recognition sentences;
  
  selecting a plurality of candidate domains, by using a word included in each of the first recognition sentences and having a confidence score equal to or higher than a predetermined threshold, as a domain keyword;
  
  performing the speech recognition with the first recognition result, by using an acoustic model specific to each of the candidate domains and a second language model, and generating a plurality of second recognition sentences; and
  
  selecting one or more final recognition sentences from the first recognition sentences and the second recognition sentences.

8. A domain-based dialog speech recognition apparatus comprising:
- a first speech recognition unit which performs speech recognition of input speech by using a first language model and generates a first recognition result including a plurality of first recognition sentences;
  
  a domain extraction unit which selects a plurality of candidate domains by using the plurality of first recognition sentences provided by the first speech recognition unit;
  
  a second speech recognition unit which performs the speech recognition with the first recognition result of the first speech recognition unit, by using an acoustic model specific to each of the candidate domains selected in the domain extraction unit and a second language model and generates a plurality of second recognition sentences; and
  
  a selection unit which selects a plurality of final recognition sentences from the first recognition sentences provided by the first speech recognition unit and the second recognition sentences provided by the second speech recognition unit.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 9. The apparatus of claim 8, wherein in the first speech recognition unit, a global language model is applied as the first language model.
  - 10. The apparatus of claim 8, wherein in the first speech recognition unit, a global language is applied as the first language model in an initial stage, and according to a situation of dialog, one of a plurality of generalized language models is selectively applied.
  - 11. The apparatus of claim 8, wherein the domain extraction unit comprises:
    - a first verification unit which performs word-level confidence score verification for the plurality of the recognition sentences provided by the first speech recognition unit, and extracts verified words each having a confidence score equal to or greater than a predetermined threshold from each of the first recognition sentences;
      
      a domain score calculation unit which selects domain keywords among the verified words provided by the first verification unit with reference to a domain database, and by calculating and adding up domain classification scores of respective keywords, calculates a classification score for each domain; and
      
      a candidate domain selection unit which selects a domain having a classification score equal to or greater than a predetermined threshold among classification scores for respective domains provided by the domain score calculation unit.
  - 12. The apparatus of claim 11, wherein the first verification unit performs word-level confidence score verification of the plurality of the first recognition sentences by using part or all of the plurality of first recognition sentences, word lattices, word graphs obtained by compressing the word lattices, and phoneme strings provided by the first speech recognition unit.
  - 13. The apparatus of claim 8, wherein by using a language model specific to each of the candidate domains and an acoustic model adapted to the language model, the second speech recognition unit recognizes any one of a word lattice and a word graph provided by the first speech recognition unit, and then, by performing rescoring, generates the second recognition sentences.
  - 14. The apparatus of claim 8, wherein the first recognition result generated by the first speech recognition unit includes word lattices, high-level N recognition sentences, word graphs, phoneme strings and syllable strings.
  - 15. The apparatus of claim 8, wherein the first speech recognition unit includes a feature extraction unit, a first search unit, a rescoring unit, and a phoneme unit.
  - 16. The apparatus of claim 15, wherein the feature extraction unit receives a speech signal input, and converts the speech signal input into feature vectors for the speech recognition.
  - 17. The apparatus of claim 16, wherein the first search unit receives the feature vectors from the feature extraction unit, and by using a first acoustic model, a pronunciation dictionary, and a first language model, finds a word string in which the first acoustic model and the first language model match the feature vector string.
  - 18. The apparatus of claim 17, wherein the first acoustic model is a speaker-independent acoustic model or a speaker-adaptive acoustic model adapted to the speech of a user.
  - 19. The apparatus of claim 15, wherein the rescoring unit receives word lattices from the first search unit, applies a first acoustic model and a first language model and outputs the first recognition result.
  - 20. The apparatus of claim 19, wherein the first acoustic model includes a between-words tri-phone model and a quin-phone model and the first language model includes a trigram and a language-dependent rule.
  - 21. The apparatus of claim 8, wherein the second speech recognition unit comprises:
    - a second search unit receiving word lattices or a word graph provided by the first speech recognition unit and searches for N recognition sentences for each of the candidate domains;
      
      a rescoring unit performing rescoring of the N recognition sentences and by using a between-words tri-phone acoustic model or a trigram language model, generates a plurality of rescored recognition sentences;
      
      a verification unit calculating word-level and sentence-level confidence score of the plurality of rescored recognition sentences.
  - 22. The apparatus of claim 21, wherein the trigram language model makes an estimate of a likelihood of a next word based on an identity of two preceding words.
  - 23. The apparatus of claim 21, wherein by limiting a search process to the word lattices or to the word graphs, a computation amount of the second search unit is reduced compared to a first search unit.
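Claim 22 describes a trigram language model as estimating the likelihood of the next word from the two preceding words. A minimal Python sketch of that idea, using unsmoothed relative frequencies, is given below. The function names, the toy corpus, and the `<s>`/`</s>` padding tokens are assumptions for illustration; a practical recognizer would typically add smoothing or backoff, which the claim does not specify.

```python
from collections import Counter


def train_trigram(corpus):
    """Count trigrams and their two-word histories over a list of sentences."""
    trigrams, histories = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(tokens)):
            histories[(tokens[i - 2], tokens[i - 1])] += 1
            trigrams[(tokens[i - 2], tokens[i - 1], tokens[i])] += 1
    return trigrams, histories


def trigram_prob(trigrams, histories, w1, w2, w3):
    """Relative-frequency estimate of P(w3 | w1, w2); 0.0 for unseen histories."""
    seen = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0


corpus = ["play some jazz", "play some rock", "play some jazz music"]
trigrams, histories = train_trigram(corpus)
print(trigram_prob(trigrams, histories, "play", "some", "jazz"))
```

In the toy corpus the history "play some" occurs three times and is followed by "jazz" twice, so the estimate for "jazz" is two thirds and for "rock" one third.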

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Choi, Injeong
Application Number: US11/059,354
Publication Number: US 20050182628A1
US Class Current: 704/252
CPC Class Codes:
- G10L 15/08   Speech classification or se...
- G10L 15/183   using context dependencies,...
- G10L 2015/088   Word spotting
