Apparatus, method, and medium for dialogue speech recognition using topic domain detection

US 20070100618A1
Filed: 10/30/2006
Published: 05/03/2007
Est. Priority Date: 11/02/2005
Status: Active Grant

Abstract

An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. The apparatus includes three modules. A forward search module performs a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a previously established global language model database, pronunciation dictionary database, and acoustic model database. A topic-domain-detection module detects a topic domain by inferring a topic based on the meanings of vocabularies contained in the word lattice, using information of the word lattice created as a result of the forward search. A backward-decoding module performs a backward decoding of the detected topic domain with reference to a previously established specific topic domain language model database, thereby outputting a speech recognition result for the input voice signal in text form. Recognition accuracy and efficiency for dialogue sentences are thereby improved.
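The order of the three stages named in the abstract can be sketched in a few lines of Python. This is only an illustration of the control flow, not the patented implementation: `WordLattice`, `recognize`, and the `toy_*` stubs are hypothetical names, and a real word lattice is a graph of scored word hypotheses rather than the flat word list assumed here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WordLattice:
    """Hypothetical stand-in for a word lattice: a flat list of candidate words."""
    words: List[str]


def recognize(
    features: Sequence[float],
    forward_search: Callable[[Sequence[float]], WordLattice],
    detect_topic_domain: Callable[[WordLattice], str],
    backward_decode: Callable[[WordLattice, str], str],
) -> str:
    """Run the three stages in the order the abstract lists them."""
    lattice = forward_search(features)       # uses global LM, dictionary, acoustic model
    domain = detect_topic_domain(lattice)    # infers the topic from lattice vocabulary
    return backward_decode(lattice, domain)  # decodes with the domain-specific LM


# Toy stubs so the sketch runs end to end; they perform no real recognition.
def toy_forward_search(features: Sequence[float]) -> WordLattice:
    return WordLattice(["will", "it", "rain", "tomorrow"])


def toy_detect(lattice: WordLattice) -> str:
    return "weather" if "rain" in lattice.words else "general"


def toy_backward_decode(lattice: WordLattice, domain: str) -> str:
    # The domain tag is included only to make the data flow visible.
    return f"[{domain}] " + " ".join(lattice.words)


text = recognize([0.1, 0.2, 0.3], toy_forward_search, toy_detect, toy_backward_decode)
print(text)
```

Passing the stages in as callables keeps the sketch independent of any particular decoder; the point is only that topic detection sits between the two passes and selects which language model the second pass consults.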

17 Claims

1. An apparatus for dialogue speech recognition using topic domain detection, comprising:
- a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established;
- a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search; and
- a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form.
  - 2. The apparatus of claim 1, further comprising a text-information-management module storing and managing various types of information including information related to the topic domain of the text, which is output by the backward-decoding module, and history information of the text.
  - 3. The apparatus of claim 2, wherein the topic-domain-detection module includes:
    - a stop-word-removal module removing stop words, which are not concerned with the topic, among vocabularies forming the word lattice;
    - a topic domain distance calculation module, which receives the word lattice, in which the stop words have been removed, so as to calculate a distance from each topic domain based on the vocabularies contained in the word lattice; and
    - a minimum distance detection module detecting a topic domain having a minimum distance among topic domains having various distances.
  - 4. The apparatus of claim 3, wherein the topic domain distance calculation module calculates the distance between the topic domains by using information obtained from the text-information-management module and information obtained from a probability factor database having probability factors used for calculating the distance from each topic domain.
  - 5. The apparatus of claim 4, wherein contents of the probability factor database are created by using a training corpus including text information to be spoken, which has been previously established according to topic domains.
  - 6. The apparatus of claim 4, wherein the topic domain distance calculation module calculates the distance from each topic domain by using the following equation having probability factors:
    - $\Pr(D_i \mid w_1 \dots w_n) \cong \prod_{j=1}^{n} \Pr(w_j \mid D_i) \cdot (1 / DF_{w_j}) \cdot w_{\mathrm{domain}} \cdot (WF_{D_i} / n)$, wherein $\Pr(D_i \mid w_1 \dots w_n)$ is a probability of selecting an $i$-th topic domain based on $n$ vocabularies, $\Pr(w_j \mid D_i)$ is a probability of selecting a $j$-th topic word $w_j$ in a state in which the $i$-th topic domain ($D_i$) has been selected, $DF_{w_j}$ is a topic domain frequency, indicating a number of topic domains related to the topic word $w_j$, $w_{\mathrm{domain}}$ is a context weight factor, and $WF_{D_i}$ is a topic word frequency representing a number of topic words supporting the $i$-th topic domain ($D_i$).
  - 7. The apparatus of claim 2, wherein the backward-decoding module further performs a backward sub-decoding with reference to the global language model database, if the text is not output even though the backward decoding has been performed with reference to the specific topic domain language model database.
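The topic-domain detection of claims 3 to 6 (stop-word removal, a per-domain score from the equation in claim 6, then selection of the minimum distance) can be sketched as follows. Several points are assumptions of this sketch rather than statements from the claims: the distance is taken to be the negative logarithm of the claim 6 score, so the minimum distance corresponds to the highest score; $\Pr(w_j \mid D_i)$ and $1/DF_{w_j}$ are applied per word while $w_{\mathrm{domain}}$ and $WF_{D_i}/n$ are applied once per domain, which the claim text does not settle; unseen words receive a small floor probability; and $WF_{D_i}$ is counted as the number of lattice words with a nonzero probability in the domain. The stop list, the probability tables, and all function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional

STOP_WORDS = {"the", "a", "is", "it", "will", "to"}  # hypothetical stop list

# Hypothetical probability factors: Pr(w | D) per domain and DF_w per word.
PROBS_BY_DOMAIN: Dict[str, Dict[str, float]] = {
    "weather": {"rain": 0.3, "tomorrow": 0.1, "umbrella": 0.2},
    "travel": {"ticket": 0.3, "tomorrow": 0.1, "train": 0.2},
}
DOMAIN_FREQ: Dict[str, int] = {
    "rain": 1, "umbrella": 1, "ticket": 1, "train": 1, "tomorrow": 2,
}


def remove_stop_words(words: Iterable[str]) -> List[str]:
    """Drop words that carry no topic information."""
    return [w for w in words if w.lower() not in STOP_WORDS]


def domain_distance(
    words: List[str],
    word_probs: Dict[str, float],
    domain_freq: Dict[str, int],
    context_weight: float = 1.0,  # w_domain, must be positive
    floor: float = 1e-6,          # assumed smoothing for unseen words
) -> float:
    """Negative log of the claim 6 score under the assumptions stated above."""
    n = len(words)
    supporting = sum(1 for w in words if word_probs.get(w, 0.0) > 0.0)  # WF_Di
    if n == 0 or supporting == 0:
        return math.inf
    log_score = math.log(context_weight) + math.log(supporting / n)
    for w in words:
        p = max(word_probs.get(w, 0.0), floor)
        df = max(domain_freq.get(w, 1), 1)
        log_score += math.log(p) - math.log(df)
    return -log_score


def detect_topic_domain(
    lattice_words: Iterable[str],
    probs_by_domain: Dict[str, Dict[str, float]],
    domain_freq: Dict[str, int],
    context_weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the domain with the minimum distance."""
    words = remove_stop_words(lattice_words)
    weights = context_weights or {}
    distances = {
        d: domain_distance(words, probs, domain_freq, weights.get(d, 1.0))
        for d, probs in probs_by_domain.items()
    }
    return min(distances, key=distances.get)


best = detect_topic_domain(
    ["will", "it", "rain", "tomorrow"], PROBS_BY_DOMAIN, DOMAIN_FREQ
)
print(best)
```

Working in log space avoids underflow when the product runs over many words. The `context_weights` argument is where history from the text-information-management module of claim 2 could enter, for example by raising the weight of the domain detected for the previous utterance; that use is an assumption of the sketch.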

8. A method of dialogue speech recognition using topic domain detection, comprising:
- (a) performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established;
- (b) detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search; and
- (c) performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form.
  - 9. The method of claim 8, wherein (b) includes:
    - (b1) removing stop words, which have no concern with the topic, among vocabularies forming the word lattice;
    - (b2) calculating a distance from each topic domain based on the vocabularies contained in the word lattice by receiving the word lattice, in which the stop words have been removed; and
    - (b3) detecting a topic domain having a minimum distance among topic domains having various distances.
  - 10. The method of claim 9, wherein (b2) involves calculating the distance using information obtained from the text output as a result of the backward decoding and information obtained from a probability factor database having probability factors used for calculating the distance from each topic domain.
  - 11. The method of claim 10, wherein contents of the probability factor database are created using a training corpus including text information to be spoken, which has been previously established according to topic domains.
  - 12. The method of claim 10, wherein (b2) involves calculating the distance using the equation:
    - $\Pr(D_i \mid w_1 \dots w_n) \cong \prod_{j=1}^{n} \Pr(w_j \mid D_i) \cdot (1 / DF_{w_j}) \cdot w_{\mathrm{domain}} \cdot (WF_{D_i} / n)$, wherein $\Pr(D_i \mid w_1 \dots w_n)$ is a probability of selecting an $i$-th topic domain based on $n$ vocabularies, $\Pr(w_j \mid D_i)$ is a probability of selecting a $j$-th topic word $w_j$ in a state in which the $i$-th topic domain ($D_i$) has been selected, $DF_{w_j}$ is a topic domain frequency, indicating a number of topic domains related to the topic word $w_j$, $w_{\mathrm{domain}}$ is a context weight factor, and $WF_{D_i}$ is a topic word frequency representing a number of topic words supporting the $i$-th topic domain ($D_i$).
  - 13. The method of claim 10, wherein (c) involves performing a backward sub-decoding with reference to the global language model database, if the text is not output even though the backward decoding has been performed with reference to the specific topic domain language model database.
  - 14. At least one computer readable medium comprising computer readable instructions implementing the method of claim 8.
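Claim 11 states that the probability factor database is created from a training corpus organized by topic domain, but does not say how the factors are estimated. A minimal sketch, assuming plain relative-frequency estimates for $\Pr(w \mid D)$ and a simple count of domains containing each word for $DF_w$, might look like this. The function name, the corpus format of `(domain, tokens)` pairs, and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_probability_factors(
    corpus: Iterable[Tuple[str, List[str]]],
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, int]]:
    """Estimate Pr(w | D) and DF_w from (domain, tokenized sentence) pairs.

    Assumption: relative frequencies without smoothing; the claims do not
    specify an estimator.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for domain, tokens in corpus:
        counts[domain].update(tokens)

    # Pr(w | D): share of the domain's word occurrences that are w.
    word_probs: Dict[str, Dict[str, float]] = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        word_probs[domain] = {w: c / total for w, c in counter.items()}

    # DF_w: number of domains in which w occurs at least once.
    domain_freq: Counter = Counter()
    for counter in counts.values():
        domain_freq.update(counter.keys())

    return word_probs, dict(domain_freq)


toy_corpus = [
    ("weather", ["rain", "tomorrow"]),
    ("weather", ["rain", "umbrella"]),
    ("travel", ["train", "tomorrow"]),
]
word_probs, domain_freq = build_probability_factors(toy_corpus)
print(word_probs["weather"]["rain"], domain_freq["tomorrow"])
```

In this toy corpus "tomorrow" occurs in both domains, so its $DF_w$ is 2 and the $1/DF_{w_j}$ factor in the claim 12 equation halves its contribution, whereas a word confined to one domain keeps its full weight.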

15. A method of dialogue speech recognition using topic domain detection, comprising:
- (a) performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to at least one previously established database;
- (b) detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search; and
- (c) performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form.
  - 16. The method of claim 15, wherein the at least one previously established database is at least one of a global language model database, a pronunciation dictionary database and an acoustic model database.
  - 17. At least one computer readable medium comprising computer readable instructions implementing the method of claim 15.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Lee, Jae-won; Choi, In-Jeong
Granted Patent: US 8,301,450 B2
US Class Current: 704/238
CPC Class Codes:
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/1822 Parsing for meaning underst...
