Apparatus, method, and medium for dialogue speech recognition using topic domain detection

US 8,301,450 B2
Filed: 10/30/2006
Issued: 10/30/2012
Est. Priority Date: 11/02/2005
Status: Expired due to Fees

Abstract

An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.

39 Citations

18 Claims

1. An apparatus for dialogue speech recognition using topic domain detection, comprising:
- a forward search module to perform a forward search to create a word lattice based on a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established;
  
  a topic-domain-detection module to detect a topic domain during run-time of a speech recognition procedure from among one or more candidate topic domains, by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search;
  
  a backward-decoding module to perform a backward decoding relative to the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in the form of a text; and
  
  a text-information-management module to store and manage information including information related to the topic domain of the output text which is output by the backward-decoding module, and history information which includes a previous topic domain detected relative to a previous output text obtained as a result of a previous backward decoding of a previous dialogue, and
  
  wherein, the topic-domain-detection module further detects the topic domain by determining whether one of the one or more candidate topic domains is the same as a topic domain which is the previous topic domain detected during run-time, using the history information which includes the previous topic domain detected,
  
  wherein the topic-domain-detection module includes;
  
  a stop-word-removal module to remove stop words, which are not concerned with the topic, among vocabularies forming the word lattice;
  
  a topic domain distance calculation module, which receives the word lattice, in which the stop words have been removed, to calculate a distance for each of the one or more candidate topic domains based on the vocabularies contained in the word lattice, and receives history information including the previous output text from the text-information-management module to calculate the distance for each of the one or more candidate topic domains, and calculates the distance for each of the one or more candidate topic domains according to a plurality of probability factors,
  
  wherein for a first factor, a higher probability weight is given to a candidate topic domain if it is the same as the previous topic domain detected, and a lower probability weight is given to a candidate topic domain if it is different from the previous topic domain detected,
  
  wherein for a second factor, a higher probability weight is given to a candidate topic domain in accordance with an increase in a frequency of topic words supporting the candidate topic domain among vocabularies forming the word lattice, and
  
  the first factor and second factor are obtained during run-time of the speech recognition procedure.
- Dependent Claims (2, 3, 4, 5, 6)
- - 2. The apparatus of claim 1, wherein the topic-domain-detection module further includes:
    - a minimum distance detection module to detect a topic domain having a minimum distance among the one or more candidate topic domains having various distances.
  - 3. The apparatus of claim 2, wherein the topic domain distance calculation module calculates the distance for each of the one or more candidate topic domains by using information obtained from the text-information-management module and information obtained from a probability factor database having probability factors used for calculating the distance for each of the one or more candidate topic domains.
  - 4. The apparatus of claim 3, wherein contents of the probability factor database are created by using a training corpus including text information to be spoken, which has been previously established according to topic domains.
  - 5. The apparatus of claim 3, wherein the topic domain distance calculation module calculates the distance for each of the one or more candidate topic domains by using the following equation having probability factors:
  - 6. The apparatus of claim 1, wherein the backward-decoding module further performs a backward sub-decoding with reference to the global language model database, if the text is not output even though the backward decoding has been performed with reference to the specific topic domain language model database.
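
The topic-domain-detection steps recited in claims 1 and 2 (stop-word removal, a per-domain distance built from a history factor and a topic-word-frequency factor, then minimum-distance selection) can be illustrated with a minimal sketch. This is not the equation referenced in claim 5, which is not reproduced in this record. The stop-word list, the topic-word sets, the weights `same_weight` and `diff_weight`, and the negative-log combination are all assumptions chosen only for illustration.

```python
import math

# Hypothetical stop-word list and topic-word sets (assumptions, not from the patent).
STOP_WORDS = {"i", "a", "the", "to", "of", "is"}
TOPIC_WORDS = {
    "weather": {"rain", "temperature", "forecast", "sunny"},
    "travel": {"flight", "ticket", "hotel", "airport"},
}


def remove_stop_words(lattice_words):
    """Drop words that carry no topic information."""
    return [w for w in lattice_words if w not in STOP_WORDS]


def topic_distance(words, domain, previous_domain,
                   same_weight=0.7, diff_weight=0.3):
    """Smaller distance means the domain is a better match.

    First factor: a higher weight if the candidate equals the previous domain.
    Second factor: a weight that grows with the number of supporting topic words.
    Both weights are illustrative; they are combined as negative logs.
    """
    history = same_weight if domain == previous_domain else diff_weight
    hits = sum(1 for w in words if w in TOPIC_WORDS[domain])
    support = (hits + 1) / (len(words) + 1)  # add-one smoothing avoids log(0)
    return -math.log(history) - math.log(support)


def detect_topic(lattice_words, previous_domain=None):
    """Return the minimum-distance domain and all candidate distances."""
    words = remove_stop_words(lattice_words)
    distances = {
        domain: topic_distance(words, domain, previous_domain)
        for domain in TOPIC_WORDS
    }
    return min(distances, key=distances.get), distances
```

Under these assumptions, a lattice whose words support two domains equally is resolved by the history factor: the candidate that matches the previous topic domain receives the smaller distance.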

7. A method of dialogue speech recognition using topic domain detection, comprising:
- performing a forward search to create a word lattice based on a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established;
  
  detecting a topic domain during run-time of a speech recognition procedure from among one or more candidate topic domains, by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search; and
  
  performing a backward decoding relative to the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in the form of a text,
  
  wherein, the detecting a topic domain further comprises determining whether one of the one or more candidate topic domains is the same as a topic domain which is the previous topic domain detected during run-time, relative to a previous output text obtained as a result of a previous backward decoding of a previous dialogue, using history information which includes the previous topic domain detected,
  
  wherein detecting the topic domain includes;
  
  removing stop words, which have no concern with the topic, among vocabularies forming the word lattice;
  
  calculating a distance for each of the one or more candidate topic domains based on the vocabularies contained in the word lattice by receiving the word lattice, in which the stop words have been removed,
  
  wherein the calculating the distance for each of the one or more candidate topic domains comprises receiving history information including the previous output text, to calculate the distance for each of the one or more candidate topic domains, and calculating the distance for each of the one or more candidate topic domains according to a plurality of probability factors,
  
  wherein for a first factor, a higher probability weight is given to a candidate topic domain if it is the same as the previous topic domain detected, and a lower probability weight is given to a candidate topic domain if it is different from the previous topic domain detected,
  
  for a second factor, a higher probability weight is given to a candidate topic domain in accordance with an increase in a frequency of topic words supporting the candidate topic domain among vocabularies forming the word lattice, and
  
  the first factor and second factor are obtained during run-time of the speech recognition procedure.
- Dependent Claims (8, 9, 10, 11, 12, 13, 14)
- - 8. The method of claim 7, wherein detecting the topic domain further includes:
    - detecting a topic domain having a minimum distance among the one or more candidate topic domains having various distances.
  - 9. The method of claim 8, wherein the calculating the distance involves using the history information relative to the previous dialogue for the output text obtained as the result of the backward decoding of the previous dialogue and information obtained from a probability factor database having probability factors used for calculating the distance for each of the one or more candidate topic domains.
  - 10. The method of claim 9, wherein contents of the probability factor database are created using a training corpus including text information to be spoken, which has been previously established according to topic domains.
  - 11. The method of claim 9, wherein calculating the distance further comprises calculating the distance using the equation:
  - 12. The method of claim 9, wherein performing the backward decoding comprises performing a backward sub-decoding with reference to the global language model database, if the text is not output even though the backward decoding has been performed with reference to the specific topic domain language model database.
  - 13. The method of claim 7, further comprising:
    - storing and managing information, including information related to the topic domain of the output text which is output by the backward-decoding, and history information which includes the previous topic domain detected relative to the previous output text obtained as a result of the previous backward decoding of the previous dialogue.
  - 14. At least one non-transitory computer readable medium comprising computer readable instructions implementing the method of claim 7.
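
Claims 12 and 13 describe two supporting behaviors: falling back to the global language model when topic-specific backward decoding yields no text, and storing the detected topic domain with the output text as history for the next utterance. The sketch below shows one way these could fit together. The `Decoder` callable type, `DialogueHistory`, and `backward_decode` are hypothetical names introduced here, and real decoders are replaced by plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical decoder interface: takes lattice words, returns text or None.
Decoder = Callable[[List[str]], Optional[str]]


@dataclass
class DialogueHistory:
    """Stores (topic domain, output text) pairs for previous utterances."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, topic: str, text: str) -> None:
        self.entries.append((topic, text))

    @property
    def previous_topic(self) -> Optional[str]:
        # Topic domain of the most recent utterance, if any.
        return self.entries[-1][0] if self.entries else None


def backward_decode(lattice: List[str], topic: str,
                    topic_decoders: Dict[str, Decoder],
                    global_decoder: Decoder) -> Optional[str]:
    """Decode with the topic-specific model first, then fall back."""
    text = topic_decoders[topic](lattice)
    if text is None:
        # Sub-decoding with the global language model when no text is output.
        text = global_decoder(lattice)
    return text
```

In this sketch the caller records each result with `history.record(topic, text)` after decoding, so that `history.previous_topic` is available when the next utterance's topic domain is detected.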

15. A method of dialogue speech recognition using topic domain detection, comprising:
- performing a forward search to create a word lattice based on a feature vector, which is extracted from an input voice signal, with reference to at least one previously established database;
  
  detecting a topic domain during run-time of a speech recognition procedure from among one or more candidate topic domains, by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search; and
  
  performing a backward decoding relative to the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in the form of a text,
  
  wherein, the detecting a topic domain further comprises determining whether one of the one or more candidate topic domains is the same as a topic domain which is the previous topic domain detected during run time, relative to a previous output text obtained as a result of a previous backward decoding of a previous dialogue, using history information which includes the previous topic domain detected,
  
  wherein detecting the topic domain includes;
  
  removing stop words, which have no concern with the topic, among vocabularies forming the word lattice;
  
  calculating a distance for each of the one or more candidate topic domains based on the vocabularies contained in the word lattice by receiving the word lattice, in which the stop words have been removed,
  
  wherein the calculating the distance for each of the one or more candidate topic domains comprises receiving history information including the previous output text, to calculate the distance for each of the one or more candidate topic domains, and calculating the distance for each of the one or more candidate topic domains according to a plurality of probability factors,
  
  wherein for a first factor, a higher probability weight is given to a candidate topic domain if it is the same as the previous topic domain detected, and a lower probability weight is given to a candidate topic domain if it is different from the previous topic domain detected,
  
  for a second factor, a higher probability weight is given to a candidate topic domain in accordance with an increase in a frequency of topic words supporting the candidate topic domain among vocabularies forming the word lattice, and
  
  the first factor and second factor are obtained during run-time of the speech recognition procedure.
- Dependent Claims (16, 17, 18)
- - 16. The method of claim 15, wherein the at least one previously established database is at least one of a global language model database, a pronunciation dictionary database and an acoustic model database.
  - 17. The method of claim 15, further comprising:
    - storing and managing information, including information related to the topic domain of the output text which is output by the backward-decoding, and history information which includes the previous topic domain detected relative to the previous output text obtained as a result of the previous backward decoding of the previous dialogue.
  - 18. At least one non-transitory computer readable medium comprising computer readable instructions implementing the method of claim 15.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Lee, Jae-won; Choi, In-jeong
Primary Examiner(s): YEN, ERIC L
Application Number: US11/589,165
Publication Number: US 20070100618A1
Time in Patent Office: 2,192 Days
Field of Search: 704/250, 704/251, 704/231, 704/9, 704/257
US Class Current: 704/257
CPC Class Codes:
G10L 15/1815 Semantic context, e.g. disa...
G10L 15/1822 Parsing for meaning underst...