Methods and apparatus for context adaptation of speech-to-speech translation systems

US 7,860,705 B2
Filed: 09/01/2006
Issued: 12/28/2010
Est. Priority Date: 09/01/2006
Status: Expired due to Fees

Abstract

A technique for context adaptation of a speech-to-speech translation system is provided. A plurality of sets of paralinguistic attribute values is obtained from a plurality of input signals. Each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers. A final set of paralinguistic attribute values is generated for the plurality of input signals from the plurality of sets of paralinguistic attribute values. Performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system is modified in accordance with the final set of paralinguistic attribute values for the plurality of input signals.

37 Citations

20 Claims

1. A method of context adaptation of a speech-to-speech translation system comprising the steps of:
- extracting a plurality of sets of paralinguistic attribute values from a plurality of input signals, wherein each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers;
  
  generating a final set of paralinguistic attribute values for the plurality of input signals from the plurality of sets of paralinguistic attribute values; and
  
  modifying performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
  
  wherein the set of paralinguistic attribute values that each classifier extracts is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest;
  
  further wherein the extracting, generating and modifying steps are implemented via instruction code that is executed by at least one processor device.
  - 2. The method of claim 1, wherein the step of extracting a plurality of sets of paralinguistic attribute values comprises the steps of:
    - receiving each of the plurality of input signals at a corresponding one of the plurality of classifiers;
      
      determining a value for each of a plurality of paralinguistic attributes from each input signal;
      
      outputting a set of paralinguistic attribute values from each of the plurality of classifiers.
  - 3. The method of claim 1, wherein, in the step of extracting a plurality of sets of paralinguistic attribute values, the paralinguistic attribute values comprise values for at least one of gender, accent, age, intonation, emotion, social background and educational level of a speaker.
  - 4. The method of claim 1, wherein, in the step of extracting a plurality of sets of paralinguistic attribute values, the plurality of classifiers comprise at least one of a speech signal classifier, a visual signal classifier, a text input classifier, and a pointing device classifier.
  - 5. The method of claim 4, wherein an input signal of the text input classifier comprises at least one of text entered by an operator and text obtained as feedback from the speech recognition module.
  - 6. The method of claim 1, wherein, in the step of combining values of common paralinguistic attributes, each set of paralinguistic attribute values comprise a plurality of confidence values, and each of the plurality of confidence values corresponds to a paralinguistic attribute value, and wherein the plurality of confidence values are utilized in combining values.
  - 7. The method of claim 1, wherein, in the step of combining values of common paralinguistic attributes, each set of paralinguistic attribute values is associated with a corresponding usefulness factor for each classifier, and the usefulness factor is utilized in combining values.
  - 8. The method of claim 1, wherein, in the step of generating a final set of paralinguistic attribute values, the final set of paralinguistic attribute values define a social context of the plurality of input signals.
  - 9. The method of claim 1, wherein, in the step of generating a final set of paralinguistic attribute values, the final set of paralinguistic attribute values enable question detection.
  - 10. The method of claim 1, wherein the step of modifying performance comprises the steps of:
    - constructing one or more models in at least one of the speech recognition module and the translation module in accordance with the final set of paralinguistic attribute values, wherein each model is conditioned on different paralinguistic attributes; and
      
      dynamically selecting an appropriate model from the one or more models during operation of at least one of the speech recognition module and the translation module.
  - 11. The method of claim 1, wherein the step of modifying performance comprises the step of accessing an expression database to generate appropriate expression in the text-to-speech module based on the final set of paralinguistic attribute values.
  - 12. The method of claim 1, wherein the step of modifying performance comprises the step of obtaining an appropriate pronunciation in the text-to-speech module based on the final set of paralinguistic attribute values.
  - 13. The method of claim 1, wherein the plurality of decision values comprise values for at least two of gender, accent, age, intonation, emotion, social background and educational level of a speaker.
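
The combining step recited in claims 1, 6 and 7 can be illustrated with a minimal sketch. The claims do not specify a combination rule, so the weighted average below is an assumption, as are the names `ClassifierOutput` and `fuse`, the attribute keys, and the 0-to-1 score scale. Each classifier contributes a vector of attribute values; values for the same attribute are merged into one decision value, weighted by the per-value confidence and the per-classifier usefulness factor.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassifierOutput:
    """One classifier's vector of paralinguistic attribute values (hypothetical type)."""
    values: Dict[str, float]      # attribute name -> score in [0, 1] (assumed scale)
    confidence: Dict[str, float]  # attribute name -> confidence in that score
    usefulness: float = 1.0       # per-classifier usefulness factor


def fuse(outputs: List[ClassifierOutput]) -> Dict[str, float]:
    """Combine values of common attributes into one decision value per attribute."""
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for out in outputs:
        for attr, value in out.values.items():
            # Assumed weighting: confidence times usefulness; a missing
            # confidence entry defaults to 1.0.
            weight = out.confidence.get(attr, 1.0) * out.usefulness
            weighted_sum[attr] = weighted_sum.get(attr, 0.0) + weight * value
            weight_total[attr] = weight_total.get(attr, 0.0) + weight
    # Attributes whose total weight is zero are left undecided.
    return {
        attr: weighted_sum[attr] / weight_total[attr]
        for attr in weighted_sum
        if weight_total[attr] > 0
    }


# Two hypothetical classifiers: a speech classifier and a visual classifier.
speech = ClassifierOutput(
    values={"is_question": 0.8, "is_female": 0.6},
    confidence={"is_question": 0.5, "is_female": 1.0},
)
visual = ClassifierOutput(
    values={"is_female": 0.2},
    confidence={"is_female": 1.0},
    usefulness=0.5,
)
final = fuse([speech, visual])
```

Here `final` holds a separate decision value for each attribute: `is_question` comes from the speech classifier alone, while `is_female` blends both classifiers, with the visual classifier counting half as much because of its lower usefulness factor.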

14. A context adaptable speech-to-speech translation system comprising:
- a memory;
  
  at least one processor implementing;
  
  a plurality of classifiers, wherein each of the plurality of classifiers receives a corresponding input signal and generates a corresponding set of paralinguistic attribute values;
  
  a fusion module that receives a plurality of sets of paralinguistic attribute values from the plurality of classifiers and generates a final set of paralinguistic attribute values; and
  
  speech-to-speech translation modules comprising a speech recognition module, a translation module, and a text-to-speech module, wherein performance of at least one of the speech recognition module, the translation module and the text-to-speech module are modified in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
  
  wherein the set of paralinguistic attribute values that each classifier generates is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values performed by the fusion module comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest.
  - 15. The context adaptable speech-to-speech translation system of claim 14, wherein the each of the plurality of classifiers receive a corresponding one of the plurality of input signals, determine a value for each of a plurality of paralinguistic attributes from each input signal, and output a set of paralinguistic attribute values from each of the plurality of classifiers.
  - 16. The context adaptable speech-to-speech translation system of claim 14, wherein the speech-to-speech translation modules construct one or more models in at least one of the speech recognition module and the translation module in accordance with the final set of paralinguistic attribute values, wherein each model is conditioned on different paralinguistic attributes, and dynamically select an appropriate model from the one or more models during operation of at least one of the speech recognition module and the translation module.
  - 17. The context adaptable speech-to-speech translation system of claim 14, wherein the speech-to-speech translation modules access an expression database to generate appropriate expression in the text-to-speech module based on the final set of paralinguistic attribute values.
  - 18. The context adaptable speech-to-speech translation system of claim 14, wherein the speech-to-speech translation modules obtain an appropriate pronunciation in the text-to-speech module based on the final set of paralinguistic attribute values.
  - 19. The context adaptable speech-to-speech translation system of claim 14, wherein the plurality of decision values comprise values for at least two of gender, accent, age, intonation, emotion, social background and educational level of a speaker.
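
Claims 10 and 16 recite models conditioned on different paralinguistic attributes, with one model selected dynamically during operation. The sketch below shows one way such a lookup could work; it is an illustration under stated assumptions, not the patented implementation. The function names, the model names, the 0.5 threshold, and the "most specific match wins" rule are all hypothetical.

```python
from typing import Dict, FrozenSet


def active_attributes(decisions: Dict[str, float], threshold: float = 0.5) -> FrozenSet[str]:
    """Attributes whose fused decision value reaches the (assumed) threshold."""
    return frozenset(attr for attr, value in decisions.items() if value >= threshold)


def select_model(
    models: Dict[FrozenSet[str], str],
    decisions: Dict[str, float],
    threshold: float = 0.5,
    default: str = "generic",
) -> str:
    """Pick the model conditioned on the largest subset of the active attributes."""
    active = active_attributes(decisions, threshold)
    candidates = [key for key in models if key <= active]
    if not candidates:
        return default
    # Ties in subset size resolve to the first matching entry in dict order.
    return models[max(candidates, key=len)]


# Hypothetical registry: each model is conditioned on a different attribute set.
models = {
    frozenset(): "asr_generic",
    frozenset({"is_female"}): "asr_female",
    frozenset({"is_female", "is_question"}): "asr_female_question",
}

chosen = select_model(models, {"is_female": 0.7, "is_question": 0.2})
```

With these inputs only `is_female` is active, so `chosen` is the model conditioned on that attribute alone. The same lookup pattern could index an expression database or a pronunciation table for the text-to-speech module (claims 17 and 18), again as an assumption about one possible design.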

20. An article of manufacture for context adaptation of a speech-to-speech translation system, comprising a non-transitory machine readable storage medium containing one or more programs which when executed by at least one processor device implement the steps of:
- extracting a plurality of sets of paralinguistic attribute values from a plurality of input signals, wherein each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers;
  
  generating a final set of paralinguistic attribute values for the plurality of input signals from the plurality of sets of paralinguistic attribute values; and
  
  modifying performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
  
  wherein the set of paralinguistic attribute values that each classifier extracts is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Afify, Mohamed A.; Gu, Liang; Kuo, Hong-Kwang Jeff; Zhou, Bowen; Gao, Yuqing
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Godbold, Douglas
Application Number: US11/514,604
Publication Number: US 20080059147A1
Time in Patent Office: 1,579 Days
Field of Search: 704/2-8, 704/277
US Class Current: 704/3
CPC Class Codes:
G06Q 30/02 Marketing; Price estimation...
G10L 15/22 Procedures used during a sp...
