Methods and apparatus for context adaptation of speech-to-speech translation systems
First Claim
1. A method of context adaptation of a speech-to-speech translation system comprising the steps of:
extracting a plurality of sets of paralinguistic attribute values from a plurality of input signals, wherein each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers;
generating a final set of paralinguistic attribute values for the plurality of input signals from the plurality of sets of paralinguistic attribute values; and
modifying performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
wherein the set of paralinguistic attribute values that each classifier extracts is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest;
further wherein the extracting, generating and modifying steps are implemented via instruction code that is executed by at least one processor device.
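The generating step can be illustrated with a minimal sketch. The claim requires only that values of common attributes be combined across the classifiers' vector signals to yield one decision value per attribute; it does not fix the combination rule. The sketch below assumes a simple arithmetic mean, and the attribute names (`female`, `agitated`), the dictionary representation of a vector signal, and the function name `fuse` are hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical attribute names; the claim only requires two or more
# paralinguistic attributes of interest.
ATTRIBUTES = ("female", "agitated")


def fuse(vectors):
    """Combine per-classifier vectors into one decision value per attribute.

    Each vector is a dict mapping an attribute name to a score in [0, 1].
    Averaging is an assumption; the claim does not name a combination rule.
    """
    if not vectors:
        raise ValueError("need at least one classifier output")
    fused = {}
    for name in ATTRIBUTES:
        # Collect the value of this attribute from every classifier that reports it.
        values = [vec[name] for vec in vectors if name in vec]
        if values:
            fused[name] = mean(values)
    return fused


# One vector per input signal and classifier (illustrative numbers only).
audio_vector = {"female": 1.0, "agitated": 0.25}
video_vector = {"female": 0.5, "agitated": 0.75}
final_set = fuse([audio_vector, video_vector])
```

Here `final_set` holds a separate decision value for each attribute, which corresponds to the final set of paralinguistic attribute values recited in the claim. A weighted vote or a learned combiner could replace the mean without changing the structure.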
Abstract
A technique for context adaptation of a speech-to-speech translation system is provided. A plurality of sets of paralinguistic attribute values is obtained from a plurality of input signals. Each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers. A final set of paralinguistic attribute values is generated for the plurality of input signals from the plurality of sets of paralinguistic attribute values. Performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system is modified in accordance with the final set of paralinguistic attribute values for the plurality of input signals.
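As a rough sketch of the last sentence of the abstract, the final attribute values could drive the selection of models or settings in each module. The abstract does not say how performance is modified, so everything below is an assumption: the `PipelineConfig` fields, the model and voice names, the threshold of 0.5, and the attribute names `female` and `agitated` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    # Hypothetical settings for the three modules named in the abstract.
    asr_model: str = "asr-generic"
    translation_style: str = "neutral"
    tts_voice: str = "voice-default"


def adapt(final_attributes, threshold=0.5):
    """Pick module settings from the final paralinguistic attribute values.

    The threshold rule is an illustrative assumption, not the patent's method.
    """
    config = PipelineConfig()
    if final_attributes.get("female", 0.0) > threshold:
        # Adapt speech recognition and speech synthesis to the speaker.
        config.asr_model = "asr-female"
        config.tts_voice = "voice-female"
    if final_attributes.get("agitated", 0.0) > threshold:
        # Adapt the translation module to the speaker's emotional state.
        config.translation_style = "polite"
    return config


config = adapt({"female": 0.75, "agitated": 0.5})
```

With these inputs only the speaker-dependent settings change, because the `agitated` value does not exceed the threshold.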
20 Claims
1. A method of context adaptation of a speech-to-speech translation system comprising the steps of:
extracting a plurality of sets of paralinguistic attribute values from a plurality of input signals, wherein each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers;
generating a final set of paralinguistic attribute values for the plurality of input signals from the plurality of sets of paralinguistic attribute values; and
modifying performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
wherein the set of paralinguistic attribute values that each classifier extracts is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest;
further wherein the extracting, generating and modifying steps are implemented via instruction code that is executed by at least one processor device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
14. A context adaptable speech-to-speech translation system comprising:
a memory;
at least one processor implementing:
a plurality of classifiers, wherein each of the plurality of classifiers receives a corresponding input signal and generates a corresponding set of paralinguistic attribute values;
a fusion module that receives a plurality of sets of paralinguistic attribute values from the plurality of classifiers and generates a final set of paralinguistic attribute values; and
speech-to-speech translation modules comprising a speech recognition module, a translation module, and a text-to-speech module, wherein performance of at least one of the speech recognition module, the translation module and the text-to-speech module are modified in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
wherein the set of paralinguistic attribute values that each classifier generates is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values performed by the fusion module comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest.
Dependent claims: 15, 16, 17, 18, 19.
20. An article of manufacture for context adaptation of a speech-to-speech translation system, comprising a non-transitory machine readable storage medium containing one or more programs which when executed by at least one processor device implement the steps of:
extracting a plurality of sets of paralinguistic attribute values from a plurality of input signals, wherein each set of the plurality of sets of paralinguistic attribute values is extracted from a corresponding input signal of the plurality of input signals via a corresponding classifier of a plurality of classifiers;
generating a final set of paralinguistic attribute values for the plurality of input signals from the plurality of sets of paralinguistic attribute values; and
modifying performance of at least one of a speech recognition module, a translation module and a text-to-speech module of the speech-to-speech translation system in accordance with the final set of paralinguistic attribute values for the plurality of input signals;
wherein the set of paralinguistic attribute values that each classifier extracts is represented by a vector signal output by the classifier, the vector signal comprising two or more values corresponding to two or more paralinguistic attributes of interest such that the step of generating the final set of paralinguistic attribute values comprises combining each of the vector signals from each of the classifiers by combining values of common paralinguistic attributes of interest across the vector signals to yield a separate decision value for each of the two or more paralinguistic attributes of interest, the final set of paralinguistic attribute values comprising a plurality of decision values corresponding to respective ones of the two or more paralinguistic attributes of interest.
Specification