Speech translation method and apparatus utilizing prosodic information
First Claim
1. A method, comprising:
receiving source speech with a computer system having a processing device; and
using said processing device, performing the steps of:
electronically marking a relative position of, and extracting, non-text information in the source speech, said non-text information comprising both:
(a) at least one emotional utterance lacking a word; and
(b) prosodic information of at least one speech unit in the source speech comprising at least one of emphasis, intonation, duration, fundamental frequency, pitch, and energy;
obtaining a relative value of the prosodic information of the at least one speech unit in the source speech by comparing the prosodic information of the at least one speech unit in the source speech with reference prosodic information for the at least one speech unit, the reference prosodic information representing prosodic information of a sample speech unit under defined conditions;
translating the source speech into target speech; and
adjusting the target speech based on the at least one emotional utterance lacking a word and the prosodic information to preserve the non-text information in the source speech, including adjusting prosodic information of at least one speech unit of the target speech based on the relative value of the prosodic information of the at least one speech unit of the source speech, wherein adjusting the target speech based on the at least one emotional utterance lacking a word and the prosodic information comprises adding a corresponding emotional utterance lacking a word to the target speech at a location of the target speech corresponding to the relative position of the at least one emotional utterance lacking a word in the source speech.
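The claim's prosody steps (comparing each source speech unit against reference prosodic information to obtain a relative value, then scaling the corresponding target speech unit by that value) can be sketched as follows. The reference constants and function names here are illustrative assumptions, not values or APIs from the patent:

```python
# Hedged sketch of the claim's relative-prosody computation.
# REFERENCE_PITCH_HZ / REFERENCE_DURATION_S stand in for "reference
# prosodic information of a sample speech unit under defined conditions";
# the actual reference values are not given in the patent.

REFERENCE_PITCH_HZ = 120.0   # assumed neutral pitch for the sample unit
REFERENCE_DURATION_S = 0.25  # assumed neutral duration for the sample unit

def relative_prosody(source_pitch_hz: float, source_duration_s: float) -> dict:
    """Obtain relative values by comparing a source speech unit's
    prosodic information with the reference prosodic information."""
    return {
        "pitch_ratio": source_pitch_hz / REFERENCE_PITCH_HZ,
        "duration_ratio": source_duration_s / REFERENCE_DURATION_S,
    }

def adjust_target_unit(target_pitch_hz: float, target_duration_s: float,
                       rel: dict) -> tuple:
    """Adjust a target speech unit's prosody based on the relative
    values obtained from the source speech unit."""
    return (target_pitch_hz * rel["pitch_ratio"],
            target_duration_s * rel["duration_ratio"])

# A source unit spoken at 180 Hz for 0.5 s is 1.5x the reference pitch and
# 2x the reference duration; the same ratios are applied to the target unit.
rel = relative_prosody(180.0, 0.5)
adjusted = adjust_target_unit(100.0, 0.2, rel)  # -> (150.0, 0.4)
```

Using a ratio rather than the absolute source values lets the target speech keep its own language-appropriate baseline while still reflecting, say, a doubled duration for an emphasized word.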
Abstract
A method and apparatus for speech translation. The method includes: receiving a source speech; extracting non-text information in the source speech; translating the source speech into a target speech; and adjusting the translated target speech according to the extracted non-text information so that the target speech preserves the non-text information in the source speech. The apparatus includes: a receiving module for receiving source speech; an extracting module for extracting non-text information in the source speech; a translation module for translating the source speech into a target speech; and an adjusting module for adjusting the translated target speech according to the extracted non-text information so that the target speech preserves the non-text information in the source speech.
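A minimal sketch of how the abstract's extracting, translation, and adjusting modules could fit together, assuming a token-level representation in which wordless emotional utterances are marked like `<laugh>`; the marker convention and helper names are illustrative assumptions, and `translate_text` is an identity stand-in rather than a real translator:

```python
# Hedged sketch of the abstract's module pipeline (extracting, translation,
# adjusting). Token lists and the "<laugh>" marker convention are
# illustrative assumptions; translate_text is an identity stand-in.

def extract_non_text(source_tokens):
    """Extracting module: mark the relative position of each wordless
    emotional utterance and separate it from the translatable words."""
    utterances = [(i, tok) for i, tok in enumerate(source_tokens)
                  if tok.startswith("<")]
    words = [tok for tok in source_tokens if not tok.startswith("<")]
    return words, utterances

def translate_text(words):
    """Translation module stand-in (identity mapping for the sketch)."""
    return list(words)

def adjust_target(target_tokens, utterances):
    """Adjusting module: re-insert each utterance at the location of the
    target speech corresponding to its relative position in the source."""
    out = list(target_tokens)
    total = len(target_tokens) + len(utterances)
    for pos, utt in utterances:
        out.insert(round(pos / total * len(out)), utt)
    return out

source = ["<laugh>", "hello", "world"]
words, utts = extract_non_text(source)
target = adjust_target(translate_text(words), utts)
# target -> ["<laugh>", "hello", "world"]
```

The relative-position bookkeeping matters because translation can reorder or change the number of words, so an utterance's absolute index in the source is not directly reusable in the target.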
13 Claims
1. A method, comprising:

receiving source speech with a computer system having a processing device; and
using said processing device, performing the steps of:
electronically marking a relative position of, and extracting, non-text information in the source speech, said non-text information comprising both:
(a) at least one emotional utterance lacking a word; and
(b) prosodic information of at least one speech unit in the source speech comprising at least one of emphasis, intonation, duration, fundamental frequency, pitch, and energy;
obtaining a relative value of the prosodic information of the at least one speech unit in the source speech by comparing the prosodic information of the at least one speech unit in the source speech with reference prosodic information for the at least one speech unit, the reference prosodic information representing prosodic information of a sample speech unit under defined conditions;
translating the source speech into target speech; and
adjusting the target speech based on the at least one emotional utterance lacking a word and the prosodic information to preserve the non-text information in the source speech, including adjusting prosodic information of at least one speech unit of the target speech based on the relative value of the prosodic information of the at least one speech unit of the source speech, wherein adjusting the target speech based on the at least one emotional utterance lacking a word and the prosodic information comprises adding a corresponding emotional utterance lacking a word to the target speech at a location of the target speech corresponding to the relative position of the at least one emotional utterance lacking a word in the source speech.

View Dependent Claims: 2, 3, 4, 5, 12

6. An apparatus, comprising:

a receiver configured to receive source speech; and
at least one computer processing device configured to:
electronically mark a relative position of, and extract, non-text information in the source speech, said non-text information comprising both:
(a) at least one emotional utterance lacking a word; and
(b) prosodic information of at least one speech unit in the source speech comprising at least one of emphasis, intonation, duration, fundamental frequency, pitch, and energy;
obtain a relative value of the prosodic information of the at least one speech unit in the source speech by comparing the prosodic information of the at least one speech unit in the source speech with reference prosodic information for the at least one speech unit, the reference prosodic information representing prosodic information of a sample speech unit under defined conditions;
translate the source speech into target speech; and
adjust the target speech based on the at least one emotional utterance lacking a word and the prosodic information to preserve the non-text information in the source speech, including adjusting prosodic information of at least one speech unit of the target speech based on the relative value of the prosodic information of the at least one speech unit of the source speech, wherein adjusting the target speech based on the at least one emotional utterance lacking a word and the prosodic information comprises adding a corresponding emotional utterance lacking a word to the target speech at a location of the target speech corresponding to the relative position of the at least one emotional utterance lacking a word in the source speech.

View Dependent Claims: 7, 8, 9, 10, 11, 13

Specification