System and method for enriching spoken language translation with prosodic information
First Claim
1. A method comprising:
receiving speech for translation to a target language;
prior to a translation of the speech, generating, via a processor and via a discriminative classifier model, a pitch accent label based on the speech independent of volume, the pitch accent label having a regional accent type and representing segments of the speech which are prosodically prominent; and
injecting the pitch accent label with a word token within a translation engine to create target language output text.
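The injecting step can be pictured as pairing each word token with its pitch accent label before the tokens reach the translation engine. The following is a minimal sketch under stated assumptions: the `word|label` token format, the label names `ACC` and `NONE`, and the helper names `inject_labels` and `split_labels` are all hypothetical illustrations and are not taken from the claim.

```python
from typing import List, Tuple

# Hypothetical label inventory (an assumption, not from the patent text).
ACCENT = "ACC"   # word is prosodically prominent
NONE = "NONE"    # word carries no pitch accent


def inject_labels(words: List[str], labels: List[str], sep: str = "|") -> List[str]:
    """Attach one pitch accent label to each word token, e.g. 'two|ACC'."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    return [f"{word}{sep}{label}" for word, label in zip(words, labels)]


def split_labels(tokens: List[str], sep: str = "|") -> List[Tuple[str, str]]:
    """Recover (word, label) pairs from enriched tokens."""
    pairs = []
    for token in tokens:
        # rpartition splits on the last separator, so words containing
        # the separator character are still handled.
        word, _, label = token.rpartition(sep)
        pairs.append((word, label))
    return pairs


enriched = inject_labels(
    ["i", "want", "two", "tickets"],
    [NONE, NONE, ACCENT, NONE],
)
print(enriched)
```

Treating the label as part of the token lets a statistical translation engine handle the enriched tokens like ordinary vocabulary items, which is one plausible reading of "injecting the pitch accent label with a word token".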
Abstract
Disclosed herein are systems, methods, and computer-readable media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting the pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step of synthesizing speech in the target language based on the prosody-enriched target language output text may be added. An automatic prosody labeler can generate the pitch accent labels and can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent. A pitch accent label can include an indication of certainty that a respective segment of the speech is prosodically prominent and/or an indication of prosodic prominence of a respective segment of speech.
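To illustrate how a maximum entropy model could yield both a pitch accent label and an indication of certainty, the sketch below uses the two-class case, which is equivalent to logistic regression. Everything specific here is an assumption for illustration: the feature names, the hand-set weights, the bias, and the 0.5 threshold are hypothetical, and a real labeler would learn its weights from annotated speech. The features deliberately exclude volume.

```python
import math
from typing import Dict, Tuple

# Hypothetical, hand-set weights; a trained model would estimate these.
WEIGHTS: Dict[str, float] = {
    "is_content_word": 1.5,    # lexical/syntactic cue
    "is_function_word": -1.2,  # lexical/syntactic cue
    "f0_peak_z": 0.8,          # prosodic cue: normalized pitch peak
}
BIAS = -0.5


def accent_probability(features: Dict[str, float]) -> float:
    """P(accent | features) under a two-class maximum entropy model."""
    score = BIAS + sum(
        WEIGHTS.get(name, 0.0) * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def pitch_accent_label(
    features: Dict[str, float], threshold: float = 0.5
) -> Tuple[str, float]:
    """Return a label together with the model's certainty of prominence."""
    p = accent_probability(features)
    return ("ACC" if p >= threshold else "NONE", p)


content = pitch_accent_label({"is_content_word": 1.0, "f0_peak_z": 1.0})
function = pitch_accent_label({"is_function_word": 1.0, "f0_peak_z": -0.5})
print(content, function)
```

Returning the probability alongside the label corresponds to the "indication of certainty" mentioned above; a downstream component could keep only the label, or carry the probability forward as well.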
17 Claims
1. A method comprising:
receiving speech for translation to a target language;
prior to a translation of the speech, generating, via a processor and via a discriminative classifier model, a pitch accent label based on the speech independent of volume, the pitch accent label having a regional accent type and representing segments of the speech which are prosodically prominent; and
injecting the pitch accent label with a word token within a translation engine to create target language output text.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
a processor; and
a computer-readable medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
receiving speech for translation to a target language;
prior to a translation of the speech, generating, via the processor and via a discriminative classifier model, a pitch accent label based on the speech independent of volume, the pitch accent label having a regional accent type and representing segments of the speech which are prosodically prominent; and
injecting the pitch accent label with a word token within a translation engine to create target language output text.
Dependent claims: 8, 9, 10, 11, 12
13. A computer-readable storage device having instructions stored which, when executed by a processor, cause the processor to perform operations comprising:
receiving speech for translation to a target language;
prior to a translation of the speech, generating, via the processor and via a discriminative classifier model, a pitch accent label based on the speech independent of volume, the pitch accent label having a regional accent type and representing segments of the speech which are prosodically prominent; and
injecting the pitch accent label with a word token within a translation engine to create target language output text.
Dependent claims: 14, 15, 16, 17
Specification