SPEECH TRANSLATION APPARATUS, METHOD AND PROGRAM
First Claim
1. A speech translation apparatus comprising:
a receiving unit configured to receive a speech in a first language and convert the speech to a speech signal;
a first recognition unit configured to perform speech recognition of the speech signal and generate a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
a second recognition unit configured to recognize which emotion type is included in the speech using at least one of the transcription and the speech signal and to generate an emotion identification information item including at least one recognized emotion type;
a first generation unit configured to generate a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
a translation unit configured to generate a translation of the filtered sentence in the first language into a second language which is different from the first language;
a second generation unit configured to generate an insertion sentence explaining the recognized emotion type in the second language; and
a synthesis unit configured to convert the filtered sentence and the insertion sentence into a speech signal.
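The units recited in claim 1 can be sketched as one minimal pipeline. Everything below is a hypothetical stand-in, not the patented implementation: the lexicons, the identity translation step, and all function names are illustrative assumptions, since the claim specifies behavior, not code.

```python
from dataclasses import dataclass, field

# Hypothetical "first model": raw, emotion-affected words mapped to
# neutral filtered equivalents in the first language.
FIRST_MODEL = {"freaking": "very", "awful!!": "bad"}

# Hypothetical lexical cues used by the second recognition unit.
EMOTION_LEXICON = {"freaking": "anger", "awful!!": "sadness"}

@dataclass
class Output:
    translation: str                      # translated filtered sentence
    emotion_types: list = field(default_factory=list)
    insertion_sentence: str = ""

def translate(sentence: str) -> str:
    """Translation unit stand-in: a real system would call an MT engine."""
    return sentence

def speech_translate(transcription: str) -> Output:
    words = transcription.split()
    # Second recognition unit: emotion types detected in the transcription.
    emotions = sorted({EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON})
    # First generation unit: filtered sentence via the first model.
    filtered = " ".join(FIRST_MODEL.get(w, w) for w in words)
    # Second generation unit: explain the emotion in the second language.
    insertion = f"The speaker expressed {' and '.join(emotions)}." if emotions else ""
    return Output(translate(filtered), emotions, insertion)
```

The synthesis unit (text-to-speech over the translated and insertion sentences) is omitted, since it would require an external engine.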
Abstract
According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive a speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to generate a translation of the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered sentence and the insertion sentence into a speech signal.
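The first generation unit above relies on a first model mapping raw expressions to filtered ones. A minimal sketch, assuming the model is a lookup table and that longer phrases should match before single words (both assumptions; the patent does not fix the model's form):

```python
import re

# Hypothetical first model: raw (emotion-affected) expressions mapped to
# neutral filtered equivalents; multi-word phrases are included.
FIRST_MODEL = {
    "no way": "I do not agree",
    "shut up": "please stop",
    "gonna": "going to",
}

# One alternation pattern, longest keys first, so "no way" is replaced
# as a phrase rather than word by word.
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(FIRST_MODEL, key=len, reverse=True))
)

def generate_filtered_sentence(raw: str) -> str:
    """Replace raw words and phrases with their filtered counterparts."""
    return _pattern.sub(lambda m: FIRST_MODEL[m.group(0)], raw)
```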
12 Claims
1. A speech translation apparatus comprising:
a receiving unit configured to receive a speech in a first language and convert the speech to a speech signal;
a first recognition unit configured to perform speech recognition of the speech signal and generate a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
a second recognition unit configured to recognize which emotion type is included in the speech using at least one of the transcription and the speech signal and to generate an emotion identification information item including at least one recognized emotion type;
a first generation unit configured to generate a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
a translation unit configured to generate a translation of the filtered sentence in the first language into a second language which is different from the first language;
a second generation unit configured to generate an insertion sentence explaining the recognized emotion type in the second language; and
a synthesis unit configured to convert the filtered sentence and the insertion sentence into a speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
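The second recognition unit operates on "at least one of the transcription and the speech signal", so a sketch should accept either input alone or both. The lexicons and the amplitude threshold below are illustrative assumptions, not values from the patent:

```python
# Hypothetical lexical cues for emotion types.
ANGER_WORDS = {"hate", "terrible", "furious"}
JOY_WORDS = {"great", "wonderful", "love"}

def recognize_emotion(transcription=None, speech_signal=None):
    """Return the set of recognized emotion types, using whichever of
    the two inputs is available."""
    emotions = set()
    if transcription is not None:
        words = set(transcription.lower().split())
        if words & ANGER_WORDS:
            emotions.add("anger")
        if words & JOY_WORDS:
            emotions.add("joy")
    if speech_signal is not None:
        # Naive acoustic cue: high average amplitude read as arousal.
        energy = sum(abs(s) for s in speech_signal) / max(len(speech_signal), 1)
        if energy > 0.5 and not emotions:
            emotions.add("excitement")
    return emotions
```

A real system would replace both branches with trained classifiers; the structure (lexical cue, acoustic cue, merged result) is what the claim language suggests.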
11. A speech translation method comprising:
receiving a speech in a first language and converting the speech to a speech signal;
performing speech recognition of the speech signal and generating a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
recognizing which emotion type is included in the speech using at least one of the transcription and the speech signal and generating an emotion identification information item including at least one recognized emotion type;
generating a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
generating a translation of the filtered sentence in the first language into a second language which is different from the first language;
generating an insertion sentence explaining the recognized emotion type in the second language; and
converting the filtered sentence and the insertion sentence into a speech signal.
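The step of generating an insertion sentence "explaining the recognized emotion type in the second language" can be sketched as a template lookup keyed by emotion type and target language. The template table, including the German entry, is an illustrative assumption:

```python
# Hypothetical templates: (emotion type, second language) -> sentence.
INSERTION_TEMPLATES = {
    ("anger", "en"): "The speaker said this angrily.",
    ("joy", "en"): "The speaker said this happily.",
    ("anger", "de"): "Der Sprecher sagte dies veraergert.",
}

def generate_insertion_sentence(emotion_types, second_language="en"):
    """Build one explanatory sentence per recognized emotion type,
    in a stable (sorted) order, skipping types with no template."""
    sentences = [
        INSERTION_TEMPLATES[(e, second_language)]
        for e in sorted(emotion_types)
        if (e, second_language) in INSERTION_TEMPLATES
    ]
    return " ".join(sentences)
```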
12. A non-transitory computer readable medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising:
receiving a speech in a first language and converting the speech to a speech signal;
performing speech recognition of the speech signal and generating a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
recognizing which emotion type is included in the speech using at least one of the transcription and the speech signal and generating an emotion identification information item including at least one recognized emotion type;
generating a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
generating a translation of the filtered sentence in the first language into a second language which is different from the first language;
generating an insertion sentence explaining the recognized emotion type in the second language; and
converting the filtered sentence and the insertion sentence into a speech signal.
Specification