SPEECH TRANSLATION APPARATUS, METHOD AND PROGRAM
First Claim
1. A speech translation apparatus comprising:
a receiving unit configured to receive a speech in a first language and convert the speech to a speech signal;
a first recognition unit configured to perform speech recognition of the speech signal and generate a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
a second recognition unit configured to recognize which emotion type is included in the speech using at least one of the transcription and the speech signal and to generate an emotion identification information item including at least one recognized emotion type;
a first generation unit configured to generate a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
a translation unit configured to generate a translation of the filtered sentence in the first language into a second language which is different from the first language;
a second generation unit configured to generate an insertion sentence explaining the recognized emotion type in the second language; and
a synthesis unit configured to convert the filtered sentence and the insertion sentence into a speech signal.
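The units recited in claim 1 can be sketched as one minimal pipeline. Everything below is a hypothetical stand-in, not the patented implementation: the lexicons, the identity translation step, and all function names are illustrative assumptions, since the claim specifies behavior, not code.

```python
from dataclasses import dataclass, field

# Hypothetical "first model": raw, emotion-affected words mapped to
# neutral filtered equivalents in the first language.
FIRST_MODEL = {"freaking": "very", "awful!!": "bad"}

# Hypothetical lexical cues used by the second recognition unit.
EMOTION_LEXICON = {"freaking": "anger", "awful!!": "sadness"}

@dataclass
class Output:
    translation: str                      # translated filtered sentence
    emotion_types: list = field(default_factory=list)
    insertion_sentence: str = ""

def translate(sentence: str) -> str:
    """Translation unit stand-in: a real system would call an MT engine."""
    return sentence

def speech_translate(transcription: str) -> Output:
    words = transcription.split()
    # Second recognition unit: emotion types detected in the transcription.
    emotions = sorted({EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON})
    # First generation unit: filtered sentence via the first model.
    filtered = " ".join(FIRST_MODEL.get(w, w) for w in words)
    # Second generation unit: explain the emotion in the second language.
    insertion = f"The speaker expressed {' and '.join(emotions)}." if emotions else ""
    return Output(translate(filtered), emotions, insertion)
```

The synthesis unit (text-to-speech over the translated and insertion sentences) is omitted, since it would require an external engine.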
Abstract
According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive a speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to generate a translation of the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered sentence and the insertion sentence into a speech signal.
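The first generation unit above relies on a first model mapping raw expressions to filtered ones. A minimal sketch, assuming the model is a lookup table and that longer phrases should match before single words (both assumptions; the patent does not fix the model's form):

```python
import re

# Hypothetical first model: raw (emotion-affected) expressions mapped to
# neutral filtered equivalents; multi-word phrases are included.
FIRST_MODEL = {
    "no way": "I do not agree",
    "shut up": "please stop",
    "gonna": "going to",
}

# One alternation pattern, longest keys first, so "no way" is replaced
# as a phrase rather than word by word.
_pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(FIRST_MODEL, key=len, reverse=True))
)

def generate_filtered_sentence(raw: str) -> str:
    """Replace raw words and phrases with their filtered counterparts."""
    return _pattern.sub(lambda m: FIRST_MODEL[m.group(0)], raw)
```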
12 Claims
1. A speech translation apparatus comprising:
a receiving unit configured to receive a speech in a first language and convert the speech to a speech signal;
a first recognition unit configured to perform speech recognition of the speech signal and generate a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
a second recognition unit configured to recognize which emotion type is included in the speech using at least one of the transcription and the speech signal and to generate an emotion identification information item including at least one recognized emotion type;
a first generation unit configured to generate a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
a translation unit configured to generate a translation of the filtered sentence in the first language into a second language which is different from the first language;
a second generation unit configured to generate an insertion sentence explaining the recognized emotion type in the second language; and
a synthesis unit configured to convert the filtered sentence and the insertion sentence into a speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
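The second recognition unit operates on "at least one of the transcription and the speech signal", so a sketch should accept either input alone or both. The lexicons and the amplitude threshold below are illustrative assumptions, not values from the patent:

```python
# Hypothetical lexical cues for emotion types.
ANGER_WORDS = {"hate", "terrible", "furious"}
JOY_WORDS = {"great", "wonderful", "love"}

def recognize_emotion(transcription=None, speech_signal=None):
    """Return the set of recognized emotion types, using whichever of
    the two inputs is available."""
    emotions = set()
    if transcription is not None:
        words = set(transcription.lower().split())
        if words & ANGER_WORDS:
            emotions.add("anger")
        if words & JOY_WORDS:
            emotions.add("joy")
    if speech_signal is not None:
        # Naive acoustic cue: high average amplitude read as arousal.
        energy = sum(abs(s) for s in speech_signal) / max(len(speech_signal), 1)
        if energy > 0.5 and not emotions:
            emotions.add("excitement")
    return emotions
```

A real system would replace both branches with trained classifiers; the structure (lexical cue, acoustic cue, merged result) is what the claim language suggests.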
11. A speech translation method comprising:
receiving a speech in a first language and converting the speech to a speech signal;
performing speech recognition of the speech signal and generating a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
recognizing which emotion type is included in the speech using at least one of the transcription and the speech signal and generating an emotion identification information item including at least one recognized emotion type;
generating a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
generating a translation of the filtered sentence in the first language into a second language which is different from the first language;
generating an insertion sentence explaining the recognized emotion type in the second language; and
converting the filtered sentence and the insertion sentence into a speech signal.
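The step of generating an insertion sentence "explaining the recognized emotion type in the second language" can be sketched as a template lookup keyed by emotion type and target language. The template table, including the German entry, is an illustrative assumption:

```python
# Hypothetical templates: (emotion type, second language) -> sentence.
INSERTION_TEMPLATES = {
    ("anger", "en"): "The speaker said this angrily.",
    ("joy", "en"): "The speaker said this happily.",
    ("anger", "de"): "Der Sprecher sagte dies veraergert.",
}

def generate_insertion_sentence(emotion_types, second_language="en"):
    """Build one explanatory sentence per recognized emotion type,
    in a stable (sorted) order, skipping types with no template."""
    sentences = [
        INSERTION_TEMPLATES[(e, second_language)]
        for e in sorted(emotion_types)
        if (e, second_language) in INSERTION_TEMPLATES
    ]
    return " ".join(sentences)
```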
12. A non-transitory computer readable medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising:
receiving a speech in a first language and converting the speech to a speech signal;
performing speech recognition of the speech signal and generating a transcription of the speech signal, the transcription including one or more raw words or one or more raw phrases in the first language, the raw words and the raw phrases being affected by a speaker's emotion;
recognizing which emotion type is included in the speech using at least one of the transcription and the speech signal and generating an emotion identification information item including at least one recognized emotion type;
generating a filtered sentence by transforming the one or more raw words or the one or more raw phrases into one or more filtered words or one or more filtered phrases in the first language, referring to a first model in which the raw words and the raw phrases correspond to the filtered words and the filtered phrases, the filtered words and the filtered phrases failing to be affected by the speaker's emotion;
generating a translation of the filtered sentence in the first language into a second language which is different from the first language;
generating an insertion sentence explaining the recognized emotion type in the second language; and
converting the filtered sentence and the insertion sentence into a speech signal.
Specification