ENHANCED SPEECH-TO-SPEECH TRANSLATION SYSTEM AND METHODS
Abstract
A speech translation system and methods for cross-lingual communication that enable users to improve and modify content and usage of the system and easily abort or reset translation. The system includes a speech recognition module configured for accepting an utterance, a machine translation module, an interface configured to communicate the utterance and proposed translation, a correction module and an abort action unit that removes any hypotheses or partial hypotheses and terminates translation. The system also includes modules for storing favorites, changing language mode, automatically identifying language, providing language drills, viewing third party information relevant to conversation, among other things.
23 Claims
1. A computer assisted method for overriding the recognition or translation of an utterance input in a speech translation system for translating a first language into a second language, the method comprising:
(a) accepting by the speech translation system an utterance in the first language, wherein the translation system adds the utterance to a first automatic speech recognition module of the first language, translates the utterance to a corresponding translation in the second language using a first machine translation module, generates a speech output for the translated utterance via a text-to-speech module, associates a description with the utterance, wherein the description contains text of the utterance, a pronunciation, a translation and a pronunciation of the translation, prompts a user to verify the description, and updates the utterance and the user-verified description in a first machine translation module associated with the first language;

(b) aborting translation by the speech translation system to terminate processing in the first automatic speech recognition module, first machine translation module and text-to-speech module to remove any hypotheses or partial hypotheses that may have been created and terminate production of translation from the text-to-speech module; and

(c) resetting the translation system to accept a new utterance.
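The abort-and-reset behavior of steps (b) and (c) can be sketched as follows; the class name, attribute names and state strings are illustrative assumptions, not terms from the claims:

```python
# Hypothetical sketch of the claimed abort/reset flow. The three modules
# (ASR, MT, TTS) are modeled only by the hypothesis lists and flags below.
class TranslationPipeline:
    def __init__(self):
        self.asr_hypotheses = []   # partial recognition hypotheses
        self.mt_hypotheses = []    # partial translation hypotheses
        self.tts_active = False
        self.state = "idle"

    def accept_utterance(self, audio):
        # step (a): begin recognition and translation of the utterance
        self.state = "translating"
        self.asr_hypotheses.append(("partial", audio))
        self.mt_hypotheses.append(("partial", audio))
        self.tts_active = True

    def abort(self):
        # step (b): terminate processing and discard all hypotheses
        self.asr_hypotheses.clear()
        self.mt_hypotheses.clear()
        self.tts_active = False
        # step (c): reset to accept a new utterance
        self.state = "idle"
```

Under this simplification, an abort discards all in-flight state in one call, leaving the pipeline ready for the next utterance.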
2. The method of claim 1, wherein the aborting includes shaking, by the user, of the speech translation system.
3. The method of claim 1, wherein the aborting includes pressing a record button or touching a screen of the speech translation system.
4. The method of claim 1, wherein the aborting includes inputting an utterance that corresponds to an aborting command phrase.
5. The method of claim 1, further comprising the step of indicating, by the speech translation system, the aborting action with an acoustical confirmation, wherein the acoustical confirmation includes a crumbling noise or other sound.
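The abort triggers of claims 2 through 5 (shake, record button, screen touch, spoken abort phrase, with an acoustical confirmation) can be gathered into a single dispatcher; the event names and the confirmation sound label are assumptions made for illustration:

```python
# Hypothetical event dispatcher for the claimed abort triggers.
ABORT_EVENTS = {"shake", "record_button", "screen_touch", "abort_phrase"}

def handle_event(event, aborted_log, sounds):
    """Abort on any recognized trigger and emit an acoustical confirmation."""
    if event in ABORT_EVENTS:
        aborted_log.append(event)       # record which trigger fired
        sounds.append("crumble")        # acoustical confirmation (claim 5)
        return True
    return False                        # not an abort event
```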
6. The method of claim 1, further comprising the step of saving the text of the utterance and the translation as sentence pairs upon instruction by the user to save the sentence pairs as a favorite in a speech translation favorites module configured to store a list or hierarchical inventory of such sentence pairs wherein each favorite can be customized and played directly upon user selection in either the first or second language.
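The favorites module of claim 6, which stores verified sentence pairs in a hierarchical inventory and plays them back in either language, might look like the following sketch; the category keys and the playback stub are assumptions:

```python
# Minimal sketch of a favorites store for saved sentence pairs.
class FavoritesStore:
    def __init__(self):
        self.favorites = {}   # category -> list of (source, translation) pairs

    def save(self, source, translation, category="general"):
        """Save a sentence pair under a (hierarchical) category."""
        self.favorites.setdefault(category, []).append((source, translation))

    def play(self, category, index, language="second"):
        """Return the text to hand to TTS, in the first or second language."""
        source, translation = self.favorites[category][index]
        return translation if language == "second" else source
```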
7. The method of claim 1, further comprising the step of differentiating the type of language use and selecting a different language mode, wherein the selecting is made by a user or automatically inferred based upon multiple input utterances, and replacing standard components in the first speech recognition module and the first machine translation module with components that are conditioned by a different language mode adapted to that language use.
8. The method of claim 7, wherein the language mode is based upon one or more of the following types of language uses: social situation, dialect, relative relations between speakers, social relationship between speakers, gender and age of speaker or listener, physical location of speaker and listener, activity, accent, emotion, stress, personality, formality, assertiveness or other environmental, discourse and user contexts.
9. The method of claim 7, further comprising the step of changing the voice in the text to speech module according to the language use.
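The mode-conditioned component swap of claims 7 through 9 (replacing standard ASR/MT components and changing the TTS voice) can be sketched as a lookup-and-replace; all mode names, component identifiers and voice labels here are invented for illustration:

```python
# Hedged sketch of language-mode conditioning: each mode names the
# mode-adapted components that replace the standard ones.
MODES = {
    "formal":   {"asr": "asr_formal",   "mt": "mt_formal",   "voice": "polite"},
    "informal": {"asr": "asr_informal", "mt": "mt_informal", "voice": "casual"},
}

def select_mode(system, mode):
    components = MODES[mode]
    system["asr"] = components["asr"]      # replace standard ASR component
    system["mt"] = components["mt"]        # replace standard MT component
    system["voice"] = components["voice"]  # change TTS voice (claim 9)
    return system
```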
10. A field maintainable class-based speech translation system for translating a first language into a second language comprising:
a speech recognition module of a first language configured for accepting sound comprising an utterance in a first language, determining if it is a new utterance and associating a description with the new utterance;

a first machine translation module associated with the first language comprising a first tagging module, a first translation model and a first language module, wherein the description contains text of the utterance, a pronunciation, a translation and a pronunciation of the translation, wherein the pronunciation and translation are generated via rule-based or statistical models;

an interface configured to output to a user the description of the new utterance;

a correction module configured to accept the user's verification or correction of the pronunciation and translation of the new utterance via user-editable phonetic transcription, wherein the first machine translation module is configured to be updated with the new utterance and the description; and

an abort action unit configured to abort processing of the utterance in the first speech recognition module, the first machine translation module or both upon request from a user.
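The description record and user-correction step of claim 10 can be sketched as below; the field names and the correction function's signature are assumptions, not anything specified by the claim:

```python
# Hypothetical description record for a new utterance, and a user-correction
# step that edits the phonetic transcription and/or translation.
def make_description(text, pronunciation, translation, translation_pron):
    return {
        "text": text,
        "pronunciation": pronunciation,
        "translation": translation,
        "translation_pronunciation": translation_pron,
        "verified": False,
    }

def apply_user_correction(description, corrected_pron=None,
                          corrected_translation=None):
    """Apply the user's edits, then mark the description verified."""
    if corrected_pron is not None:
        description["pronunciation"] = corrected_pron
    if corrected_translation is not None:
        description["translation"] = corrected_translation
    description["verified"] = True
    return description
```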
11. The translation system of claim 10 wherein the abort action unit comprises an accelerometer or a camera that measures movement.
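A shake detector for the accelerometer-based abort action unit of claim 11 might be as simple as a threshold on acceleration magnitude; the threshold value and the (x, y, z) sample format are assumptions, not from the patent:

```python
# Illustrative shake detector for an accelerometer-based abort action unit.
def is_shake(accel_samples, threshold=2.5):
    """Return True if any sample's summed axis magnitude (in g) exceeds
    the threshold, which we treat as a deliberate shake."""
    return any(abs(x) + abs(y) + abs(z) > threshold
               for x, y, z in accel_samples)
```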
12. The translation system of claim 10, further comprising a text processor configured to identify words in the utterance as being potentially inappropriate and replace the inappropriate words with a beep or bleep sound.
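A minimal sketch of claim 12's inappropriate-word filter follows; the word list and the replacement token are placeholders invented for illustration:

```python
# Minimal sketch of a text processor that bleeps inappropriate words.
INAPPROPRIATE = {"darn", "heck"}  # placeholder word list

def bleep(text):
    """Replace any listed word with a bleep marker (stands in for the
    beep/bleep sound the claimed system would play)."""
    return " ".join("[bleep]" if w.lower() in INAPPROPRIATE else w
                    for w in text.split())
```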
13. The translation system of claim 10, further comprising a conditioning mode configured to use the first translation module and first language module in combination with prosodic parameters and utterances of the synthesis to render the pronunciation of the translation more appropriate based on language use.
14. The translation system of claim 10, wherein the correction module is configured to identify a user corrected new word, wherein the corrected new word is not contained in an internal dictionary of the translation system, determine if the user corrected new word is a named entity by running a name identity tagging model, and register the corrected new word and its description to the first speech recognition, first machine translation and text to speech modules.
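The new-word registration flow of claim 14 can be sketched as follows; the dictionary structures and the stubbed named-entity tagger are assumptions standing in for the claimed name identity tagging model:

```python
# Hedged sketch of claim 14: detect an out-of-dictionary correction, run a
# named-entity tagger (stubbed by the caller), and register the word with
# the ASR, MT and TTS module lexicons.
def register_new_word(word, description, internal_dict, modules, ne_tagger):
    if word in internal_dict:
        return False                      # already known, not a new word
    description["is_named_entity"] = ne_tagger(word)
    for lexicon in modules.values():      # ASR, MT and TTS lexicons
        lexicon[word] = description
    internal_dict.add(word)
    return True
```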
15. The translation system of claim 10, wherein the correction module is configured to merge any user corrected new word, its description and its pre-trained language model entry from background models into the recognition lexicon and translation module, wherein the user corrected new utterance is contained in a background dictionary.
16. The translation system of claim 10, wherein the recognition lexicon comprises a speech language identification module configured to identify the language being spoken.
17. The translation system of claim 10, further comprising a language learning module which produces customizable learning based upon discourse content, wherein the language learning module is configured to log the user's utterances, create a profile of the user's utterances based upon syntactic constructs, frequencies of use and semantic word clustering of the user's utterances, and construct a language learning drill based upon the profile.
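The profile-and-drill construction of claim 17 can be sketched with a frequency profile; this simplification covers only the frequencies-of-use part of the claimed profile, not the syntactic or semantic clustering:

```python
from collections import Counter

# Sketch of claim 17: profile logged utterances by word frequency and
# build a drill from the most frequent words.
def build_profile(utterances):
    counts = Counter()
    for u in utterances:
        counts.update(u.lower().split())
    return counts

def build_drill(profile, n=3):
    """Return the n most frequent words as a (toy) language learning drill."""
    return [word for word, _ in profile.most_common(n)]
```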
18. The translation system of claim 10, further comprising a portable device, wherein the recognition lexicon is configured to accept sound from the portable device, wherein the portable device is a phone or personal digital assistant, and the first speech recognition module and the first machine translation modules are configured to provide consecutive or simultaneous translations of the incoming sound.
19. The translation system of claim 18, wherein the first machine translation module provides simultaneous translation by automatically segmenting sentences from input utterances using speech segmentation and speech translation, wherein the sentences comprise one or more new words.
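Claim 19's automatic sentence segmentation for simultaneous translation can be illustrated with a simple splitter; punctuation-based splitting on a text transcript stands in for the claimed speech segmentation of the audio stream:

```python
import re

# Illustrative segmenter: split a running transcript into sentence units
# that can each be handed to the machine translation module incrementally.
def segment_sentences(transcript):
    parts = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [p for p in parts if p]
```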
20. The translation system of claim 10, further comprising a prosodic module configured to use prosodic cues and produce back-channeling cues, wherein the prosodic cues include pauses, pitch contours, and intensity of the sound accepted in the recognition lexicon.
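One of the prosodic cues named in claim 20, a pause, can be detected from frame-level intensity as sketched below; the silence threshold, frame count and the back-channel token are all assumptions:

```python
# Hedged sketch of claim 20: detect a sustained pause in frame intensities
# and produce a back-channeling cue.
def backchannel_cue(intensities, silence_level=0.1, min_pause_frames=5):
    run = 0
    for level in intensities:
        run = run + 1 if level < silence_level else 0
        if run >= min_pause_frames:
            return "mm-hmm"   # back-channeling cue on a long pause
    return None               # no pause detected
```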
21. The translation system of claim 10, wherein the speech recognition and machine translation modules are configured to deliver simultaneous translation over the internet.
22. The translation system of claim 21, wherein the simultaneous translation is presented as sound or text.
23. The translation system of claim 10, further comprising an information extraction module, wherein the information extraction module considers the text of the utterance, compares it to information obtained from the internet or a local knowledgebase and presents the user with information in either the first or second language, wherein the information includes one or more of the following types of information: targeted advertising, flight schedules, hotel availability and rates, and contact details of persons.
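A toy version of claim 23's information extraction module follows; the keyword matching and the knowledgebase contents are invented for illustration and stand in for the claimed comparison against internet or local sources:

```python
# Minimal sketch of an information extraction module: match keywords in the
# utterance text against a (placeholder) local knowledgebase.
KNOWLEDGEBASE = {
    "flight": "flight schedules",
    "hotel": "hotel availability and rates",
}

def extract_info(utterance_text):
    words = utterance_text.lower().split()
    return [info for key, info in KNOWLEDGEBASE.items() if key in words]
```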
Specification