Translation and capture architecture for output of conversational utterances

US 7,991,607 B2
Filed: 06/27/2005
Issued: 08/02/2011
Est. Priority Date: 06/27/2005
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Architecture that combines capture and translation of concepts, goals, needs, locations, objects, and items (e.g., sign text) into complete conversational utterances: it takes a translation of the item and morphs it fluidly into sets of sentences that can be echoed to a user, who can select among them to communicate speech (or textual utterances). A plurality of modalities that process, for example, images, audio, video, searches, and cultural context, which are representative of at least context and/or content, can be employed to glean additional information regarding a communications exchange and so facilitate more accurate and efficient translation. Gesture recognition can be utilized, for example, to enhance input recognition, urgency, and/or emotional interaction. Speech can be used for document annotation. Moreover, translation (e.g., speech to speech, text to speech, speech to text, handwriting to speech, text or audio, . . . ) can be significantly improved in combination with this architecture.
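The abstract describes expanding a captured, translated item (for example, sign text) into complete candidate sentences that are echoed to the user, who then selects one to speak or display. The patent text here does not say how candidates are generated, so the following is only a minimal sketch assuming a hypothetical table of sentence templates keyed by item type; `TEMPLATES`, `candidate_utterances`, and `select_utterance` are illustrative names, not part of the patent.

```python
from __future__ import annotations

# Hypothetical sentence templates keyed by the kind of captured item.
# The patent does not disclose a template scheme; this table is an assumption.
TEMPLATES: dict[str, list[str]] = {
    "location": [
        "Where is the {item}?",
        "How do I get to the {item}?",
        "Is the {item} far from here?",
    ],
    "object": [
        "Where can I buy a {item}?",
        "How much does the {item} cost?",
    ],
}


def candidate_utterances(item: str, kind: str) -> list[str]:
    """Expand a translated item into full sentences to echo back to the user."""
    return [template.format(item=item) for template in TEMPLATES.get(kind, [])]


def select_utterance(candidates: list[str], choice: int) -> str:
    """Return the sentence the user picked to speak or display."""
    if not 0 <= choice < len(candidates):
        raise IndexError("choice is outside the list of candidates")
    return candidates[choice]


if __name__ == "__main__":
    options = candidate_utterances("train station", "location")
    for number, sentence in enumerate(options):
        print(number, sentence)
    print("Selected:", select_utterance(options, 1))
```

A template table keeps the sketch deterministic; a real system would more plausibly rank generated sentences against the captured context.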

17 Claims

1. A system that facilitates data translation, comprising:
- a processor; and
  
  memory coupled to the processor;
  
  an input component stored in the memory and executable on the processor that processes input data received from a plurality of sources that represent context and content, wherein the input component processes the input data to determine which linguistic language is represented by the context and the content, and extracts text from the input data to generate query terms and employs the query terms with a search engine to determine a first linguistic language;
  
  wherein the context is established based on at least one of video, image data or scanned document indicia, and the determined linguistic language comprises the first linguistic language;
  
  a translation component stored in the memory and executable on the processor that translates the processed input data into a translated output in a second linguistic language which includes at least one of text or audible signals for perception by a recipient, the first linguistic language being different from the second linguistic language; and
  
  a feedback component stored in the memory and executable on the processor that receives, in response to the translated output, feedback in the second linguistic language from the recipient relating to accuracy of the translation, wherein the feedback is employed as additional input data to provide a new translated output in the second linguistic language, and wherein the feedback is employed as additional input data for establishing the context.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The system of claim 1, wherein audible signals in the second linguistic language are understandable utterances to the recipient.
  - 3. The system of claim 1, wherein text in the second linguistic language is understandable utterances to the recipient.
  - 4. The system of claim 1, wherein the input component is further configured to process image data included in the input data via optical character recognition to extract text from the image data and employ the text to determine the first linguistic language.
  - 5. The system of claim 1, wherein the input data includes speech signals from a user in the first linguistic language which are translated into the second linguistic language for understandable perception of the recipient.
  - 6. The system of claim 5, wherein the speech signals translated into the second linguistic language are presented understandably in at least one of the text in the second linguistic language to the recipient via a display or the audible signals in the second linguistic language to the recipient via an audio system.
  - 7. The system of claim 1, wherein the input component is further configured to process the audio data included in the input data via speech recognition into text to determine the first linguistic language.
  - 8. The system of claim 1, further comprising an assembler component that assembles translated words into sentences in the second linguistic language which are understandable by the recipient.
  - 9. The system of claim 1, wherein the sources of the input component are selectable by the recipient in order to form the translated output.
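Claim 1 recites three cooperating parts: an input component that extracts text, turns it into query terms, and uses a search engine to determine the first language; a translation component that renders the input in a second language; and a feedback component whose recipient feedback becomes additional input and context for a new translation. A minimal sketch of that data flow, assuming a tiny in-memory index in place of the search engine and a word-level lexicon in place of a translation engine; `TOY_SEARCH_INDEX`, `TOY_LEXICON`, and the class names are hypothetical and are not the patented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for the claimed search engine: each query term maps to
# the language it most likely belongs to. A real system would call a search
# service; this table is purely illustrative.
TOY_SEARCH_INDEX = {"gare": "fr", "sortie": "fr", "bahnhof": "de", "ausgang": "de"}

# Hypothetical word-level lexicons standing in for a real translation engine.
TOY_LEXICON = {
    ("fr", "en"): {"gare": "station", "sortie": "exit"},
    ("de", "en"): {"bahnhof": "station", "ausgang": "exit"},
}


def query_terms(text: str) -> list[str]:
    """Extract lowercase query terms from recognised text (OCR or speech)."""
    terms = (word.strip(".,!?").lower() for word in text.split())
    return [term for term in terms if term]


class InputComponent:
    def detect_language(self, text: str) -> str | None:
        """Vote over the languages the search index returns for each term."""
        votes: dict[str, int] = {}
        for term in query_terms(text):
            language = TOY_SEARCH_INDEX.get(term)
            if language is not None:
                votes[language] = votes.get(language, 0) + 1
        return max(votes, key=votes.__getitem__) if votes else None


@dataclass
class FeedbackComponent:
    # Corrections from the recipient, kept as context for later passes.
    context: dict[str, str] = field(default_factory=dict)

    def receive(self, term: str, correction: str) -> None:
        self.context[term.lower()] = correction


@dataclass
class TranslationComponent:
    target: str = "en"

    def translate(self, text: str, source: str, context: dict[str, str]) -> str:
        """Word-by-word lookup; recipient feedback overrides the lexicon."""
        lexicon = TOY_LEXICON.get((source, self.target), {})
        return " ".join(
            context.get(term, lexicon.get(term, term)) for term in query_terms(text)
        )


if __name__ == "__main__":
    sign = "Sortie de la gare"
    source = InputComponent().detect_language(sign)
    if source is None:
        raise SystemExit("could not determine the first language")
    feedback = FeedbackComponent()
    translator = TranslationComponent()
    print(translator.translate(sign, source, feedback.context))
    # The recipient corrects one word; the correction feeds the next pass.
    feedback.receive("gare", "train station")
    print(translator.translate(sign, source, feedback.context))
```

Storing feedback as a term-to-correction map is one simple way to let it act as both "additional input data" and "context"; the claim does not constrain the representation.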

10. A computer-implemented method of translating data, comprising:
- receiving one or more input data from one or more sensing sources, wherein the one or more sensing sources comprise at least one of audio, video, global positioning, or image sensing sources;
  
  generating context data of at least one of the one or more input data, the generating context data including:
  
  extracting text from at least one of the one or more input data to generate query terms and employing the query terms with a search engine to determine a first linguistic language;
  
  translating one or more results from the search engine into a translated output in a second linguistic language;
  
  presenting the translated output to a recipient in the second linguistic language that is understandable by the recipient;
  
  receiving a user feedback in the second linguistic language from the recipient, wherein the user feedback includes an indication that the translation is successful or unsuccessful;
  
  establishing the context of the at least one of the one or more input data based on the user feedback; and
  
  employing the established context as an additional input for translating the content.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The method of claim 10, further comprising an act of selecting the translated output to be formatted as audio signals in a form of speech in the second linguistic language.
  - 12. The method of claim 10, further comprising an act of assembling words representative of at least one of the context data and content data into sentences in the second linguistic language.
  - 13. The method of claim 10, further comprising an act of employing global positioning system (GPS) data included in the one or more input data to determine the first linguistic language.
  - 14. The method of claim 10, further comprising an act of searching at least one of a local data store or a network-based resource for contextual information to determine the first linguistic language, wherein the searching employs the text included in the at least one of the one or more input data.
  - 15. The method of claim 10, further comprising an act of extracting text from one or more images included in the at least one of the one or more input data to determine the translated output.
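Claims 10 and 13 combine text-derived evidence with GPS data to determine the first language, and claim 10 uses a successful/unsuccessful indication from the recipient to re-establish context. A minimal sketch of one way those pieces could fit together, assuming hypothetical coarse bounding boxes for regions, a toy term-to-language table in place of search-engine results, and a fall-back-to-next-candidate rule on negative feedback; none of these choices come from the patent.

```python
from __future__ import annotations

# Hypothetical, deliberately coarse bounding boxes:
# language -> (lat_min, lat_max, lon_min, lon_max). Real systems would use
# proper geographic data; these overlap on purpose near the French-German border.
REGIONS = {
    "fr": (42.0, 51.0, -5.0, 8.0),
    "de": (47.0, 55.0, 6.0, 15.0),
}

# Hypothetical term -> language hits, standing in for search-engine results.
TERM_HITS = {"gare": "fr", "sortie": "fr", "bahnhof": "de", "ausgang": "de"}


def gps_candidates(lat: float, lon: float) -> list[str]:
    """Languages whose (toy) region contains the GPS fix."""
    return [
        language
        for language, (lat_min, lat_max, lon_min, lon_max) in REGIONS.items()
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


def rank_languages(
    text: str, lat: float | None = None, lon: float | None = None, gps_weight: float = 0.5
) -> list[str]:
    """Rank candidate first languages: one point per matching term, plus a GPS prior."""
    scores: dict[str, float] = {}
    for word in text.lower().split():
        language = TERM_HITS.get(word.strip(".,!?"))
        if language is not None:
            scores[language] = scores.get(language, 0.0) + 1.0
    if lat is not None and lon is not None:
        for language in gps_candidates(lat, lon):
            scores[language] = scores.get(language, 0.0) + gps_weight
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda language: (-scores[language], language))


def next_candidate(ranked: list[str], rejected: set[str]) -> str | None:
    """After an 'unsuccessful' feedback, fall back to the next-ranked language."""
    for language in ranked:
        if language not in rejected:
            return language
    return None


if __name__ == "__main__":
    # A fix near Strasbourg falls inside both toy regions.
    ranked = rank_languages("Sortie", lat=48.58, lon=7.75)
    rejected: set[str] = set()
    first = next_candidate(ranked, rejected)
    print("First attempt:", first)
    if first is not None:
        rejected.add(first)  # the recipient reports the translation was unsuccessful
    print("Retry with:", next_candidate(ranked, rejected))
```

Weighting GPS below a text match reflects the assumption that sign text is stronger evidence than location near a language border; `gps_weight` is a hypothetical tuning parameter.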

16. A method comprising:
- under control of one or more processors configured with executable instructions:
  
  receiving one or more inputs from one or more sensing sources, the one or more inputs comprising image data and/or video data, the image data and/or video data comprising a gesture;
  
  determining context of at least one of the one or more inputs;
  
  the determining comprising using the gesture to identify or narrow information to determine the context;
  
  extracting text from the one or more inputs to generate query terms and employing the query terms with a search engine to determine a first linguistic language;
  
  translating an input of the one or more inputs into a translated output in a second linguistic language based at least upon the context of the at least one of the one or more inputs;
  
  receiving a user feedback of the translated output; and
  
  producing a new translated output based at least upon the user feedback.
- Dependent claim (17):
  - 17. The method of claim 16, wherein the gesture is further used to enhance input recognition, urgency and emotional interaction of a user.
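Claim 16 uses a gesture captured in image or video data to identify or narrow the information that determines context. One plausible reading is a pointing gesture that selects which piece of recognised text in a scene to translate. A minimal sketch under that assumption, taking as given hypothetical OCR output (`TextRegion` boxes in pixel coordinates) and a fingertip location already produced by some gesture detector; the class, function, and `max_distance` threshold are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """A hypothetical OCR result: recognised text plus its bounding box in pixels."""

    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    def distance_to(self, px: float, py: float) -> float:
        """Euclidean distance from a point to this box (0.0 if the point is inside)."""
        dx = max(self.x0 - px, 0.0, px - self.x1)
        dy = max(self.y0 - py, 0.0, py - self.y1)
        return math.hypot(dx, dy)


def narrow_by_pointing(
    regions: list[TextRegion], point: tuple[float, float], max_distance: float = 50.0
) -> TextRegion | None:
    """Keep only the text region the user points at, or None if nothing is close."""
    px, py = point
    nearby = [(region.distance_to(px, py), region) for region in regions]
    nearby = [pair for pair in nearby if pair[0] <= max_distance]
    if not nearby:
        return None
    return min(nearby, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    regions = [
        TextRegion("Sortie", 0, 0, 100, 40),
        TextRegion("Gare du Nord", 0, 100, 200, 140),
    ]
    target = narrow_by_pointing(regions, (50, 120))
    print(target.text if target else "no text near the gesture")
```

The selected region's text would then feed the query-term and language-determination steps of the claim; rejecting gestures farther than `max_distance` avoids translating unrelated text when the user points at empty space.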

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Williams, David W.; Kong, Yuan; Zhang, Zhengyou; Liu, Zicheng
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US11/167,870
Publication Number: US 20060293874A1
Time in Patent Office: 2,227 days
Field of Search: None
US Class Current: 704/2
CPC Class Codes:
- G06F 18/256   of results relating to diff...
- G06F 40/58   Use of machine translation,...
- G06V 10/811   the classifiers operating o...
- G06V 40/20   Movements or behaviour, e.g...
- G10L 15/1822   Parsing for meaning underst...

G10L 15/1822   Parsing for meaning underst...
