VISUALIZING AUTOMATIC SPEECH RECOGNITION AND MACHINE

US 20120010869A1
Filed: 07/12/2010
Published: 01/12/2012
Est. Priority Date: 07/12/2010
Status: Active Grant

Abstract

An automated speech processing method, system and computer program product are disclosed. In one embodiment, a speech-to-text (STT) engine is used for converting an audio input to text data in a source language, and a machine translation (MT) engine is used for translating this text data to text data in a target language. In this embodiment, the text data in the target language is rendered on a display device, and different visualization schemes are applied to different parts of the rendered text data based on defined characteristics of the STT engine and the MT engine. In one embodiment, the defined characteristics include a defined confidence value representing the accuracy of the rendered text. For example, this confidence value may be based on both the accuracy of the conversion of the audio input and the accuracy of the translation of the text data to the target language.
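
The abstract says the confidence value may reflect both the speech-to-text accuracy and the translation accuracy, but it does not say how the two are combined. The following is a minimal sketch that assumes a simple product of the two per-word confidences and two illustrative thresholds. The names `RenderedWord`, `combined_confidence`, `pick_scheme`, and the scheme labels are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class RenderedWord:
    text: str              # translated word shown in the caption
    stt_confidence: float  # assumed STT confidence of the source word, in [0, 1]
    mt_confidence: float   # assumed MT confidence of the translation, in [0, 1]


def combined_confidence(word: RenderedWord) -> float:
    """Assumed combination rule: multiply the two stage confidences."""
    return word.stt_confidence * word.mt_confidence


def pick_scheme(word: RenderedWord, low: float = 0.4, high: float = 0.8) -> str:
    """Map a combined confidence to a (hypothetical) visualization scheme name."""
    score = combined_confidence(word)
    if score >= high:
        return "normal"
    if score < low:
        return "dim"
    return "underline"


def annotate(words: list[RenderedWord]) -> list[tuple[str, str]]:
    """Pair every rendered word with the scheme chosen for it."""
    return [(w.text, pick_scheme(w)) for w in words]


caption = [
    RenderedWord("bonjour", 0.95, 0.90),
    RenderedWord("le", 0.90, 0.50),
    RenderedWord("monde", 0.50, 0.60),
]
print(annotate(caption))
```

A product is only one possible choice. A minimum or a weighted average of the two scores would fit the same description, and the thresholds would need tuning for any real pair of engines.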

20 Claims

1. An automated speech processing method comprising:
- using a speech-to-text (STT) engine for receiving an audio input and for converting the audio input to text data in a source language;
  
  using a machine translation (MT) engine for receiving the text data from the STT engine and for translating the text data to text data in a target language;
  
  using a caption engine for rendering the text data in the target language on a display device; and
  
  applying different visualization schemes to different parts of the rendered text data based on defined characteristics of the STT engine and the MT engine.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method according to claim 1, wherein said defined characteristics include a defined confidence value representing the accuracy of the rendered text data.
  - 3. The method according to claim 2, wherein said defined confidence value is based on both the accuracy of the converting the audio input to text data in the source language and the accuracy of the translating the text data in the source language to text data in the target language.
  - 4. The method according to claim 1, wherein the text data in the target language includes translated words and the caption engine renders said translated words, and wherein:
    - the MT engine assigns a confidence score to each of at least some of the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered translated words that have a confidence score above a given threshold value.
  - 5. The method according to claim 1, wherein:
    - the text data in the target language includes translated words and the caption engine renders said translated words;
      
      the MT engine assigns a confidence score to each of at least some of the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered translated words that have a confidence score below a given threshold value.
  - 6. The method according to claim 1, wherein:
    - the text data in the source language includes words in the source language and the rendered text data includes rendered words;
      
      for each of at least some of the words in the source language, the STT engine assigns a confidence value to said each word, and said each word corresponds to one or more of the rendered words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to each of said rendered words that corresponds to one of the words in the source language that has a confidence value above a given threshold value.
  - 7. The method according to claim 1, wherein:
    - the text data in the source language includes words in the source language and the rendered text data includes rendered words;
      
      for each of at least some of the words in the source language, the STT engine assigns a confidence value to said each word, and said each word corresponds to one or more of the rendered words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to each of said rendered words that corresponds to one of the words in the source language that has a confidence value below a given threshold value.
  - 8. The method according to claim 1, wherein:
    - identifiable portions of the rendered text correspond to identifiable portions of the audio input;
      
      the STT engine measures a rate of speech in the audio input; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the identifiable portions of the rendered text that correspond to portions of the audio input having a rate of speech above a given value.
  - 9. The method according to claim 1, wherein:
    - spoken words are spoken in the audio input, and the rendered text includes rendered words that are rendered on the display device;
      
      each of at least some of the rendered words corresponds to one of the spoken words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered words that are rendered on the display device more than a given time period after or before the corresponding one of the spoken words is spoken in the audio input.
  - 10. The method according to claim 1, wherein:
    - the text data in the source language includes converted words that are converted from the audio input, and the text data in the target language includes translated words that are translated from the text data in the source language;
      
      the translating includes using a defined word aligning procedure to align some of the converted words with corresponding ones of the translated words, and wherein some of the converted words are not aligned, using the defined word aligning procedure, with any of the translated words;
      
      the rendering includes rendering the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendering of the translated words that are not aligned with any of the converted words.
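
Claims 8 and 9 key the visualization on timing rather than confidence: a speech rate above a given value, and a rendered word that appears more than a given time period before or after its spoken counterpart. The sketch below illustrates both checks under stated assumptions. The `TimedWord` and `Segment` types, the words-per-second rate measure, the default limits, and the scheme names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    spoken_at: float    # seconds into the audio when the word was spoken
    rendered_at: float  # seconds into the audio when the caption word appeared


@dataclass
class Segment:
    start: float            # segment start time in seconds
    end: float              # segment end time in seconds
    words: list[TimedWord]  # rendered words belonging to this segment


def speech_rate(seg: Segment) -> float:
    """Assumed rate measure: words per second over the segment duration."""
    duration = seg.end - seg.start
    return len(seg.words) / duration if duration > 0 else 0.0


def scheme_for_word(word: TimedWord, seg: Segment,
                    max_rate: float = 3.0, max_lag: float = 2.0) -> str:
    """Flag fast segments first, then words rendered too early or too late."""
    if speech_rate(seg) > max_rate:
        return "fast-speech"
    if abs(word.rendered_at - word.spoken_at) > max_lag:
        return "out-of-sync"
    return "normal"


slow = Segment(0.0, 2.0, [
    TimedWord("hello", 0.2, 0.9),
    TimedWord("everyone", 0.8, 3.5),
])
fast = Segment(2.0, 3.0, [
    TimedWord(w, 2.0 + 0.2 * i, 2.5 + 0.2 * i)
    for i, w in enumerate(["this", "is", "very", "quick"])
])

for seg in (slow, fast):
    print([(w.text, scheme_for_word(w, seg)) for w in seg.words])
```

Here the rate check takes priority over the lag check, which is a design choice of this sketch. The claims treat the two conditions independently.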

11. An automated speech processing system comprising:
- a speech-to-text (STT) engine for receiving an audio input and for converting the audio input to text data in a source language;
  
  a machine translation (MT) engine for receiving the text data from the STT engine and for translating the text data to text data in a target language;
  
  a caption engine for rendering the text data in the target language on a display device, and for applying different visualization schemes to different parts of the rendered text data based on defined characteristics of the STT engine and the MT engine.
- Dependent claims (12, 13, 14, 15)
  - 12. The system according to claim 11, wherein said defined characteristics include a defined confidence value representing the accuracy of the rendered text data, and said defined confidence value is based on both the accuracy of the converting the audio input to text data in the source language and the accuracy of the translating the text data in the source language to text data in the target language.
  - 13. The system according to claim 11, wherein the text data in the target language includes translated words and the caption engine renders said translated words, and wherein:
    - the MT engine assigns a confidence score to each of at least some of the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered translated words that have a defined confidence score.
  - 14. The system according to claim 13, wherein:
    - identifiable portions of the rendered text correspond to identifiable portions of the audio input;
      
      the STT engine measures a rate of speech in the audio input; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the identifiable portions of the rendered text that correspond to portions of the audio input having a rate of speech above a given value.
  - 15. The system according to claim 11, wherein:
    - spoken words are spoken in the audio input, and the rendered text includes rendered words that are rendered on the display device;
      
      each of at least some of the rendered words corresponds to one of the spoken words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered words that are rendered on the display device more than a given time period after or before the corresponding one of the spoken words is spoken in the audio input.
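
In the system claims the caption engine both renders the translated text and applies the visualization schemes. The claims do not name an output format, so the following sketch assumes HTML output with inline styles purely for illustration. `SCHEME_STYLES`, `render_caption`, and the scheme names are hypothetical.

```python
from html import escape

# Hypothetical mapping from scheme name to an inline CSS style.
SCHEME_STYLES = {
    "normal": "",
    "low-confidence": "color: gray; font-style: italic",
    "fast-speech": "text-decoration: underline",
}


def render_caption(tagged_words: list[tuple[str, str]]) -> str:
    """Render (word, scheme) pairs as one HTML caption line."""
    parts = []
    for text, scheme in tagged_words:
        style = SCHEME_STYLES.get(scheme, "")  # unknown schemes fall back to plain text
        safe = escape(text)                    # avoid injecting markup from the transcript
        parts.append(f'<span style="{style}">{safe}</span>' if style else safe)
    return " ".join(parts)


print(render_caption([("good", "normal"), ("<morning>", "low-confidence")]))
```

Keeping the scheme-to-style mapping in one table separates the decision about which scheme a word gets from how that scheme looks on a given display device.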

16. An article of manufacture comprising:
- at least one tangible computer readable medium having computer readable program code logic to execute machine instructions in one or more processing units for processing speech, said computer readable program code logic, when executing, performing the following:
  
  receiving an audio input at a speech-to-text (STT) engine and converting the audio input to text data in a source language;
  
  translating the text data, using a machine translation (MT) engine, to text data in a target language;
  
  rendering the text data in the target language on a display device; and
  
  applying different visualization schemes to different parts of the rendered text data based on defined characteristics of the STT engine and the MT engine.
- Dependent claims (17, 18, 19, 20)
  - 17. The article of manufacture according to claim 16, wherein the text data in the target language includes translated words and the caption engine renders said translated words, and wherein:
    - the MT engine assigns a confidence score to each of at least some of the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendered translated words that have a defined confidence score.
  - 18. The article of manufacture according to claim 16, wherein:
    - the text data in the source language includes words in the source language and the rendered text data includes rendered words;
      
      for each of at least some of the words in the source language, the STT engine assigns a confidence value to said each word, and said each word corresponds to one or more of the rendered words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to each of said rendered words that corresponds to one of the words in the source language that has a confidence value above a given threshold value.
  - 19. The article of manufacture according to claim 16, wherein:
    - identifiable portions of the rendered text correspond to identifiable portions of the audio input;
      
      the STT engine measures a rate of speech in the audio input; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the identifiable portions of the rendered text that correspond to portions of the audio input having a rate of speech above a given value.
  - 20. The article of manufacture according to claim 16, wherein:
    - the text data in the source language includes converted words that are converted from the audio input, and the text data in the target language includes translated words that are translated from the text data in the source language;
      
      the translating includes using a defined word aligning procedure to align some of the converted words with corresponding ones of the translated words, and wherein some of the converted words are not aligned, using the defined word aligning procedure, with any of the translated words;
      
      the rendering includes rendering the translated words; and
      
      the applying different visualization schemes includes applying a selected one of the visualization schemes to the rendering of the translated words that are not aligned with any of the converted words.
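
Claims 10 and 20 single out translated words that the word aligning procedure did not link to any converted (source) word. The claims leave the aligning procedure itself unspecified, so this sketch assumes the alignment is already available as a set of (source index, target index) pairs and only shows how unaligned translated words could be found and tagged. All function names and the scheme label are hypothetical.

```python
def unaligned_targets(translated: list[str],
                      alignment: set[tuple[int, int]]) -> list[int]:
    """Return indices of translated words that no source word aligns to."""
    aligned = {target for _, target in alignment}
    return [i for i in range(len(translated)) if i not in aligned]


def tag_translation(translated: list[str],
                    alignment: set[tuple[int, int]]) -> list[tuple[str, str]]:
    """Tag each translated word as 'unaligned' or 'normal'."""
    loose = set(unaligned_targets(translated, alignment))
    return [(word, "unaligned" if i in loose else "normal")
            for i, word in enumerate(translated)]


source = ["ich", "sehe", "ihn"]
translated = ["I", "can", "see", "him"]
# Assumed alignment: ich->I, sehe->see, ihn->him; "can" has no source word.
alignment = {(0, 0), (1, 2), (2, 3)}

print(tag_translation(translated, alignment))
```

A translated word with no aligned source word may have been inserted by the MT engine, which is one reason a viewer might want it marked.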

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: McCarley, Jeffrey S.; Qian, Leiming R.
Granted Patent: US 8,554,558 B2
US Class Current: 704/3
CPC Class Codes:
- G06F 40/44   Statistical methods, e.g. p...
- G06F 40/45   Example-based machine trans...
- G10L 15/26   Speech to text systems G10L...
- G10L 21/06   Transformation of speech in...
