INTERACTIVE SPEECH RECOGNITION

US 20130132079A1
Filed: 11/17/2011
Published: 05/23/2013
Est. Priority Date: 11/17/2011
Status: Abandoned Application

2 Assignments

0 Petitions

Abstract

A first plurality of audio features associated with a first utterance may be obtained. A first text result associated with a first speech-to-text translation of the first utterance may be obtained based on an audio signal analysis associated with the audio features, the first text result including at least one first word. A first set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word may be obtained. A display of at least a portion of the first text result that includes the at least one first word may be initiated. A selection indication may be received, indicating an error in the first speech-to-text translation, the error associated with the at least one first word.

21 Citations

20 Claims

1. A computer program product tangibly embodied on a computer-readable storage medium and including executable code that causes at least one data processing apparatus to:
- obtain audio data associated with a first utterance;
  
  obtain, via a device processor, a text result associated with a first speech-to-text translation of the first utterance based on an audio signal analysis associated with the audio data, the text result including a plurality of selectable text alternatives corresponding to at least one word;
  
  initiate a display of at least a portion of the text result that includes a first one of the text alternatives; and
  
  receive a selection indication indicating a second one of the text alternatives.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The computer program product of claim 1, wherein:
    - obtaining the text result includes obtaining, via the device processor, search results based on a search query based on the first one of the text alternatives.
  - 3. The computer program product of claim 1, wherein:
    - the audio data includes one or more of:
      
      audio features determined based on a quantitative analysis of audio signals obtained based on the first utterance, or
      
      the audio signals obtained based on the first utterance.
  - 4. The computer program product of claim 1, wherein the executable code is configured to cause the at least one data processing apparatus to:
    - obtain search results based on a search query based on the second one of the text alternatives; and
      
      initiate a display of at least a portion of the search results.
  - 5. The computer program product of claim 1, wherein:
    - obtaining the text result associated with the first speech-to-text translation of the first utterance includes obtaining a first segment of the audio data correlated to a translated portion of the first speech-to-text translation of the first utterance to the second one of the text alternatives, and
      
      a plurality of translation scores, wherein each of the plurality of selectable text alternatives is associated with a corresponding one of the translation scores indicating a probability of correctness in text-to-speech translation,
      
      wherein the first one of the text alternatives is associated with a first translation score indicating a highest probability of correctness in text-to-speech translation among the plurality of selectable text alternatives.
  - 6. The computer program product of claim 5, wherein the executable code is configured to cause the at least one data processing apparatus to:
    - initiate transmission of the selection indication indicating the second one of the text alternatives and the first portion of the audio data.
  - 7. The computer program product of claim 1, wherein:
    - initiating the display of at least the portion of the text result that includes the first one of the text alternatives includes initiating the display of one or more of:
      
      a list delimited by text delimiters,
      
      a drop-down list, or
      
      a display of the first one of the text alternatives that includes a selectable link associated with a display of at least the second one of the text alternatives in a pop-up display frame.
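
As an illustration of claims 1, 5, and 7, the following is a minimal sketch of a text result that carries several scored alternatives, shows the highest-scoring one first, renders the set as a delimited list, and accepts a selection of a different alternative. The names `TextAlternative`, `rank_alternatives`, `render_delimited`, and `select_alternative`, the score values, and the `" | "` delimiter are all assumptions made for this example; the claims do not prescribe any particular data structure or scoring scale.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextAlternative:
    """One candidate transcription (hypothetical structure)."""
    text: str
    score: float  # assumed translation score; higher means more likely correct


def rank_alternatives(alternatives: List[TextAlternative]) -> List[TextAlternative]:
    """Order alternatives so the highest-scoring one comes first."""
    return sorted(alternatives, key=lambda a: a.score, reverse=True)


def render_delimited(alternatives: List[TextAlternative], delimiter: str = " | ") -> str:
    """Render the alternatives as a list delimited by a text delimiter."""
    return delimiter.join(a.text for a in alternatives)


def select_alternative(alternatives: List[TextAlternative], index: int) -> TextAlternative:
    """Return the alternative the user picked, rejecting out-of-range picks."""
    if not 0 <= index < len(alternatives):
        raise IndexError("selection is outside the list of alternatives")
    return alternatives[index]


# Illustrative data only; not taken from the application.
ranked = rank_alternatives([
    TextAlternative("wreck a nice beach", 0.31),
    TextAlternative("recognize speech", 0.62),
    TextAlternative("recognise peach", 0.07),
])
displayed = ranked[0]                    # first alternative shown to the user
chosen = select_alternative(ranked, 1)   # user indicates a second alternative
print(render_delimited(ranked))
```

A drop-down list or pop-up frame, the other display forms named in claim 7, would consume the same ranked list; only the rendering step would differ.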

8. A method comprising:
- obtaining a first plurality of audio features associated with a first utterance;
  
  obtaining, via a device processor, a first text result associated with a first speech-to-text translation of the first utterance based on an audio signal analysis associated with the audio features, the first text result including at least one first word;
  
  obtaining a first set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word;
  
  initiating a display of at least a portion of the first text result that includes the at least one first word; and
  
  receiving a selection indication indicating an error in the first speech-to-text translation, the error associated with the at least one first word.
- Dependent claims (9, 10, 11, 12, 13, 14, 15)
- - 9. The method of claim 8, wherein:
    - the first speech-to-text translation of the first utterance includes a speaker independent speech recognition translation of the first utterance.
  - 10. The method of claim 8, further comprising:
    - obtaining a second text result based on an analysis of the first speech-to-text translation of the first utterance and the selection indication indicating the error.
  - 11. The method of claim 8, further comprising:
    - initiating transmission of the selection indication indicating the error in the first speech-to-text translation, and the set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word.
  - 12. The method of claim 8, wherein:
    - receiving the selection indication indicating the error in the first speech-to-text translation, the error associated with the at least one first word includes one or more of:
      
      receiving an indication of a user touch on a display of the at least one first word,
      
      receiving an indication of a user selection based on a display of a list of alternatives that include the at least one first word,
      
      receiving an indication of a user selection based on a display of a drop-down menu of one or more alternatives associated with the at least one first word, or
      
      receiving an indication of a user selection based on a display of a popup window of a display of the one or more alternatives associated with the at least one first word.
  - 13. The method of claim 8, wherein:
    - the first text result includes a second word different from the at least one word, wherein the method further comprises:
      
      obtaining a second set of audio features correlated with at least a second portion of the first speech-to-text translation associated with the second word, wherein the second set of audio features are based on a substantially nonoverlapping timing interval in the first utterance, compared with the at least one word.
  - 14. The method of claim 8, further comprising:
    - obtaining a second plurality of audio features associated with a second utterance, the second utterance associated with verbal input associated with a correction of the error associated with the at least one first word; and
      
      obtaining, via the device processor, a second text result associated with a second speech-to-text translation of the second utterance based on an audio signal analysis associated with the second plurality of audio features, the second text result including at least one corrected word different from the first word.
  - 15. The method of claim 14, further comprising:
    - initiating transmission of the selection indication indicating the error in the first speech-to-text translation, and the second plurality of audio features associated with the second utterance.
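
Claims 8, 11, and 13 describe correlating a set of audio features with each recognized word, over substantially nonoverlapping timing intervals, and pairing a flagged word with its correlated features. The sketch below shows one way this could look, assuming fixed-length feature frames of 10 ms each. The frame length, the `WordSegment` structure, and the helper names `features_for`, `overlaps`, and `mark_error` are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, List

Frame = List[float]  # one feature vector per frame (assumed representation)


@dataclass
class WordSegment:
    """A recognized word and the time interval it occupies (hypothetical)."""
    word: str
    start_ms: int
    end_ms: int


def features_for(segment: WordSegment, frames: List[Frame], frame_ms: int = 10) -> List[Frame]:
    """Return the frames whose time span falls within the word's interval."""
    first = segment.start_ms // frame_ms
    last = -(-segment.end_ms // frame_ms)  # ceiling division
    return frames[first:last]


def overlaps(a: WordSegment, b: WordSegment) -> bool:
    """True if the two words' timing intervals share any time."""
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms


def mark_error(segment: WordSegment, frames: List[Frame]) -> Dict[str, object]:
    """Pair a flagged word with its correlated audio features for transmission."""
    return {"error_word": segment.word, "audio_features": features_for(segment, frames)}


# One second of placeholder feature frames; real features would come from
# signal analysis of the utterance.
frames = [[float(i)] for i in range(100)]
first_word = WordSegment("interactive", 0, 480)
second_word = WordSegment("speech", 500, 900)

indication = mark_error(second_word, frames)
print(len(features_for(first_word, frames)), overlaps(first_word, second_word))
```

Sending only the frames tied to the flagged word, rather than the whole utterance, keeps the correction payload small; whether the application intends that trade-off is not stated in the claims.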

16. A system comprising:
- an input acquisition component that obtains a first plurality of audio features associated with a first utterance;
  
  a speech-to-text component that obtains, via a device processor, a first text result associated with a first speech-to-text translation of the first utterance based on an audio signal analysis associated with the audio features, the first text result including at least one first word;
  
  a clip correlation component that obtains a first correlated portion of the first plurality of audio features associated with the first speech-to-text translation to the at least one first word;
  
  a result delivery component that initiates an output of the first text result and the first correlated portion of the first plurality of audio features; and
  
  a correction request acquisition component that obtains a correction request that includes an indication that the at least one first word is a first speech-to-text translation error, and the first correlated portion of the first plurality of audio features. [Docket No. 333249.01]
- Dependent claims (17, 18, 19, 20)
- - 17. The system of claim 16, further comprising:
    - a search request component that initiates a first search operation based on the first text result associated with the first speech-to-text translation of the first utterance, wherein:
      
      the result delivery component initiates the output of the first text result and the first correlated portion of the first plurality of audio features with results of the first search operation.
  - 18. The system of claim 16, wherein:
    - the speech-to-text component obtains, via the device processor, the first text result associated with the first speech-to-text translation of the first utterance based on the audio signal analysis associated with the first plurality of audio features, the first text result including a plurality of text alternatives, the at least one first word included in the plurality of first text alternatives, wherein
      
      the first correlated portion of the first plurality of audio features associated with the first speech-to-text translation to the at least one first word is associated with the plurality of first text alternatives.
  - 19. The system of claim 18, wherein:
    - each of the plurality of first text alternatives is associated with a corresponding translation score indicating a probability of correctness in text-to-speech translation,
      
      wherein the at least one first word is associated with a first translation score indicating a highest probability of correctness in text-to-speech translation among the plurality of first text alternatives,
      
      wherein the output of the first text result includes an output of the plurality of first text alternatives and the corresponding translation scores.
  - 20. The system of claim 19, wherein:
    - the result delivery component initiates the output of the first text result, the first correlated portion of the first plurality of audio features, and at least a portion of the corresponding translation scores; and
      
      the correction request acquisition component obtains the correction request that includes the indication that the at least one first word is a first speech-to-text translation error, and one or more of:
      
      the first correlated portion of the first plurality of audio features, and the at least a portion of the corresponding translation scores, or
      
      a second plurality of audio features associated with a second utterance corresponding to verbal input associated with a correction of the first speech-to-text translation error based on the at least one first word.
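
The correction request of claims 16 and 20 can be pictured as a small record that names the flagged word and carries either the correlated audio clip with its translation scores, or the features of a second, corrective utterance, or both. The following is a rough sketch under that reading of claim 20. The class name `CorrectionRequest`, its field names, and the `is_valid` check are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Frame = List[float]  # one feature vector per frame (assumed representation)


@dataclass
class CorrectionRequest:
    """Hypothetical payload for the correction request acquisition component."""
    flagged_word: str
    correlated_features: Optional[List[Frame]] = None
    translation_scores: Optional[Dict[str, float]] = None
    second_utterance_features: Optional[List[Frame]] = None

    def is_valid(self) -> bool:
        """Check that the request carries at least one of the two evidence forms."""
        has_clip_and_scores = bool(self.correlated_features) and bool(self.translation_scores)
        has_second_utterance = bool(self.second_utterance_features)
        return bool(self.flagged_word) and (has_clip_and_scores or has_second_utterance)


# Form 1: the correlated clip plus the scores of the alternatives.
by_clip = CorrectionRequest(
    flagged_word="beach",
    correlated_features=[[0.1, 0.2], [0.3, 0.4]],
    translation_scores={"beach": 0.55, "speech": 0.45},
)

# Form 2: features of a second utterance spoken to correct the error.
by_respeak = CorrectionRequest(
    flagged_word="beach",
    second_utterance_features=[[0.5, 0.6]],
)

print(by_clip.is_valid(), by_respeak.is_valid())
```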

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Raza, Mirza Muhammad; Sehgal, Muhammad Shoaib B.

Application Number

US13/298,291
Publication Number

US 20130132079A1
US Class Current

704/235
CPC Class Codes

G10L 15/22 Procedures used during a sp...

G10L 2015/221 Announcement of recognition...
