METHODS AND APPARATUS FOR ACOUSTIC DISAMBIGUATION

US 20120303371A1
Filed: 05/23/2012
Published: 11/29/2012
Est. Priority Date: 05/23/2011
Status: Active Grant


1 Assignment

0 Petitions

Abstract

Techniques for disambiguating at least one text segment from at least one acoustically similar word and/or phrase. The techniques include identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
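The three steps in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical hand-built set of acoustically ambiguous words, which is a list lookup of the kind recited in claim 8. It also uses a letter-by-letter spelling as the disambiguating text, as in claim 5. The sketch only produces the annotated text. Handing that text to an actual text-to-speech engine is outside its scope. The names `AMBIGUOUS`, `spell_out`, and `annotate` are illustrative and do not come from the patent.

```python
import re

# Hypothetical list of acoustically ambiguous words (cf. the list lookup in claim 8).
AMBIGUOUS = {"knight", "night", "write", "right", "two", "too"}


def spell_out(word: str) -> str:
    """Disambiguating text: the word's spelling, letter by letter (cf. claim 5)."""
    return " ".join(word.upper())


def annotate(text: str, ambiguous: set[str] = AMBIGUOUS) -> str:
    """Return a copy of `text` with a spelling hint inserted immediately after
    each ambiguous word, so the hint is spoken next to the word it explains."""
    def add_hint(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in ambiguous:
            return f"{word} (spelled {spell_out(word)})"
        return word

    # Treat runs of letters and apostrophes as the text segments to check.
    return re.sub(r"[A-Za-z']+", add_hint, text)


# The annotated string would then be passed to a TTS engine (not shown).
print(annotate("Meet the knight at two"))
# -> Meet the knight (spelled K N I G H T) at two (spelled T W O)
```

Placing the hint directly after the word keeps the disambiguating speech proximate the segment it explains, which is the placement the claims call for. A deployed system could instead draw its word list from a pronunciation lexicon rather than a hard-coded set.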

239 Citations

27 Claims

1. A method comprising:
- identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase;
  
  annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase; and
  
  synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
    - annotating the textual representation includes inserting the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation; and
      
      synthesizing the speech signal includes synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
  - 3. The method of claim 1, wherein the disambiguating information includes at least one prerecorded utterance that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
    - annotating the textual representation includes associating the at least one prerecorded utterance with the at least one text segment; and
      
      synthesizing the speech signal includes inserting the at least one prerecorded utterance into the speech signal proximate the portion of the speech signal corresponding to the at least one text segment.
  - 4. The method of claim 1, wherein the disambiguating information includes an indication of a meaning of the at least one text segment.
  - 5. The method of claim 1, wherein the disambiguating information includes a spelling of the at least one text segment.
  - 6. The method of claim 1, wherein the disambiguating information is represented in the speech signal using a different voice font than at least the at least one text segment.
  - 7. The method of claim 1, further comprising audibly rendering the speech signal to the user.
  - 8. The method of claim 1, wherein identifying at least one text segment having at least one acoustically similar word or phrase includes checking whether any text segment in the textual representation is included in a list comprising acoustically ambiguous words and/or phrases.
  - 9. The method of claim 1, wherein the textual representation corresponds to text converted from speech input from the user by performing automatic speech recognition on the speech input, and wherein automatically identifying at least one text segment having at least one acoustically similar word and/or phrase comprises identifying the at least one text segment based, at least in part, on an N-best list generated during automatic speech recognition.
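Claim 9 identifies ambiguous segments from the recognizer's N-best list rather than from a fixed list. One way this might work is to align each alternative hypothesis against the top hypothesis and record one-for-one word substitutions as competing candidates. The sketch assumes each hypothesis is available as a list of words and uses Python's standard `difflib.SequenceMatcher` for the alignment. The function name `ambiguous_positions` and this alignment strategy are illustrative choices, not the method stated in the patent.

```python
from difflib import SequenceMatcher


def ambiguous_positions(nbest: list[list[str]]) -> dict[int, set[str]]:
    """Map each word index in the top hypothesis to the set of competing words
    that lower-ranked N-best hypotheses put in its place."""
    top = nbest[0]
    rivals: dict[int, set[str]] = {}
    for alt in nbest[1:]:
        matcher = SequenceMatcher(a=top, b=alt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only equal-length substitutions are treated as word-for-word rivals.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for k in range(i2 - i1):
                    rivals.setdefault(i1 + k, set()).add(alt[j1 + k])
    return rivals


nbest = [
    ["meet", "the", "knight", "at", "noon"],
    ["meet", "the", "night", "at", "noon"],
    ["meat", "the", "knight", "at", "noon"],
]
print(ambiguous_positions(nbest))  # -> {2: {'night'}, 0: {'meat'}}
```

Words with rivals (here "knight" and "meet") are the segments that would then be annotated with disambiguating information before synthesis.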

10. At least one computer readable medium storing instructions that, when executed on at least one processor, perform a method comprising:
- identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase;
  
  annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase; and
  
  synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The at least one computer readable medium of claim 10, wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
    - annotating the textual representation includes inserting the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation; and
      
      synthesizing the speech signal includes synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
  - 12. The at least one computer readable medium of claim 10, wherein the disambiguating information includes at least one prerecorded utterance that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
    - annotating the textual representation includes associating the at least one prerecorded utterance with the at least one text segment; and
      
      synthesizing the speech signal includes inserting the at least one prerecorded utterance into the speech signal proximate the portion of the speech signal corresponding to the at least one text segment.
  - 13. The at least one computer readable medium of claim 10, wherein the disambiguating information includes an indication of a meaning of the at least one text segment.
  - 14. The at least one computer readable medium of claim 10, wherein the disambiguating information includes a spelling of the at least one text segment.
  - 15. The at least one computer readable medium of claim 10, wherein the disambiguating information is represented in the speech signal using a different voice font than at least the at least one text segment.
  - 16. The at least one computer readable medium of claim 10, further comprising audibly rendering the speech signal to the user.
  - 17. The at least one computer readable medium of claim 10, wherein identifying at least one text segment having at least one acoustically similar word or phrase includes checking whether any text segment in the textual representation is included in a list comprising acoustically ambiguous words and/or phrases.
  - 18. The at least one computer readable medium of claim 10, wherein the textual representation corresponds to text converted from speech input from the user by performing automatic speech recognition on the speech input, and wherein automatically identifying at least one text segment having at least one acoustically similar word and/or phrase comprises identifying the at least one text segment based, at least in part, on an N-best list generated during automatic speech recognition.
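Claims 3, 12, and 21 use a prerecorded utterance as the disambiguating information instead of synthesized text. Suppose the synthesizer reports the sample index at which the ambiguous segment ends. Some engines expose word-boundary timing, but that interface is assumed here. Under that assumption, inserting the clip proximate the segment reduces to a simple splice. This sketch represents audio as a plain list of float samples. The name `insert_utterance`, its parameters, and the default pause length are hypothetical.

```python
def insert_utterance(signal: list[float], segment_end: int, clip: list[float],
                     sample_rate: int = 16000, pause_s: float = 0.1) -> list[float]:
    """Splice a prerecorded clip into `signal` right after the sample index where
    the ambiguous segment ends, with a short silence on each side of the clip."""
    if not 0 <= segment_end <= len(signal):
        raise ValueError("segment_end lies outside the signal")
    pause = [0.0] * round(sample_rate * pause_s)  # silence, in samples
    return signal[:segment_end] + pause + clip + pause + signal[segment_end:]


speech = [0.1] * 32000   # 2 s of placeholder synthesized audio at 16 kHz
clip = [0.2] * 8000      # 0.5 s placeholder prerecorded utterance
out = insert_utterance(speech, 12000, clip)
print(len(out))  # -> 43200 (32000 + 8000 + 2 * 1600 samples of silence)
```

The surrounding pauses are a design choice meant to set the hint apart from the main utterance. The claims themselves only require that the utterance be placed proximate the segment.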

19. A system comprising:
- at least one input interface for receiving data from the user;
  
  a conversion component configured to convert the data into a textual representation; and
  
  a presentation component configured to provide an audio presentation of at least a portion of the textual representation by performing:
  
  identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase;
  
  annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase; and
  
  synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
  - 20. The system of claim 19, wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein the presentation component is configured to insert the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation, and synthesize the speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
  - 21. The system of claim 19, wherein the disambiguating information includes at least one prerecorded utterance that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein the presentation component is configured to associate the at least one prerecorded utterance with the at least one text segment, and insert the at least one prerecorded utterance into the speech signal proximate the portion of the speech signal corresponding to the at least one text segment.
  - 22. The system of claim 19, wherein the disambiguating information includes an indication of a meaning of the at least one text segment.
  - 23. The system of claim 19, wherein the disambiguating information includes a spelling of the at least one text segment.
  - 24. The system of claim 19, wherein the disambiguating information is represented in the speech signal using a different voice font than at least the at least one text segment.
  - 25. The system of claim 19, further comprising at least one speaker for audibly rendering the speech signal to the user.
  - 26. The system of claim 19, wherein the presentation component is configured to identify at least one text segment having at least one acoustically similar word or phrase, at least in part, by checking whether any text segment in the textual representation is included in a list comprising acoustically ambiguous words and/or phrases.
  - 27. The system of claim 19, wherein the input from the user includes speech, wherein the conversion component includes at least one automatic speech recognition engine to convert the data to the textual representation, and wherein the presentation component is configured to identify at least one text segment having at least one acoustically similar word or phrase based, at least in part, on an N-best list generated by the at least one automatic speech recognition engine.
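Claims 6, 15, and 24 render the disambiguating information in a different voice font from the text segment. If the synthesizer accepts W3C SSML, one plausible encoding wraps each hint in an SSML `<voice>` element. The sketch below builds such a document with the standard library's XML escaping helpers. The function name `to_ssml` is illustrative. The voice name `hint-voice` is a placeholder, because valid voice names depend on the engine.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(words: list[str], hints: dict[int, str],
            hint_voice: str = "hint-voice") -> str:
    """Build an SSML document in which each disambiguating hint is spoken in a
    separate voice immediately after the word (by index) that it annotates."""
    parts = []
    for i, word in enumerate(words):
        parts.append(escape(word))  # escape &, <, > in the spoken text
        if i in hints:
            parts.append(f"<voice name={quoteattr(hint_voice)}>{escape(hints[i])}</voice>")
    body = " ".join(parts)
    return ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            f'xml:lang="en-US">{body}</speak>')


print(to_ssml(["meet", "the", "knight"], {2: "spelled K N I G H T"}))
```

Keeping the hint in its own `<voice>` element leaves the main sentence in the default voice. This gives the listener an audible cue that the inserted words are a clarification rather than part of the message.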


Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Labsky, Martin; Kleindienst, Jan; Macek, Tomas; Nahamoo, David; Curin, Jan; Ganong, William F., III

Granted Patent

US 8,954,329 B2
US Class Current

704/260
CPC Class Codes

G06F 40/10   Text processing natural lan...

G06F 40/30   Semantic analysis

G10L 13/00   Speech synthesis; Text to s...

G10L 13/08   Text analysis or generation...

G10L 15/01   Assessment or evaluation of...

G10L 15/02   Feature extraction for spee...

G10L 15/06   Creation of reference templ...

G10L 15/14   using statistical models, e...

G10L 15/1822   Parsing for meaning underst...

G10L 15/26   Speech to text systems G10L...

G10L 15/28   Constructional details of s...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 17/00   Speaker identification or v...

G10L 21/06   Transformation of speech in...
