Methods and apparatus for acoustic disambiguation by insertion of disambiguating textual information
First Claim
1. A method comprising:
identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, wherein the at least one text segment and the at least one acoustically similar word and/or phrase have different spellings;
automatically annotating, using at least one processor, the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase;
and synthesizing a speech signal, at least in part by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment;
wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
annotating the textual representation includes inserting the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation;
and synthesizing the speech signal includes synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
Abstract
Techniques for disambiguating at least one text segment from at least one acoustically similar word and/or phrase. The techniques include identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase which has a different spelling, annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
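The identifying and annotating steps summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation: the lookup table `HOMOPHONE_HINTS`, the hint wording, and the function names `identify_segments` and `annotate` are all assumptions introduced here, and the text-to-speech step is left out so the sketch stays independent of any particular synthesis engine.

```python
import re

# Hypothetical table: each word that has an acoustically similar,
# differently spelled counterpart maps to a short spoken hint.
# The source text does not say where such data would come from.
HOMOPHONE_HINTS = {
    "their": "spelled T-H-E-I-R",
    "there": "spelled T-H-E-R-E",
    "ate": "as in eating",
    "eight": "the number",
}

WORD_RE = re.compile(r"[A-Za-z']+")


def identify_segments(text):
    """Return (start, end, word) for every word found in the hint table."""
    return [
        (m.start(), m.end(), m.group())
        for m in WORD_RE.finditer(text)
        if m.group().lower() in HOMOPHONE_HINTS
    ]


def annotate(text):
    """Insert each hint directly after the word it disambiguates."""
    pieces = []
    last = 0
    for start, end, word in identify_segments(text):
        pieces.append(text[last:end])  # text up to and including the word
        pieces.append(f" ({HOMOPHONE_HINTS[word.lower()]})")
        last = end
    pieces.append(text[last:])  # remainder after the last flagged word
    return "".join(pieces)


if __name__ == "__main__":
    print(annotate("We ate there."))
```

Passing the annotated string to a text-to-speech engine would then place the spoken hint next to the ambiguous word in the audio, which is the proximity the abstract describes.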
18 Citations
24 Claims
1. A method comprising:
identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, wherein the at least one text segment and the at least one acoustically similar word and/or phrase have different spellings;
automatically annotating, using at least one processor, the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase;
and synthesizing a speech signal, at least in part by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment;
wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
annotating the textual representation includes inserting the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation;
and synthesizing the speech signal includes synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
Dependent Claims (2, 3, 4, 5, 6, 7, 8)
9. At least one non-transitory computer readable medium storing instructions that, when executed on at least one processor, perform a method comprising:
identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, wherein the at least one text segment and the at least one acoustically similar word and/or phrase have different spellings;
and automatically annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase;
synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment;
wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein:
annotating the textual representation includes inserting the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation;
and synthesizing the speech signal includes synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
Dependent Claims (10, 11, 12, 13, 14, 15, 16)
17. A system comprising:
at least one input interface for receiving data from the user;
a conversion component configured to convert the data into a textual representation;
and a presentation component configured to provide an audio presentation of at least a portion of the textual representation by performing:
identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, wherein the at least one text segment and the at least one acoustically similar word and/or phrase have different spellings;
automatically annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase;
synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment;
wherein the disambiguating information includes text that helps disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and wherein the presentation component is configured to insert the disambiguating information into the textual representation proximate the at least one text segment to form an annotated textual representation, and synthesize the speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the annotated textual representation that includes the at least one text segment and the disambiguating information.
Dependent Claims (18, 19, 20, 21, 22, 23, 24)
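One way to picture how the three components of claim 17 hand data to one another is the sketch below. It is only an illustration under stated assumptions: the class names `PresentationComponent` and `DictationSystem`, the callable-based wiring, and the stand-in converter, annotator, and synthesizer in `build_demo_system` are all hypothetical and are not taken from the patent. A real system would plug in an actual recognizer and text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PresentationComponent:
    annotate: Callable[[str], str]  # inserts disambiguating text
    tts: Callable[[str], bytes]     # any text-to-speech backend

    def present(self, text: str) -> bytes:
        # Synthesis runs on the annotated text, so the hint is spoken
        # next to the segment it disambiguates.
        return self.tts(self.annotate(text))


@dataclass
class DictationSystem:
    convert: Callable[[bytes], str]  # conversion component (data to text)
    presenter: PresentationComponent

    def handle_input(self, data: bytes) -> bytes:
        """Input interface: accept user data and return the audio presentation."""
        return self.presenter.present(self.convert(data))


def demo_annotate(text: str) -> str:
    # Toy annotator (assumption): flags only the word "ate".
    return " ".join(
        w + " (as in eating)" if w.lower() == "ate" else w
        for w in text.split()
    )


def build_demo_system() -> DictationSystem:
    # Stand-ins: the "converter" decodes UTF-8 bytes and the "synthesizer"
    # re-encodes the text, so the data flow can be checked without audio.
    presenter = PresentationComponent(
        annotate=demo_annotate,
        tts=lambda t: t.encode("utf-8"),
    )
    return DictationSystem(convert=lambda d: d.decode("utf-8"), presenter=presenter)


if __name__ == "__main__":
    print(build_demo_system().handle_input(b"We ate"))
```

Keeping the annotator and synthesizer as injected callables mirrors the claim's separation between the conversion and presentation components, and lets either be replaced without touching the other.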
Specification