Systems and methods for text normalization for text to speech synthesis
First Claim
1. A method for normalizing a text string for text to speech synthesis, the method comprising:
at a system having one or more processors;
identifying a character sequence in the text string, the character sequence including at least a first non-alphabetical character adjacent one or more alphabetical characters;
identifying two or more alternative alphabetical characters or character strings that correspond to the first non-alphabetical character adjacent the one or more alphabetical characters;
creating a plurality of test strings, each test string being a version of the text string that is modified to include a different one of the identified two or more alternative alphabetical characters or character strings instead of the first non-alphabetical character adjacent the one or more alphabetical characters; and
selecting a first test string from the plurality of test strings to replace the text string in speech synthesis based on respective probabilities of occurrence of the plurality of test strings in a source language of the text string.
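The four steps of the claim (identify the character sequence, identify alternatives, create test strings, select by probability) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `ALTERNATIVES` table, the `WORD_PROBABILITY` values, and all function names are hypothetical, and a toy word-frequency table stands in for whatever model supplies the "probabilities of occurrence" in the source language.

```python
# Hypothetical lookup: non-alphabetical character -> two or more
# alternative alphabetical characters or character strings.
ALTERNATIVES = {
    "!": ["i", "l"],
    "$": ["s", "dollar"],
    "3": ["e", "three"],
    "@": ["a", "at"],
}

# Toy stand-in for a source-language model (made-up values).
WORD_PROBABILITY = {"pink": 0.6, "kesha": 0.3, "at": 0.1}
FLOOR = 1e-9  # probability assigned to words missing from the table


def find_sequence(text):
    """Index of the first known non-alphabetical character that is
    adjacent to at least one alphabetical character, or None."""
    for i, ch in enumerate(text):
        if ch.isalpha() or ch.isspace() or ch not in ALTERNATIVES:
            continue
        before = i > 0 and text[i - 1].isalpha()
        after = i + 1 < len(text) and text[i + 1].isalpha()
        if before or after:
            return i
    return None


def make_test_strings(text, index):
    """One test string per alternative, each replacing the character at index."""
    return [text[:index] + alt + text[index + 1:]
            for alt in ALTERNATIVES[text[index]]]


def score(candidate):
    """Product of per-word probabilities under the toy table."""
    prob = 1.0
    for word in candidate.lower().split():
        prob *= WORD_PROBABILITY.get(word, FLOOR)
    return prob


def normalize(text):
    """Replace the first qualifying character with its most probable alternative."""
    index = find_sequence(text)
    if index is None:
        return text
    return max(make_test_strings(text, index), key=score)


print(normalize("P!nk"))   # test strings: "Pink", "Plnk"
print(normalize("Ke$ha"))  # test strings: "Kesha", "Kedollarha"
```

Under these assumed tables, "P!nk" yields the test strings "Pink" and "Plnk", and the first is selected because "pink" has the higher assumed probability. A real system would presumably score candidates with a proper language model rather than a fixed dictionary.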
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
582 Citations
24 Claims
1. A method for normalizing a text string for text to speech synthesis, the method comprising:
at a system having one or more processors;
identifying a character sequence in the text string, the character sequence including at least a first non-alphabetical character adjacent one or more alphabetical characters;
identifying two or more alternative alphabetical characters or character strings that correspond to the first non-alphabetical character adjacent the one or more alphabetical characters;
creating a plurality of test strings, each test string being a version of the text string that is modified to include a different one of the identified two or more alternative alphabetical characters or character strings instead of the first non-alphabetical character adjacent the one or more alphabetical characters; and
selecting a first test string from the plurality of test strings to replace the text string in speech synthesis based on respective probabilities of occurrence of the plurality of test strings in a source language of the text string.
9. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
identify a character sequence in the text string, the character sequence including at least a first non-alphabetical character adjacent one or more alphabetical characters;
identify two or more alternative alphabetical characters or character strings that correspond to the first non-alphabetical character adjacent the one or more alphabetical characters;
create a plurality of test strings, each test string being a version of the text string that is modified to include a different one of the identified two or more alternative alphabetical characters or character strings instead of the first non-alphabetical character adjacent the one or more alphabetical characters;
select a first test string from the plurality of test strings to replace the text string in speech synthesis based on respective probabilities of occurrence of the plurality of test strings in a source language of the text string.
17. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
identify a character sequence in the text string, the character sequence including at least a first non-alphabetical character adjacent one or more alphabetical characters;
identify two or more alternative alphabetical characters or character strings that correspond to the first non-alphabetical character adjacent the one or more alphabetical characters;
create a plurality of test strings, each test string being a version of the text string that is modified to include a different one of the identified two or more alternative alphabetical characters or character strings instead of the first non-alphabetical character adjacent the one or more alphabetical characters;
select a first test string from the plurality of test strings to replace the text string in speech synthesis based on respective probabilities of occurrence of the plurality of test strings in a source language of the text string.
Specification