Systems and methods for concatenation of words in text to speech synthesis
First Claim
1. A method for concatenating words in a text string, performed at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:
obtaining phonemes for a text string, the text string comprising at least a preceding word and a succeeding word to be concatenated;
identifying a last letter of the preceding word to be concatenated, and identifying a first letter of the succeeding word to be concatenated;
selecting a connector term and a connector term type based on the identified last letter and the identified first letter; and
creating a modified text string for speech synthesis including the selected connector term and the selected connector type.
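The identify, select, and create steps recited above can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not say which connector terms or connector term types exist or how they map to letters, so the vowel/consonant rule, the `CONNECTORS` table, the `pause` and `join` labels, the tag-style output, and all function names here are hypothetical. The phoneme-obtaining step is omitted because it depends on a pronunciation resource the claim does not name.

```python
from typing import NamedTuple

VOWELS = set("aeiou")


class Connector(NamedTuple):
    term: str  # text placed between the two words (hypothetical values)
    kind: str  # connector term type (hypothetical labels)


# Hypothetical lookup keyed on (last letter is a vowel, first letter is a vowel).
CONNECTORS = {
    (True, True): Connector(", ", "pause"),
    (True, False): Connector(" ", "join"),
    (False, True): Connector(" ", "join"),
    (False, False): Connector(" ", "join"),
}


def _letters(word: str) -> list:
    """Return the alphabetic characters of a word, lowercased."""
    letters = [c.lower() for c in word if c.isalpha()]
    if not letters:
        raise ValueError(f"no letters in {word!r}")
    return letters


def last_letter(word: str) -> str:
    """Identify the last letter of the preceding word."""
    return _letters(word)[-1]


def first_letter(word: str) -> str:
    """Identify the first letter of the succeeding word."""
    return _letters(word)[0]


def select_connector(last: str, first: str) -> Connector:
    """Select a connector term and type from the two identified letters."""
    return CONNECTORS[(last in VOWELS, first in VOWELS)]


def concatenate(preceding: str, succeeding: str) -> str:
    """Create a modified text string carrying both the term and its type."""
    c = select_connector(last_letter(preceding), first_letter(succeeding))
    # The tag-style markup is an assumption made for readability only.
    return f"{preceding}<{c.kind}>{c.term}</{c.kind}>{succeeding}"


print(concatenate("Santana", "Abraxas"))  # Santana<pause>, </pause>Abraxas
```

Keeping the selection in a lookup table means the letter-to-connector mapping can be swapped out without touching the surrounding steps, which matters here because the real mapping is unspecified in the claim.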
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
36 Claims
1. A method for concatenating words in a text string, performed at an electronic device having one or more processors and memory storing one or more programs for execution by the one or more processors, the method comprising:
obtaining phonemes for a text string, the text string comprising at least a preceding word and a succeeding word to be concatenated;
identifying a last letter of the preceding word to be concatenated, and identifying a first letter of the succeeding word to be concatenated;
selecting a connector term and a connector term type based on the identified last letter and the identified first letter; and
creating a modified text string for speech synthesis including the selected connector term and the selected connector type.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
obtaining phonemes for a text string, the text string comprising at least a preceding word and a succeeding word to be concatenated;
identifying a last letter of the preceding word to be concatenated, and identifying a first letter of the succeeding word to be concatenated;
selecting a connector term and a connector term type based on the identified last letter and the identified first letter; and
creating a modified text string for speech synthesis including the selected connector term and the selected connector type.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
obtaining phonemes for a text string, the text string comprising at least a preceding word and a succeeding word to be concatenated;
identifying a last letter of the preceding word to be concatenated, and identifying a first letter of the succeeding word to be concatenated;
selecting a connector term and a connector term type based on the identified last letter and the identified first letter; and
creating a modified text string for speech synthesis including the selected connector term and the selected connector type.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36
Specification