Systems and methods for text to speech synthesis
First Claim
1. A method for synthesizing speech from content related to a media asset, the method comprising:
receiving a request for a rendering of text associated with the media asset; and
converting the text associated with the media asset into speech, the speech comprising a rendering of the text that is spoken in a native language of the text and customized with an accent associated with a user, wherein converting the text associated with the media asset into speech further comprises:
obtaining a plurality of native phonemes of the text;
determining the accent associated with the user;
mapping the plurality of native phonemes to a plurality of target phonemes associated with the accent; and
generating the speech using the plurality of target phonemes.
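The four conversion steps recited in claim 1 can be sketched as a small pipeline. This is a minimal, non-authoritative sketch, not the patented implementation: the toy lexicon `NATIVE_LEXICON`, the accent tables in `ACCENT_MAPS`, the `User.accent` profile field, and all function names are hypothetical. Mapping is done phoneme by phoneme via a lookup table, which is only the simplest reading of "mapping the plurality of native phonemes to a plurality of target phonemes". `generate_speech` is a stand-in that returns the phoneme sequence a real synthesizer would voice.

```python
from dataclasses import dataclass

# HYPOTHETICAL toy lexicon: word -> native (English) phonemes in ARPAbet-style notation.
# A real system would use a full pronunciation dictionary plus letter-to-sound rules.
NATIVE_LEXICON: dict[str, list[str]] = {
    "the": ["DH", "AH"],
    "beatles": ["B", "IY", "T", "AH", "L", "Z"],
}

# HYPOTHETICAL accent tables: native phoneme -> closest target phoneme for that accent.
# Phonemes missing from a table are kept unchanged.
ACCENT_MAPS: dict[str, dict[str, str]] = {
    "fr-FR": {"DH": "Z", "TH": "S", "IY": "I"},
    "es-ES": {"Z": "S", "V": "B"},
}


@dataclass
class User:
    user_id: str
    accent: str  # hypothetical profile field, e.g. a locale tag


def obtain_native_phonemes(text: str) -> list[str]:
    """Look up the native phonemes of each word of the text."""
    phonemes: list[str] = []
    for word in text.lower().split():
        if word not in NATIVE_LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(NATIVE_LEXICON[word])
    return phonemes


def determine_accent(user: User) -> str:
    """Read the accent from the user's profile (an assumption about where it is stored)."""
    return user.accent


def map_to_target(native: list[str], accent: str) -> list[str]:
    """Map each native phoneme to its target phoneme for the given accent."""
    table = ACCENT_MAPS.get(accent, {})
    return [table.get(p, p) for p in native]


def generate_speech(target: list[str]) -> list[str]:
    """Stand-in for the synthesis back end: returns the phoneme sequence it would voice."""
    return list(target)


def render(text: str, user: User) -> list[str]:
    """Run the claimed steps in order: native phonemes, accent, mapping, generation."""
    native = obtain_native_phonemes(text)
    accent = determine_accent(user)
    target = map_to_target(native, accent)
    return generate_speech(target)
```

For example, `render("The Beatles", User("u1", "fr-FR"))` returns `['Z', 'AH', 'B', 'I', 'T', 'AH', 'L', 'Z']`: the text stays in its native language (English words, English phoneme inventory as the starting point) while individual sounds are shifted toward the user's accent. An unknown accent falls back to the native phonemes unchanged.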
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
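The first two stages described in the abstract, normalizing a text string and determining its native language, can be illustrated with a short sketch. This is a hedged illustration only: the abbreviation table `ABBREVIATIONS`, the script-to-language table `SCRIPT_TO_LANGUAGE`, and both function names are hypothetical, and language detection here is a deliberately simple stand-in (counting letters by Unicode script via the standard `unicodedata` module), not the method the patent describes.

```python
import unicodedata

# HYPOTHETICAL normalization rules for media metadata (titles, artist names).
ABBREVIATIONS = {
    "&": "and",
    "feat.": "featuring",
    "ft.": "featuring",
    "vol.": "volume",
}

# Leading word of a character's Unicode name -> language guess (illustrative only).
SCRIPT_TO_LANGUAGE = {
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "HANGUL": "ko",
    "CYRILLIC": "ru",
    "GREEK": "el",
}


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and expand common metadata abbreviations."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split())


def guess_native_language(text: str, default: str = "en") -> str:
    """Guess the language from the dominant non-Latin script; fall back to `default`."""
    counts: dict[str, int] = {}
    for ch in text:
        if not ch.isalpha():
            continue
        # e.g. "CYRILLIC CAPITAL LETTER KA" -> "CYRILLIC"
        script = unicodedata.name(ch, "").split(" ", 1)[0]
        lang = SCRIPT_TO_LANGUAGE.get(script)
        if lang is not None:
            counts[lang] = counts.get(lang, 0) + 1
    if not counts:
        return default
    return max(counts, key=counts.get)
```

For example, `normalize("Simon & Garfunkel")` yields `"Simon and Garfunkel"`, and `guess_native_language("Кино")` yields `"ru"`. A script count cannot tell Latin-script languages apart (French and German titles both fall back to the default), so a production system would more plausibly rely on a statistical language classifier or on language tags in the media asset's metadata.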
21 Claims
1. A method for synthesizing speech from content related to a media asset, the method comprising:
receiving a request for a rendering of text associated with the media asset; and
converting the text associated with the media asset into speech, the speech comprising a rendering of the text that is spoken in a native language of the text and customized with an accent associated with a user, wherein converting the text associated with the media asset into speech further comprises:
obtaining a plurality of native phonemes of the text;
determining the accent associated with the user;
mapping the plurality of native phonemes to a plurality of target phonemes associated with the accent; and
generating the speech using the plurality of target phonemes.
8. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
receive a request for a rendering of text associated with a media asset; and
convert the text associated with the media asset into speech, the speech comprising a rendering of the text that is spoken in a native language of the text and customized with an accent associated with a user, wherein converting the text associated with the media asset into speech further comprises:
obtaining a plurality of native phonemes of the text;
determining the accent associated with the user;
mapping the plurality of native phonemes to a plurality of target phonemes associated with the accent; and
generating the speech using the plurality of target phonemes.
15. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
receive a request for a rendering of text associated with a media asset; and
convert the text associated with the media asset into speech, the speech comprising a rendering of the text that is spoken in a native language of the text and customized with an accent associated with a user, wherein converting the text associated with the media asset into speech further comprises:
obtaining a plurality of native phonemes of the text;
determining the accent associated with the user;
mapping the plurality of native phonemes to a plurality of target phonemes associated with the accent; and
generating the speech using the plurality of target phonemes.
Specification