SYSTEMS AND METHODS FOR TEXT TO SPEECH SYNTHESIS
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
410 Citations
16 Claims
1. A voice synthesis server farm system comprising:
a plurality of rendering servers, wherein each of the plurality of rendering servers comprises at least one render engine that is operable to convert text associated with a media asset into audio, wherein the audio comprises a human-sounding rendering of the text that is spoken in a language of a user regardless of a language in which the text originated.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
10. A method for synthesizing speech from content related to a media asset, the method comprising:
receiving a request for a rendering of text related to the media asset; and
converting the text associated with the media asset into audio, the audio comprising a human-sounding rendering of the text that is spoken in a language of a user regardless of a language in which the text originated.
(Dependent claims: 11, 12)
13. A method for processing requests for synthesized speech, the method comprising:
receiving a request for a media asset or a request for speech content identifying the media asset;
if a rendering of speech content is available, retrieving and providing the rendering; and
if a rendering of speech content is not available, converting text associated with the media asset into audio, the audio comprising a human-sounding rendering of the text that is spoken in a native language of a user regardless of a language in which the text originated.
14. A method for requesting synthesized speech from content related to a media asset, the method comprising:
generating a request for a media asset or a request for speech content identifying the media asset; and
receiving, in response to the generated request, a rendering of speech content, the rendering comprising text associated with the media asset, whereby the text is converted into audio, the audio comprising a human-sounding rendering of the text that is spoken in a language of a user regardless of a language in which the text originated.
(Dependent claims: 15, 16)
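For illustration only, the request flow recited in claim 13 (provide a stored rendering if one is available, otherwise convert the text into audio) might be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `SpeechRequestProcessor`, the method `get_speech`, the cache key of asset identifier plus user language, and the placeholder `fake_render` function are all hypothetical and do not appear in the claims or the abstract.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (media asset identifier, user's language code).
CacheKey = Tuple[str, str]


class SpeechRequestProcessor:
    """Serve a stored rendering when one exists; otherwise render and store it."""

    def __init__(self, render: Callable[[str, str], bytes]) -> None:
        # `render` stands in for a render engine: (text, user_language) -> audio bytes.
        self._render = render
        self._renderings: Dict[CacheKey, bytes] = {}

    def get_speech(self, asset_id: str, text: str, user_language: str) -> bytes:
        key = (asset_id, user_language)
        cached = self._renderings.get(key)
        if cached is not None:
            # A rendering is available: retrieve and provide it.
            return cached
        # No rendering is available: convert the text into audio, then keep it.
        audio = self._render(text, user_language)
        self._renderings[key] = audio
        return audio


def fake_render(text: str, user_language: str) -> bytes:
    # Placeholder only: a real render engine would return synthesized audio.
    return f"[{user_language}] {text}".encode("utf-8")
```

With this sketch, two requests for the same asset and user language invoke the render function once, and a request for the same asset in a different user language triggers a separate rendering. Keying stored renderings by user language is a design assumption made here because the claims tie the audio to the language of the user rather than to the language in which the text originated.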
Specification