Systems and methods for selective text to speech synthesis
First Claim
1. A method for selectively synthesizing speech based on a text string, comprising:
at a device having one or more processors and memory:
generating the text string from metadata associated with a media asset;
parsing the text string and identifying one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset;
substituting at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and
synthesizing speech for provision with the media asset based on the text string after the substitution.
1 Assignment
0 Petitions
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
853 Citations
18 Claims
1. A method for selectively synthesizing speech based on a text string, comprising:
at a device having one or more processors and memory:
generating the text string from metadata associated with a media asset;
parsing the text string and identifying one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset;
substituting at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and
synthesizing speech for provision with the media asset based on the text string after the substitution.
Dependent claims: 2, 3, 4, 5, 6
7. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
generate a text string from metadata associated with a media asset;
parse the text string and identify one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset;
substitute at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and
synthesize speech for provision with the media asset based on the text string after the substitution.
Dependent claims: 8, 9, 10, 11, 12
13. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
generate a text string from metadata associated with a media asset;
parse the text string and identify one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset;
substitute at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and
synthesize speech for provision with the media asset based on the text string after the substitution.
Dependent claims: 14, 15, 16, 17, 18
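The four steps recited in the independent claims (generate, parse, substitute by genre-dependent rule, synthesize) can be illustrated with a minimal Python sketch. This is not the patented implementation. The metadata field names, the "title by artist" template, the GENRE_RULES table (a classical asset announces its composer instead of its performer), and every function name below are assumptions introduced only for illustration, and synthesize_speech is a placeholder that returns bytes rather than audio.

```python
# Hypothetical genre-dependent rules (an assumption, not taken from the source):
# genre -> (first attribute to replace, second attribute to speak instead).
GENRE_RULES = {"classical": ("artist", "composer")}


def generate_text_string(metadata):
    """Generate the text string to be spoken from the asset's metadata."""
    return f"{metadata['title']} by {metadata['artist']}"


def parse_portions(text, metadata):
    """Identify which span of the text string carries which attribute."""
    portions = {}
    for name, value in metadata.items():
        if not value:
            continue  # skip empty attributes; they cannot be located
        start = text.find(value)
        if start != -1:
            portions[name] = (start, start + len(value))
    return portions


def substitute_by_genre(text, metadata, rules=GENRE_RULES):
    """Replace the first attribute's portion with the second attribute's value."""
    rule = rules.get(metadata.get("genre", "").lower())
    if rule is None:
        return text  # no rule for this genre: leave the string unchanged
    first_attr, second_attr = rule
    portions = parse_portions(text, metadata)
    replacement = metadata.get(second_attr)
    if first_attr not in portions or not replacement:
        return text  # nothing to replace, or nothing to replace it with
    start, end = portions[first_attr]
    return text[:start] + replacement + text[end:]


def synthesize_speech(text):
    """Placeholder for a text-to-speech engine; returns UTF-8 bytes, not audio."""
    return text.encode("utf-8")


def text_for_asset(metadata):
    """Run generation and substitution, returning the string to synthesize."""
    return substitute_by_genre(generate_text_string(metadata), metadata)


if __name__ == "__main__":
    asset = {
        "title": "Symphony No. 5",
        "artist": "Berlin Philharmonic",
        "composer": "Ludwig van Beethoven",
        "genre": "Classical",
    }
    spoken = text_for_asset(asset)
    print(spoken)
    audio = synthesize_speech(spoken)
```

In this sketch the rule table is keyed on the lowercased genre, so an asset whose genre has no entry is spoken with its original, unsubstituted text string.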
Specification