Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
First Claim
1. A method for customizing delivery of synthesized speech, the method comprising:
generating a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
obtaining user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customizing the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
providing the customized speech segment from the server device to a user device for playback with the media asset.
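The repeated-portion customization recited above can be illustrated with a minimal Python sketch. It assumes the speech segment is available as a list of words, each carrying a spoken duration and a trailing pause. The Word type, the find_repeats and customize helpers, the mode names and the default parameters are all hypothetical and do not come from the patent. Repeats are found with a simple greedy search for word sequences that already occurred earlier in the segment. This is only one possible way to perform the detection step. "Faster speech patterns" is modeled here as shorter word durations; a real synthesizer would more likely re-render those words at a higher speaking rate.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Tuple

# Hypothetical representation of a synthesized speech segment:
# one entry per spoken word, with its duration and the pause that follows it.
@dataclass(frozen=True)
class Word:
    text: str
    duration_ms: int
    pause_ms: int

MODES = {"omit", "faster", "short_breaks", "truncate"}

def find_repeats(words: List[Word], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return [start, end) spans of word sequences that already occurred earlier.

    Greedy and longest-match-first; only one of many possible detection heuristics.
    """
    tokens = [w.text.lower().strip(".,;:!?") for w in words]
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        match = 0
        for n in range(len(tokens) - i, min_len - 1, -1):
            phrase = tokens[i:i + n]
            # An earlier occurrence must end at or before position i.
            if any(tokens[j:j + n] == phrase for j in range(i - n + 1)):
                match = n
                break
        if match:
            spans.append((i, i + match))
            i += match
        else:
            i += 1
    return spans

def customize(words: List[Word], modes: Iterable[str], rate: float = 1.5,
              pause_scale: float = 0.5, keep_words: int = 1) -> List[Word]:
    """Apply the requested modifications to every detected repeated span."""
    modes = set(modes)
    unknown = modes - MODES
    if unknown:
        raise ValueError(f"unknown modes: {sorted(unknown)}")
    out: List[Word] = []
    prev = 0
    for start, end in find_repeats(words):
        out.extend(words[prev:start])  # non-repeated words pass through unchanged
        chunk = list(words[start:end])
        if "omit" in modes:            # option (1): drop the repeat entirely
            chunk = []
        if "truncate" in modes:        # option (4): keep only the first few words
            chunk = chunk[:keep_words]
        if "faster" in modes:          # option (2): shorten spoken durations
            chunk = [replace(w, duration_ms=int(w.duration_ms / rate)) for w in chunk]
        if "short_breaks" in modes:    # option (3): shrink pauses after each word
            chunk = [replace(w, pause_ms=int(w.pause_ms * pause_scale)) for w in chunk]
        out.extend(chunk)
        prev = end
    out.extend(words[prev:])
    return out

if __name__ == "__main__":
    segment = [Word(t, 300, 100) for t in "track one by the band track two by the band".split()]
    print(find_repeats(segment))  # [(7, 10)]
    print(" ".join(w.text for w in customize(segment, {"omit"})))  # track one by the band track two
```

For a segment such as "track one by the band track two by the band", the sketch flags the second "by the band" as a repeat. The modes "omit", "faster", "short_breaks" and "truncate" then correspond to options (1), (2), (3) and (4) respectively, and several of them can be combined in one request, mirroring the "one or more of" language of the claim.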
1 Assignment
0 Petitions
Abstract
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
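The normalization and native-language steps mentioned in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions. The abbreviation table is invented for the example. The script detection, based on Unicode character names, is only a coarse stand-in for the language determination the abstract describes, because a script such as Latin is shared by many languages. A production system would more likely use a trained language-identification model and a pronunciation lexicon to obtain target phonemes; that step is not shown.

```python
import unicodedata
from collections import Counter

# Hypothetical abbreviation table for media-asset metadata; not from the source.
ABBREVIATIONS = {"feat.": "featuring", "ft.": "featuring", "&": "and", "vol.": "volume"}

def normalize(text: str) -> str:
    """Fold compatibility characters (e.g. full-width letters), collapse
    whitespace, and expand a few abbreviations so they are spoken as words."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split())

def dominant_script(text: str) -> str:
    """Return the most common Unicode script prefix among the letters in text,
    e.g. 'LATIN', 'CYRILLIC', 'CJK'; 'UNKNOWN' if there are no letters."""
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

if __name__ == "__main__":
    title = normalize("Ｓｏｎｇ  feat. Artist & Band")
    print(title)                      # Song featuring Artist and Band
    print(dominant_script(title))     # LATIN
    print(dominant_script("Привет"))  # CYRILLIC
```

The detected script could then narrow down which voice or phoneme inventory to try. Choosing a dialect or accent familiar to the user would additionally require user preferences, which this sketch does not model.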
663 Citations
30 Claims
1. A method for customizing delivery of synthesized speech, the method comprising:
generating a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
obtaining user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customizing the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
providing the customized speech segment from the server device to a user device for playback with the media asset.
10. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
generate a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
obtain user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
provide the customized speech segment from the server device to a user device for playback with the media asset.
19. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
generate a speech segment from one or more text strings describing or identifying a media asset having audio data distinct from the generated speech segment;
obtain user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device, wherein the customizing further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions; and
provide the customized speech segment from the server device to a user device for playback with the media asset.
28. A method for customizing delivery of synthesized speech, the method comprising:
generating a speech segment from one or more text strings associated with or identifying a media asset;
obtaining user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customizing the speech segment by modifying selected portions of the speech segment at a server device; and
providing the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.
29. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to:
generate a speech segment from one or more text strings associated with or identifying a media asset;
obtain user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device; and
provide the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.
30. A system, comprising:
one or more processors; and
memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to:
generate a speech segment from one or more text strings associated with or identifying a media asset;
obtain user input requesting a variation in speech delivery accompanying the media asset;
in response to the user input, customize the speech segment by modifying selected portions of the speech segment at a server device; and
provide the customized speech segment from the server device to a user device for playback with the media asset, wherein customizing the speech segment by modifying selected portions of the speech segment further comprises:
automatically detecting one or more repeated portions in the speech segment; and
automatically modifying the speech segment by performing one or more of:
(1) omitting at least one of the repeated portions from the speech segment, (2) using faster speech patterns for at least one of the repeated portions, (3) shortening breaks between words in at least one of the repeated portions, and (4) truncating one or more phrases in at least one of the repeated portions.
Specification