GENERATING PERSONALIZED AUDIO PROGRAMS FROM TEXT CONTENT
Abstract
Features are disclosed for generating text-to-speech (TTS) audio programs from textual content received from multiple sources. A TTS system may assemble an audio program from several individual audio presentations of user-selected network-accessible content. Users may configure the TTS system to retrieve personal content as well as publicly accessible content. The audio program may include segues, introductions, summaries, and the like. Voices may be selected for individual content items based on user selections or on characteristics of the content or content source.
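As a rough illustration of the assembly step described in the abstract, the sketch below stitches per-item presentations into one program with an introduction and segues. All names here (`Presentation`, `assemble_program`, the narrator voice, the segue wording) are hypothetical and are not taken from the patent; text strings stand in for synthesized audio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Presentation:
    """Stand-in for a synthesized audio segment (text is used in place of audio)."""
    source: str   # name of the content source, e.g. "email" or "news"
    voice: str    # label of the TTS voice chosen for this segment
    script: str   # text that would be rendered to speech


def assemble_program(presentations: List[Presentation],
                     narrator_voice: str = "narrator") -> List[Presentation]:
    """Order segments into a program, adding an introduction and segues.

    Assumption: the introduction and segues are themselves short segments
    spoken by a separate narrator voice.
    """
    if not presentations:
        return []
    sources = ", ".join(p.source for p in presentations)
    program = [Presentation("program", narrator_voice,
                            f"Your program today covers: {sources}.")]
    for index, item in enumerate(presentations):
        if index > 0:
            # Insert a short segue before every item except the first.
            program.append(Presentation("program", narrator_voice,
                                        f"Next, from {item.source}."))
        program.append(item)
    return program


if __name__ == "__main__":
    items = [
        Presentation("news", "voice_a", "Headline one."),
        Presentation("email", "voice_b", "You have two new messages."),
    ]
    for segment in assemble_program(items):
        print(f"[{segment.voice}] {segment.script}")
```

Keeping segues as ordinary segments means the program stays a flat, ordered list, which a real system could then render and concatenate in sequence.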
30 Claims
1. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising computer executable instructions stored in the memory, wherein the one or more processors, when executing the module, are configured to:
receive, from a client device, user selection data regarding a first content source and a second content source, wherein the first content source is different from the second content source;
retrieve a first content item from the first content source and a second content item from the second content source;
identify first voice data based at least in part on the first content item, wherein the first voice data indicates characteristics of a first voice for text-to-speech synthesis;
identify second voice data based at least in part on the second content item, wherein the second voice data indicates characteristics of a second voice for text-to-speech synthesis, and wherein the second voice data is different from the first voice data;
generate a first audio presentation of the first content item based at least in part on the first voice data;
generate a second audio presentation of the second content item based at least in part on the second voice data;
assemble an audio program comprising the first audio presentation and the second audio presentation; and
transmit the audio program to the client device.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method comprising:
retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
identifying first text-to-speech voice data based at least in part on a characteristic of the first content item;
identifying second text-to-speech voice data based at least in part on a characteristic of the second content item, wherein the first text-to-speech voice data is different from the second text-to-speech voice data;
generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
generating a second audio presentation of the second content item utilizing the second text-to-speech voice data; and
assembling an audio program comprising the first audio presentation and the second audio presentation.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
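One way to read the voice-identification steps of claim 6 is as a lookup from a content characteristic to voice data. The sketch below is illustrative only: the `category` characteristic, the `VOICE_TABLE` mapping, the `VoiceData` fields, and the placeholder `synthesize` function are assumptions rather than details from the claims, and no real TTS engine is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceData:
    """Hypothetical voice characteristics passed to a TTS engine."""
    name: str
    rate: float  # relative speaking rate, 1.0 = normal


@dataclass(frozen=True)
class ContentItem:
    source: str
    category: str  # the characteristic used to pick a voice (assumed)
    text: str


# Assumed mapping from content category to voice data.
VOICE_TABLE: Dict[str, VoiceData] = {
    "news": VoiceData("anchor", 1.0),
    "personal": VoiceData("friendly", 0.9),
}
DEFAULT_VOICE = VoiceData("default", 1.0)


def identify_voice(item: ContentItem) -> VoiceData:
    """Pick voice data from a characteristic of the content item."""
    return VOICE_TABLE.get(item.category, DEFAULT_VOICE)


def synthesize(text: str, voice: VoiceData) -> Tuple[str, str]:
    """Placeholder for TTS synthesis: returns (voice name, text), not audio."""
    return (voice.name, text)


def build_program(items: List[ContentItem]) -> List[Tuple[str, str]]:
    """Generate one presentation per item and collect them in order."""
    return [synthesize(item.text, identify_voice(item)) for item in items]
```

A table lookup with a default keeps the example small; an actual system could derive the characteristic from the content source, the topic, or user preferences.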
19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a server computing system comprising one or more computing devices to perform a process comprising:
retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
identifying first text-to-speech voice data;
generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
generating a second audio presentation of the second content item; and
assembling an audio program comprising the first audio presentation and the second audio presentation.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification