Generating personalized audio programs from text content
Abstract
Features are disclosed for generating text-to-speech (TTS) audio programs from textual content received from multiple sources. A TTS system may assemble an audio program from several individual audio presentations of user-selected network-accessible content. Users may configure the TTS system to retrieve personal content as well as publicly accessible content. The audio program may include segues, introductions, summaries, and the like. Voices may be selected for individual content items based on user selections or on characteristics of the content or content source.
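The assembly step described in the abstract can be illustrated with a minimal sketch. It assumes a placeholder `synthesize` function standing in for a real TTS engine, and the names `ContentItem`, `Segment`, `build_program`, and the voice labels `host` and `reader` are hypothetical; none of them come from the patent text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentItem:
    source: str  # hypothetical source label, e.g. "news" or "email"
    title: str
    text: str


@dataclass
class Segment:
    voice: str
    text: str


def synthesize(text: str, voice: str) -> Segment:
    # Placeholder for a real TTS engine: records which voice would speak the text.
    return Segment(voice=voice, text=text)


def build_program(items: List[ContentItem],
                  host_voice: str = "host",
                  item_voice: str = "reader") -> List[Segment]:
    """Assemble an introduction, one segue per item, and the item audio."""
    segments = [synthesize(f"Your program has {len(items)} items.", host_voice)]
    for index, item in enumerate(items):
        lead = "First" if index == 0 else "Next"
        # Segue spoken by the host voice before each content item.
        segments.append(synthesize(f"{lead}, from {item.source}: {item.title}.", host_voice))
        segments.append(synthesize(item.text, item_voice))
    return segments
```

In this sketch the program is an ordered list of segments; a real system would presumably concatenate rendered audio instead.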
31 Claims
1. A system comprising:
one or more processors;
a computer-readable memory; and
a module comprising computer executable instructions stored in the memory, wherein the one or more processors, when executing the module, are configured to:
receive, from a client device, user selection data regarding a first content source and a second content source, wherein the first content source is different from the second content source;
retrieve a first content item from the first content source and a second content item from the second content source;
determine, based at least in part on an association between a characteristic of the first content item and a characteristic of first voice data, to use the first voice data to generate a first text-to-speech presentation of the first content item;
determine, based at least in part on an association between a characteristic of the second content item and a characteristic of second voice data, to use the second voice data to generate a second text-to-speech presentation of the second content item;
generate the first text-to-speech presentation of the first content item based at least in part on the first voice data;
generate the second text-to-speech presentation of the second content item based at least in part on the second voice data;
assemble an audio program comprising the first text-to-speech presentation and the second text-to-speech presentation; and
transmit the audio program to the client device.
Dependent claims: 2, 3, 4, 5, 27, 28, 29
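The voice-determination steps of claim 1 rely on an association between a characteristic of a content item and a characteristic of voice data. One possible reading, shown here purely as a sketch, treats characteristics as sets of tags and picks the voice with the largest overlap. The tag-overlap rule, the `Voice` and `Item` types, and the `tts` stub are assumptions made for illustration, not the claimed implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Voice:
    name: str
    traits: FrozenSet[str]  # hypothetical characteristics of the voice data


@dataclass(frozen=True)
class Item:
    source: str
    text: str
    traits: FrozenSet[str]  # hypothetical characteristics of the content item


def select_voice(item: Item, voices: List[Voice]) -> Voice:
    # Assumed association rule: the voice sharing the most traits with the item.
    # max() returns the first maximal element, so ties fall back to list order.
    return max(voices, key=lambda voice: len(voice.traits & item.traits))


def tts(text: str, voice: Voice) -> str:
    # Stand-in for real synthesis: tags the text with the chosen voice name.
    return f"[{voice.name}] {text}"


def assemble_program(items: List[Item], voices: List[Voice]) -> List[str]:
    """Generate one presentation per item and return them in order."""
    return [tts(item.text, select_voice(item, voices)) for item in items]
```

Other association rules (for example a lookup table keyed by content source) would fit the same structure; only `select_voice` would change.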
6. A computer-implemented method comprising:
retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
identifying first text-to-speech voice data based at least in part on a characteristic of the first content item;
determining that the second content item comprises a first portion and a second portion;
identifying second text-to-speech voice data and third text-to-speech voice data based at least in part on a characteristic of the second content item, wherein the first text-to-speech voice data is different from the second text-to-speech voice data;
generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
generating a second audio presentation of the second content item utilizing the second text-to-speech voice data with the first portion, and using the third text-to-speech voice data with the second portion; and
assembling an audio program comprising the first audio presentation and the second audio presentation.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
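Claim 6 differs from claim 1 in that the second content item is split into two portions, each rendered with its own voice data. A minimal sketch of that idea follows, assuming the portions have already been labeled by kind (for example narration versus quoted speech). The labels, the voice names `voice_a`, `voice_b`, and `voice_c`, and the `(voice, text)` output format are hypothetical.

```python
from typing import Dict, List, Tuple

Portion = Tuple[str, str]    # (kind, text), e.g. ("quote", "we will rebuild")
Rendered = Tuple[str, str]   # (voice name, text), standing in for rendered audio

# Assumed mapping from portion kind to the second and third voice data.
VOICE_BY_KIND: Dict[str, str] = {"narration": "voice_b", "quote": "voice_c"}


def render_item(portions: List[Portion],
                voice_by_kind: Dict[str, str],
                default_voice: str = "voice_a") -> List[Rendered]:
    """Pair each portion with the voice associated with its kind."""
    return [(voice_by_kind.get(kind, default_voice), text) for kind, text in portions]


def assemble(first_item_text: str, second_item_portions: List[Portion]) -> List[Rendered]:
    # The first item uses a single voice; the second item uses one voice per portion.
    program: List[Rendered] = [("voice_a", first_item_text)]
    program.extend(render_item(second_item_portions, VOICE_BY_KIND))
    return program
```

How the portions are detected (quotation marks, markup, sender fields, and so on) is left open here, since the claim only requires determining that two portions exist.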
18. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a server computing system comprising one or more computing devices to perform a process comprising:
retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
identifying first text-to-speech voice data based at least partly on an association between the first text-to-speech voice data and a characteristic of the first content item;
generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
identifying second text-to-speech voice data based at least partly on an association between the second text-to-speech voice data and a characteristic of the second content item;
generating a second audio presentation of the second content item utilizing second text-to-speech voice data; and
assembling an audio program comprising the first audio presentation and the second audio presentation.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 30, 31
Specification