Generating personalized audio programs from text content

US 9,190,049 B2
Filed: 12/19/2012
Issued: 11/17/2015
Est. Priority Date: 10/25/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Features are disclosed for generating text-to-speech (TTS) audio programs from textual content received from multiple sources. A TTS system may assemble an audio program from several individual audio presentations of user-selected network-accessible content. Users may configure the TTS system to retrieve personal content as well as publicly accessible content. The audio program may include segues, introductions, summaries, and the like. Voices may be selected for individual content items based on user selections or on characteristics of the content or content source.
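
The assembly step described in the abstract might be sketched as follows. This is a minimal illustration, not the patented implementation: `Segment`, `assemble_program`, the segment kinds, and the placement of the introduction first and the summary last are all assumptions made for this example. The abstract only says that a program may include segues, introductions, and summaries.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str   # hypothetical kinds: "intro", "item", "segue", "summary"
    label: str  # hypothetical description of the audio in this segment


def assemble_program(item_titles, segue_label="music"):
    """Order segments as: intro, item, segue, item, ..., summary (assumed layout)."""
    program = [Segment("intro", f"Introduction to {len(item_titles)} items")]
    for index, title in enumerate(item_titles):
        if index > 0:
            # A segue goes between consecutive items, never before the first one.
            program.append(Segment("segue", segue_label))
        program.append(Segment("item", title))
    program.append(Segment("summary", "Summary of: " + ", ".join(item_titles)))
    return program


program = assemble_program(["Email digest", "News headline"])
print([segment.kind for segment in program])
```

In a real system each `Segment` would presumably carry synthesized audio rather than a text label; labels are used here only to keep the ordering logic visible.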

31 Citations

31 Claims

1. A system comprising:
- one or more processors;
  
  a computer-readable memory; and
  
  a module comprising computer executable instructions stored in the memory, wherein the one or more processors, when executing the module, are configured to:
  
  receive, from a client device, user selection data regarding a first content source and a second content source, wherein the first content source is different from the second content source;
  
  retrieve a first content item from the first content source and a second content item from the second content source;
  
  determine, based at least in part on an association between a characteristic of the first content item and a characteristic of first voice data, to use the first voice data to generate a first text-to-speech presentation of the first content item;
  
  determine, based at least in part on an association between a characteristic of the second content item and a characteristic of second voice data, to use the second voice data to generate a second text-to-speech presentation of the second content item;
  
  generate the first text-to-speech presentation of the first content item based at least in part on the first voice data;
  
  generate the second text-to-speech presentation of the second content item based at least in part on the second voice data;
  
  assemble an audio program comprising the first text-to-speech presentation and the second text-to-speech presentation; and
  
  transmit the audio program to the client device.
- Dependent claims (2, 3, 4, 5, 27, 28, 29):
  - 2. The system of claim 1, wherein the one or more processors are further configured to include, in the audio program, a segue between the first text-to-speech presentation and the second text-to-speech presentation, the segue comprising user-selected music.
  - 3. The system of claim 1, wherein the one or more processors are further configured to:
    - generate an audio presentation of a summarization of the audio program, wherein the audio program further comprises the audio presentation.
  - 4. The system of claim 1, wherein the one or more processors are further configured to:
    - receive, from the client device, authentication information associated with the first content source, wherein the authentication information is presented to the first content source to retrieve the first content item.
  - 5. The system of claim 1, wherein a characteristic of the first voice data comprises at least one of an age of a speaker, a gender of the speaker, or a speaking rate of the speaker.
  - 27. The system of claim 1, wherein the association between the characteristic of the first content item and the characteristic of the first voice data comprises a previous determination that a text-to-speech presentation of a content item having the characteristic of the first content item is to be generated using a text-to-speech voice having the characteristic of the first voice data.
  - 28. The system of claim 1, wherein the one or more processors are further configured to determine the characteristic of the first content item by analyzing at least one of:
    - textual content of the first content item, data regarding the first content source, or data regarding an author of the first content item.
  - 29. The system of claim 1, wherein the characteristic of the first content item comprises at least one of a subject matter, a vocabulary, a length, a source, or an author.
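
The voice determination in claim 1, together with the characteristics listed in claims 5, 27, and 29, can be illustrated with a small lookup. This is a hedged sketch only: the `Voice` class, the `VOICES` inventory, the `ASSOCIATIONS` table, and the fallback to the first voice are hypothetical and are not taken from the patent. The table stands in for the "previous determination" of claim 27 that content with a given characteristic should be read by a voice with a given characteristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str
    speaking_rate: str  # hypothetical values: "slow", "normal", "fast"


# Hypothetical voice inventory.
VOICES = [
    Voice("voice_a", "female", "normal"),
    Voice("voice_b", "male", "fast"),
]

# Hypothetical associations: content subject matter -> (voice attribute, wanted value).
ASSOCIATIONS = {
    "finance": ("speaking_rate", "normal"),
    "sports": ("speaking_rate", "fast"),
}


def select_voice(subject_matter, voices=VOICES, associations=ASSOCIATIONS):
    """Return the first voice matching the stored association, else the first voice."""
    association = associations.get(subject_matter)
    if association is not None:
        attribute, wanted = association
        for voice in voices:
            if getattr(voice, attribute) == wanted:
                return voice
    # Assumed fallback when no association or no matching voice exists.
    return voices[0]


print(select_voice("sports").name)   # voice_b
print(select_voice("weather").name)  # voice_a (fallback)
```

Subject matter is used as the content characteristic here for brevity; vocabulary, length, source, or author could key the same table.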

6. A computer-implemented method comprising:
- retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
  
  identifying first text-to-speech voice data based at least in part on a characteristic of the first content item;
  
  determining that the second content item comprises a first portion and a second portion;
  
  identifying second text-to-speech voice data and third text-to-speech voice data based at least in part on a characteristic of the second content item, wherein the first text-to-speech voice data is different from the second text-to-speech voice data;
  
  generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
  
  generating a second audio presentation of the second content item utilizing the second text-to-speech voice data with the first portion, and using the third text-to-speech voice data with the second portion; and
  
  assembling an audio program comprising the first audio presentation and the second audio presentation.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17):
  - 7. The computer-implemented method of claim 6, wherein the second content item comprises a quotation, wherein the first portion does not comprise the quotation, and wherein the second portion comprises the quotation.
  - 8. The computer-implemented method of claim 6, wherein the second content item comprises an interview, wherein the first portion corresponds to an interviewer, and wherein the second portion corresponds to an interviewee.
  - 9. The computer-implemented method of claim 6, wherein the audio program comprises streaming audio and wherein the streaming audio comprises the first audio presentation and the second audio presentation.
  - 10. The computer-implemented method of claim 6, wherein assembling the audio program comprises placing a segue between the first audio presentation and the second audio presentation.
  - 11. The computer-implemented method of claim 10, wherein the segue comprises at least a portion of a music recording, and wherein the portion is obtained from a client device or from a network-accessible music server.
  - 12. The computer-implemented method of claim 6, wherein assembling the audio program comprises:
    - determining a summary of the audio program;
      
      generating a third audio presentation of the summary; and
      
      including the third audio presentation in the audio program.
  - 13. The computer-implemented method of claim 6, further comprising:
    - receiving, from a client device, authentication information associated with the first content source, wherein retrieving the first content item comprises presenting the authentication information to the first content source.
  - 14. The computer-implemented method of claim 6, wherein the first characteristic comprises at least one of a subject matter, a vocabulary, a length, a source, or an author.
  - 15. The computer-implemented method of claim 6, further comprising:
    - identifying a speaker gender, a speaker age, or a speaker voice speed based at least in part on the characteristic of the first content item, wherein identifying the first text-to-speech voice data is further based at least in part on the speaker gender, speaker age, or speaker voice speed.
  - 16. The computer-implemented method of claim 6, wherein generating a first audio presentation of the first content item comprises:
    - summarizing the first content item, wherein the summarization is based on natural language understanding (NLU); and
      
      generating a first audio presentation of the summarization.
  - 17. The computer-implemented method of claim 6, further comprising:
    - receiving tag data from a client device, wherein the tag data indicates a content item to tag; and
      
      tagging the content item indicated by the tag data.
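
Claims 6 and 7 describe using different voice data for different portions of one content item, such as a quotation and the text around it. A minimal sketch of that split is shown below. It assumes quotations are delimited by straight double quotes, which the claims do not specify, and the function names and voice labels are hypothetical.

```python
import re


def split_by_quotation(text):
    """Split text into (is_quotation, portion) pairs using straight double quotes."""
    portions = []
    # With one capturing group, re.split alternates: non-quote, quote, non-quote, ...
    for index, piece in enumerate(re.split(r'("[^"]*")', text)):
        stripped = piece.strip()
        if stripped:
            portions.append((index % 2 == 1, stripped))
    return portions


def assign_voices(text, narrator_voice, quote_voice):
    """Pair each portion with a voice label: one for narration, one for quotations."""
    return [
        (quote_voice if is_quotation else narrator_voice, portion)
        for is_quotation, portion in split_by_quotation(text)
    ]


article = 'The mayor said, "We will rebuild." Crowds cheered.'
for voice, portion in assign_voices(article, "second_voice", "third_voice"):
    print(voice, "->", portion)
```

The same pairing could serve the interview case of claim 8 if the splitter detected speaker turns instead of quotation marks.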

18. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a server computing system comprising one or more computing devices to perform a process comprising:
- retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
  
  identifying first text-to-speech voice data based at least partly on an association between the first text-to-speech voice data and a characteristic of the first content item;
  
  generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
  
  identifying second text-to-speech voice data based at least partly on an association between the second text-to-speech voice data and a characteristic of the second content item;
  
  generating a second audio presentation of the second content item utilizing second text-to-speech voice data; and
  
  assembling an audio program comprising the first audio presentation and the second audio presentation.
- Dependent claims (19, 20, 21, 22, 23, 24, 25, 26, 30, 31):
  - 19. The non-transitory computer readable medium of claim 18 wherein the first content item and the second content item are retrieved based at least in part on user selection data.
  - 20. The non-transitory computer readable medium of claim 18, wherein the characteristic of the first content item comprises one of a subject matter, a vocabulary, a length, a source, or an author.
  - 21. The non-transitory computer readable medium of claim 19, wherein the association between the first text-to-speech voice data and the characteristic of the first content item comprises a previous determination that a text-to-speech presentation of a content item having the characteristic of the first content item is to be generated using a text-to-speech voice having a voice characteristic of the first text-to-speech voice data.
  - 22. The non-transitory computer readable medium of claim 18, further comprising:
    - identifying second text-to-speech voice data and third text-to-speech voice data based at least in part on a characteristic of the second content item;
      
      in response to determining that the second text-to-speech voice data comprises the first text-to-speech voice data, generating the second audio presentation based at least in part on the third text-to-speech voice data; and
      
      in response to determining that the second text-to-speech voice data does not comprise the first text-to-speech voice data, generating the second audio presentation based at least in part on the second text-to-speech voice data.
  - 23. The non-transitory computer readable medium of claim 18, wherein assembling the audio program comprises placing a segue between the first audio presentation and the second audio presentation.
  - 24. The non-transitory computer readable medium of claim 23, wherein the segue comprises at least a portion of a music recording, and wherein the portion is obtained from the client device or from a network-accessible music server.
  - 25. The non-transitory computer readable medium of claim 18, wherein assembling the audio program comprises:
    - determining a summary of the audio program;
      
      generating a third audio presentation of the summary; and
      
      including the third audio presentation in the audio program.
  - 26. The non-transitory computer readable medium of claim 18, further comprising:
    - receiving, from a client device, first authentication information associated with the first content source, wherein retrieving the first content item comprises presenting the authentication information to the first content source.
  - 30. The non-transitory computer readable medium of claim 21, wherein the voice characteristic comprises one of an age of a speaker, a gender of the speaker, or a speaking rate of the speaker.
  - 31. The non-transitory computer readable medium of claim 18, wherein the executable code further causes the server computing system to perform a process comprising determining the characteristic of the first content item by analyzing at least one of:
    - textual content of the first content item, data regarding the first content source, or data regarding an author of the first content item.
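
The conditional in claim 22 can be read as a simple fallback rule. The sketch below is one interpretation, not the claimed implementation: it treats "comprises" as plain equality between voice identifiers, and the function name and voice labels are hypothetical.

```python
def choose_second_voice(first_voice, second_voice, third_voice):
    """Pick the voice for the second item, following one reading of claim 22."""
    if second_voice == first_voice:
        # The second candidate matches the first item's voice: use the third.
        return third_voice
    # Otherwise keep the second candidate.
    return second_voice


print(choose_second_voice("voice_a", "voice_a", "voice_c"))  # voice_c
print(choose_second_voice("voice_a", "voice_b", "voice_c"))  # voice_b
```

Under this reading, one plausible effect is that two consecutive presentations in the audio program are not generated with the same voice.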

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: IVONA Software Sp zoo (Amazon.com, Inc.)
Inventors: Kaszczuk, Michal T.; Osowski, Lukasz M.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US13/720,873
Publication Number: US 20140122079A1
Time in Patent Office: 1,063 Days
Field of Search: 704/258, 704/260, 704/261, 704/270, 704/270.1, 704/275, 704/277
US Class Current: 1/1
CPC Class Codes:
G10L 13/02   Methods for producing synth...
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/08   Text analysis or generation...
