GENERATING PERSONALIZED AUDIO PROGRAMS FROM TEXT CONTENT

US 20140122079A1
Filed: 12/19/2012
Published: 05/01/2014
Est. Priority Date: 10/25/2012
Status: Active Grant

Abstract

Features are disclosed for generating text-to-speech (TTS) audio programs from textual content received from multiple sources. A TTS system may assemble an audio program from several individual audio presentations of user-selected network-accessible content. Users may configure the TTS system to retrieve personal content as well as publicly accessible content. The audio program may include segues, introductions, summaries, and the like. Voices may be selected for individual content items based on user selections or on characteristics of the content or content source.

30 Claims

1. A system comprising:
- one or more processors;
  
  a computer-readable memory; and
  
  a module comprising computer executable instructions stored in the memory, wherein the one or more processors, when executing the module, are configured to:
  
  receive, from a client device, user selection data regarding a first content source and a second content source, wherein the first content source is different from the second content source;
  
  retrieve a first content item from the first content source and a second content item from the second content source;
  
  identify first voice data based at least in part on the first content item, wherein the first voice data indicates characteristics of a first voice for text-to-speech synthesis;
  
  identify second voice data based at least in part on the second content item, wherein the second voice data indicates characteristics of a second voice for text-to-speech synthesis, and wherein the second voice data is different from the first voice data;
  
  generate a first audio presentation of the first content item based at least in part on the first voice data;
  
  generate a second audio presentation of the second content item based at least in part on the second voice data;
  
  assemble an audio program comprising the first audio presentation and the second audio presentation; and
  
  transmit the audio program to the client device.
- Dependent claims (2, 3, 4, 5)
  - 2. The system of claim 1, wherein the audio presentation further comprises a segue between the first audio presentation and the second presentation, the segue comprising user-selected music.
  - 3. The system of claim 1, wherein the one or more processors are further configured to:
    - generate a third audio presentation of a summarization of the audio program, wherein the audio program further comprises the third audio presentation.
  - 4. The system of claim 1, wherein the one or more processors are further configured to:
    - receive, from the client device, authentication information associated with the first content source, wherein the authentication information is presented to the first content source to retrieve the first content item.
  - 5. The system of claim 1, wherein a characteristic of the first voice comprises at least one of an age of a speaker, a gender of the speaker, or a speaking rate of the speaker.
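
The flow recited in claim 1 (with the optional segue of claim 2) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Voice` and `ContentItem` types, the subject-to-voice table, and the `synthesize` stub, which returns a labeled string in place of real TTS audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """Hypothetical voice data; fields mirror the characteristics in claim 5."""
    name: str
    gender: str
    age: int
    rate_wpm: int  # speaking rate in words per minute


@dataclass(frozen=True)
class ContentItem:
    source: str   # content source selected by the user
    subject: str  # content characteristic used to pick a voice (assumed)
    text: str


# Assumed lookup from a content characteristic to voice data.
VOICES_BY_SUBJECT = {
    "news": Voice("anchor", "female", 40, 160),
    "sports": Voice("commentator", "male", 30, 185),
}
DEFAULT_VOICE = Voice("neutral", "female", 35, 170)


def identify_voice(item):
    """Identify voice data based on the content item."""
    return VOICES_BY_SUBJECT.get(item.subject, DEFAULT_VOICE)


def synthesize(item, voice):
    """Stand-in for a TTS engine: returns a label instead of audio."""
    return f"[{voice.name} @ {voice.rate_wpm} wpm] {item.text}"


def assemble_program(items, segue=None):
    """Generate one presentation per item, with an optional segue between them."""
    program = []
    for index, item in enumerate(items):
        if segue is not None and index > 0:
            program.append(segue)
        program.append(synthesize(item, identify_voice(item)))
    return program


items = [
    ContentItem("example-news-feed", "news", "Markets rose today."),
    ContentItem("example-sports-feed", "sports", "The home team won."),
]
program = assemble_program(items, segue="[segue: user-selected music]")
```

In this sketch the two items come from different sources and resolve to different voices, so `program` holds two presentations separated by one segue entry. A real system would return audio data and transmit it to the client device.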

6. A computer-implemented method comprising:
- retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
  
  identifying first text-to-speech voice data based at least in part on a characteristic of the first content item;
  
  identifying second text-to-speech voice data based at least in part on a characteristic of the second content item, wherein the first text-to-speech voice data is different from the second text-to-speech voice data;
  
  generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
  
  generating a second audio presentation of the second content item utilizing the second text-to-speech voice data; and
  
  assembling an audio program comprising the first audio presentation and the second audio presentation.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
  - 7. The computer-implemented method of claim 6, further comprising:
    - determining that the second content item comprises a first portion and a second portion; and
      
      identifying third text-to-speech voice data based at least in part on the characteristic of the second content item, wherein generating the second audio presentation of the second content item comprises utilizing the second text-to-speech voice data with the first portion and the third text-to-speech voice data with the second portion.
  - 8. The computer-implemented method of claim 7, wherein the second content item comprises a quotation, wherein the first portion does not comprise the quotation, and wherein the second portion comprises the quotation.
  - 9. The computer-implemented method of claim 7, wherein the second content item comprises an interview, wherein the first portion corresponds to an interviewer, and wherein the second portion corresponds to an interviewee.
  - 10. The computer-implemented method of claim 6, wherein the audio program comprises streaming audio and wherein the streaming audio comprises the first audio presentation and the second audio presentation.
  - 11. The computer-implemented method of claim 6, wherein assembling the audio program comprises placing a segue between the first audio presentation and the second audio presentation.
  - 12. The computer-implemented method of claim 11, wherein the segue comprises at least a portion of a music recording, and wherein the portion is obtained from a client device or from a network-accessible music server.
  - 13. The computer-implemented method of claim 6, wherein assembling the audio program comprises:
    - determining a summary of the audio program;
      
      generating a third audio presentation of the summary; and
      
      including the third audio presentation in the audio program.
  - 14. The computer-implemented method of claim 6, further comprising:
    - receiving, from a client device, authentication information associated with the first content source, wherein retrieving the first content item comprises presenting the authentication information to the first content source.
  - 15. The computer-implemented method of claim 6, wherein the first characteristic comprises at least one of a subject matter, a vocabulary, a length, a source, or an author.
  - 16. The computer-implemented method of claim 6, further comprising:
    - identifying a speaker gender, a speaker age, or a speaker voice speed based at least in part on the characteristic of the first content item, wherein identifying the first text-to-speech voice data is further based at least in part on the speaker gender, speaker age, or speaker voice speed.
  - 17. The computer-implemented method of claim 6, wherein generating a first audio presentation of the first content item comprises:
    - summarizing the first content item, wherein the summarization is based on natural language understanding (NLU); and
      
      generating a first audio presentation of the summarization.
  - 18. The computer-implemented method of claim 6, further comprising:
    - receiving tag data from the client device, wherein the tag data indicates a content item to tag; and
      
      tagging the content item indicated by the tag data.
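
Claims 7 and 8 describe reading a quotation in a different voice from the surrounding text. One possible way to find the two kinds of portion is sketched below. It assumes quotations are delimited by straight double quotes, which the claims do not specify, and the function names and voice labels are hypothetical.

```python
import re


def split_quotations(text):
    """Return (portion, is_quotation) pairs, assuming straight double quotes."""
    portions = []
    # re.split with a capture group keeps the quoted spans in the result;
    # they land at the odd indices, with surrounding text at even indices.
    for index, chunk in enumerate(re.split(r'"([^"]*)"', text)):
        chunk = chunk.strip()
        if chunk:
            portions.append((chunk, index % 2 == 1))
    return portions


def assign_voices(text, narrator_voice, quotation_voice):
    """Pair each portion with the voice that would be used to synthesize it."""
    return [
        (quotation_voice if is_quotation else narrator_voice, portion)
        for portion, is_quotation in split_quotations(text)
    ]


plan = assign_voices(
    'The mayor said "we will rebuild" on Monday.',
    narrator_voice="second-voice",
    quotation_voice="third-voice",
)
```

Here `plan` alternates between the narrator voice and the quotation voice. The interview case of claim 9 would need speaker labels rather than quote marks, and is not covered by this sketch.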

19. A non-transitory computer readable medium comprising executable code that, when executed by a processor, causes a server computing system comprising one or more computing devices to perform a process comprising:
- retrieving a first content item from a first content source and a second content item from a second content source, wherein the first content source is different from the second content source;
  
  identifying first text-to-speech voice data;
  
  generating a first audio presentation of the first content item utilizing the first text-to-speech voice data;
  
  generating a second audio presentation of the second content item; and
  
  assembling an audio program comprising the first audio presentation and the second audio presentation.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
  - 20. The non-transitory computer readable medium of claim 19, wherein the second content item comprises audio content.
  - 21. The non-transitory computer readable medium of claim 19, wherein the first content item and the second content item are retrieved based at least in part on user selection data.
  - 22. The non-transitory computer readable medium of claim 19, wherein the first text-to-speech voice data is identified based on a characteristic of the first content item.
  - 23. The non-transitory computer readable medium of claim 22, wherein the characteristic comprises one of a subject matter, a vocabulary, a length, a source, or an author.
  - 24. The non-transitory computer readable medium of claim 19, wherein the first text-to-speech voice data is identified based on a voice characteristic determined from the first content item.
  - 25. The non-transitory computer readable medium of claim 24, wherein the voice characteristic comprises one of an age of a speaker, a gender of the speaker, or a speaking rate of the speaker.
  - 26. The non-transitory computer readable medium of claim 19, further comprising:
    - identifying second text-to-speech voice data and third text-to-speech voice data based at least in part on a characteristic of the second content item;
      
      in response to determining that the second text-to-speech voice data comprises the first text-to-speech voice data, generating the second audio presentation based at least in part on the third text-to-speech voice data; and
      
      in response to determining that the second text-to-speech voice data does not comprise the first text-to-speech voice data, generating the second audio presentation based at least in part on the second text-to-speech voice data.
  - 27. The non-transitory computer readable medium of claim 19, wherein assembling the audio program comprises placing a segue between the first audio presentation and the second audio presentation.
  - 28. The non-transitory computer readable medium of claim 27, wherein the segue comprises at least a portion of a music recording, and wherein the portion is obtained from the client device or from a network-accessible music server.
  - 29. The non-transitory computer readable medium of claim 19, wherein assembling the audio program comprises:
    - determining a summary of the audio program;
      
      generating a third audio presentation of the summary; and
      
      including the third audio presentation in the audio program.
  - 30. The non-transitory computer readable medium of claim 19, further comprising:
    - receiving, from a client device, first authentication information associated with the first content source, wherein retrieving the first content item comprises presenting the authentication information to the first content source.
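
The voice fallback in claim 26 keeps adjacent presentations from sharing a voice: if the preferred voice for the second item is the one already used for the first, a third voice is used instead. A minimal sketch follows. It assumes voice data can be compared by a simple identifier and reads "comprises" as equality; both are simplifications, and the function name is hypothetical.

```python
def pick_second_item_voice(first_voice, second_voice, third_voice):
    """Choose the voice for the second content item per the claim 26 branches."""
    if second_voice == first_voice:
        # Preferred voice would repeat the first item's voice: use the alternate.
        return third_voice
    # Preferred voice is distinct from the first item's voice: use it as is.
    return second_voice


repeated = pick_second_item_voice("anchor", "anchor", "commentator")
distinct = pick_second_item_voice("anchor", "narrator", "commentator")
```

With these inputs, `repeated` resolves to the third voice and `distinct` to the second.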

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: IVONA Software Sp zoo (Amazon.com, Inc.)
Inventors: Kaszczuk, Michal T.; Osowski, Lukasz M.

Granted Patent: US 9,190,049 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/02   Methods for producing synth...
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/08   Text analysis or generation...
