Community audio narration generation

US 9,002,703 B1
Filed: 09/28/2011
Issued: 04/07/2015
Est. Priority Date: 09/28/2011
Status: Active Grant

Abstract

The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may be then combined to produce an audio reading of at least a portion of the text-based work.

30 Claims

1. One or more computer readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
- selecting a text-based work that includes at least one content section without a corresponding audio reading;
- presenting the text-based work to a plurality of human readers to solicit an audio reading of the at least one content section of the text-based work;
- obtaining a group of audio recordings from the plurality of human readers, each audio recording having metadata that identifies a respective location within a corresponding content section of the text-based work;
- combining the group of audio recordings in order using the respective location identified by the metadata of the audio recordings to produce an audio file that includes the audio reading for at least the content section of the text-based work; and
- distributing an integrated product that includes a copy of the text-based work and a copy of the audio file to an electronic device.
- Dependent claims (2, 3, 4):
  - 2. The one or more computer readable media of claim 1, wherein the obtaining includes storing an audio recording made by a human user when a threshold number of spoken words in the audio recording match text in a corresponding content section of the text-based work based at least in part on a speech-to-text analysis of at least a portion of the audio recording.
  - 3. The one or more computer readable media of claim 1, further comprising organizing a collaborative reading of the text-based work by the plurality of human readers, wherein each human reader is assigned to read one or more content sections of the text-based work.
  - 4. The one or more computer readable media of claim 1, the acts further comprising determining, based on the metadata, a supplementation status of the text-based work, wherein the supplementation status is indicative of at least one additional content section, in a remainder of the text-based work, without a corresponding audio reading.
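
The combining act of claim 1 orders recordings by the location carried in their metadata. The following is a minimal Python sketch of that idea, not the patented implementation. The `Recording` type, its `section` and `offset` fields, and `combine_in_order` are hypothetical names introduced for illustration, and plain byte concatenation stands in for real audio splicing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """One submitted reading (hypothetical structure, not from the patent)."""
    section: int   # assumed metadata: index of the content section
    offset: int    # assumed metadata: location within that section
    audio: bytes   # placeholder payload; real audio would need a codec-aware splice


def combine_in_order(recordings):
    """Sort recordings by (section, offset) and join their payloads.

    Joining bytes is only a stand-in for digital splicing of audio.
    """
    ordered = sorted(recordings, key=lambda r: (r.section, r.offset))
    return b"".join(r.audio for r in ordered)


# Recordings arrive in arbitrary order from different readers.
submitted = [
    Recording(section=1, offset=0, audio=b"C"),
    Recording(section=0, offset=5, audio=b"B"),
    Recording(section=0, offset=0, audio=b"A"),
]
audio_file = combine_in_order(submitted)
```

Sorting on a `(section, offset)` tuple keeps the ordering rule in one place, so a different location scheme (for example, character offsets) would only change the key.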

5. A computer implemented method, comprising:
- receiving a group of audio recordings from a plurality of human readers for storage on a server, individual ones of the group of audio recordings including metadata that provides identification information and identifies a respective location within a corresponding section of a text-based work;
- identifying a set of audio recordings from the group of audio recordings as corresponding to the text-based work based at least on the metadata; and
- combining the set of audio recordings to produce an audio reading including at least one audio file for at least a portion of the text-based work by digitally splicing the set of audio recordings in an order based at least in part on the respective location identified by the metadata of the set of audio recordings.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15):
  - 6. The computer implemented method of claim 5, further comprising integrating the audio reading with the text-based work to produce an integrated product.
  - 7. The computer implemented method of claim 6, further comprising distributing the integrated product to one or more users.
  - 8. The computer implemented method of claim 5, further comprising organizing a collaborative reading of the text-based work by the plurality of human readers, wherein each human reader is assigned to read one or more content sections of the text-based work.
  - 9. The computer implemented method of claim 5, further comprising analyzing spoken words included in an audio recording to determine whether spoken words in the audio reading match at least a threshold amount of text in a corresponding content section of the text-based work based at least in part on a speech-to-text analysis of at least a portion of the audio recording, wherein the receiving includes storing the audio recording when the spoken words match at least the threshold amount of the text in the corresponding content section.
  - 10. The computer implemented method of claim 5, wherein the identifying further includes identifying an audio recording as corresponding to a content section of the text-based work based on metadata about the audio recording, the metadata being assigned by a server to the audio recording of the text-based work, provided by a human reader with a submission of the audio recording, or obtained from a speech-to-text association of the audio recording to a corresponding content section of the text-based work.
  - 11. The computer implemented method of claim 5, wherein multiple audio recordings in the group correspond to a content section of the text-based work, and further comprising selecting one of the multiple audio recordings for inclusion in the set of audio recordings based at least in part on user ratings of the multiple audio recordings.
  - 12. The computer implemented method of claim 11, wherein the selecting further includes selecting one of the multiple audio recordings based at least in part on user ratings of the multiple audio recordings and continuity of each audio recording in relation to other audio recordings in the set of audio recordings.
  - 13. The computer implemented method of claim 5, wherein multiple audio recordings include overlapping sections that correspond to a content section of the text-based work, further comprising discarding one of the overlapping sections prior to including the multiple audio recordings in the set of audio recordings.
  - 14. The computer implemented method of claim 13, wherein the discarding further includes discarding one of the overlapping sections based at least in part on at least one of user ratings of the multiple audio recordings or continuity of each audio recording in relation to other audio recordings in the set.
  - 15. The computer implemented method of claim 14, wherein the user ratings are numerical ratings according to a standardized rating scale.
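
Claims 11, 12, and 15 describe choosing one of several competing recordings for a section using numerical user ratings and continuity. A rough Python sketch under stated assumptions follows: the dictionary keys (`section`, `reader`, `ratings`), the use of the mean rating as the score, and the interpretation of continuity as "same reader as the previously chosen section" are all illustrative choices that the claims do not specify.

```python
from statistics import mean


def select_by_rating(candidates):
    """Pick one recording per section.

    candidates: list of dicts with hypothetical keys
      'section' (int), 'reader' (str), 'ratings' (list of numbers).
    Score = mean rating; ties are broken in favor of the reader chosen
    for the previous section (one assumed reading of "continuity").
    """
    by_section = {}
    for candidate in candidates:
        by_section.setdefault(candidate["section"], []).append(candidate)

    chosen = []
    previous_reader = None
    for section in sorted(by_section):
        def score(candidate):
            average = mean(candidate["ratings"]) if candidate["ratings"] else 0.0
            return (average, candidate["reader"] == previous_reader)

        best = max(by_section[section], key=score)
        chosen.append(best)
        previous_reader = best["reader"]
    return chosen


candidates = [
    {"section": 0, "reader": "alice", "ratings": [5, 4]},
    {"section": 0, "reader": "bob", "ratings": [3]},
    {"section": 1, "reader": "bob", "ratings": [4, 4]},
    {"section": 1, "reader": "alice", "ratings": [4]},
    {"section": 2, "reader": "alice", "ratings": [2]},
    {"section": 2, "reader": "carol", "ratings": [5]},
]
selected_readers = [c["reader"] for c in select_by_rating(candidates)]
```

In this example section 1 is a tie on rating, so the continuity tie-break keeps the reader already selected for section 0.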

16. A server, comprising:
- a processor; and
- memory storing components executable by the processor, the components comprising:
  - a content presentation component that presents a text-based work that includes a content section without a corresponding audio reading to solicit an audio reading of the content section;
  - an audio collection component to receive the audio reading of the content section from a human reader, the audio reading of the content section including metadata that identifies the audio reading as corresponding to the content section and that identifies a location within the content section; and
  - an integration component to digitally splice the audio reading with an additional audio reading of another content section of the text-based work in response to determining, based at least on the metadata and additional metadata that is associated with the additional audio reading, that the audio reading and the additional reading are related, wherein an order in which the audio reading is digitally spliced with the additional audio reading is based at least in part on the location identified by the metadata.
- Dependent claims (17, 18, 19, 20, 21, 22):
  - 17. The server of claim 16, wherein the content presentation component selects the text-based work for presentation to a human reader based on a purchase history of the human reader.
  - 18. The server of claim 16, wherein the content presentation component selects the text-based work for presentation to a human reader based on a percentage of the text-based work having corresponding audio readings exceeding a predetermined threshold.
  - 19. The server of claim 16, where the content presentation component selects the text-based work for presentation to a human reader based on one or more of a demand for the audio reading of the text-based work and a user profile of the human reader.
  - 20. The server of claim 19, wherein the user profile includes information on one or more of a gender of the human reader, a genre of work preferred by the human reader, or voice characteristics of the human reader.
  - 21. The server of claim 19, wherein the user profile includes information on one or more of voice characteristics, including at least one of a tone, pitch, resonance, and vocal range of the human reader.
  - 22. The server of claim 16, wherein the content presentation component presents the text-based work with an indicator that indicates a percentage of the text-based work having corresponding audio readings.
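
Claims 18 and 22 refer to the percentage of a text-based work that already has corresponding audio readings, used both as a selection criterion and as a displayed indicator. The sketch below shows one way such a percentage could be computed, assuming coverage is counted per content section. The function names and the catalog layout are hypothetical.

```python
def audio_coverage_percent(total_sections, narrated_sections):
    """Percentage of sections that have at least one audio reading.

    narrated_sections may contain duplicates (several readings of the
    same section); each section is counted once.
    """
    if total_sections <= 0:
        raise ValueError("total_sections must be positive")
    return 100.0 * len(set(narrated_sections)) / total_sections


def works_to_present(catalog, threshold_percent):
    """Return titles whose coverage exceeds the threshold, sorted by title.

    catalog: dict mapping title -> (total_sections, narrated_sections),
    an assumed layout chosen for this example.
    """
    return sorted(
        title
        for title, (total, narrated) in catalog.items()
        if audio_coverage_percent(total, narrated) > threshold_percent
    )


catalog = {
    "alpha": (4, [0, 1, 2]),
    "beta": (10, [0]),
    "gamma": (2, [0, 1]),
}
indicator = audio_coverage_percent(8, [0, 1, 1, 3])
presented = works_to_present(catalog, threshold_percent=50)
```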

23. One or more computer readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
- receiving an audio reading for a content section of a text-based work from a human reader, the audio reading including metadata identifying a location within the content section;
- determining whether spoken words in the audio reading match at least a threshold amount of text in the content section based at least in part on a speech-to-text analysis of at least a portion of the audio reading;
- storing the audio reading in a data store when the spoken words at least match the threshold amount of the text in the content section;
- prompting the human reader to submit a repeat audio reading of at least a portion of the content section when the spoken words fail to match at least the threshold amount of the text in the content section; and
- combining the audio reading with at least one additional audio reading, wherein an order in which the audio reading is combined with the at least one additional audio reading is based at least in part on the location identified by the metadata.
- Dependent claims (24, 25, 26, 27, 28, 29, 30):
  - 24. The one or more computer readable media of claim 23, further comprising prompting the human reader to submit an additional audio reading for a subsequent content section of the text-based work when the spoken words match at least the threshold amount of the text in the content section.
  - 25. The one or more computer readable media of claim 23, further comprising discarding the audio reading when the spoken words fail to match at least the threshold amount of the text in the content section.
  - 26. The one or more computer readable media of claim 23, further comprising:
    - combining a plurality of audio recordings in the data store to produce an audio file; and
    - integrating the audio file with the text-based work to produce an integrated product.
  - 27. The one or more computer readable media of claim 23, further comprising distributing the audio reading to an electronic device that presents the audio reading with a computer-generated reading of another content portion of the text-based work.
  - 28. The one or more computer readable media of claim 23, wherein the determining further includes determining whether a background noise level exceeds a maximum noise level, and wherein the storing includes storing the audio reading when the spoken words match at least the threshold amount of the text in the content section and the background noise level does not exceed the maximum noise level.
  - 29. The one or more computer readable media of claim 23, wherein the threshold amount of text is a minimal quantity of the spoken words or a predetermined minimal word match threshold between the spoken words in the audio reading and written words of the text in the content section.
  - 30. The one or more computer readable media of claim 23, further comprising:
    - determining whether the spoken words in the audio reading includes at least one added inappropriate word that is not present in the text of the content section; and
    - discarding the audio reading when the spoken words includes the at least one added inappropriate word, wherein the storing includes storing the audio reading in the data store when the spoken words at least match the threshold amount of text in the content section and no added inappropriate word is included in the audio reading.
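
The acceptance logic of claims 23, 25, 28, and 30 can be illustrated with a small Python sketch. It assumes a transcript has already been produced by some speech-to-text engine (none is called here) and measures the match as the fraction of the section's words found, in order, in the transcript using the standard library's `difflib.SequenceMatcher`. The function names, the 0.9 default threshold, the decibel noise values, and the three outcome labels are assumptions made for this example; the claims do not prescribe a matching algorithm.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_fraction(section_text, transcript):
    """Fraction of the section's words that appear, in order, in the transcript."""
    expected, spoken = words(section_text), words(transcript)
    if not expected:
        return 0.0
    matcher = SequenceMatcher(None, expected, spoken, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


def review_reading(section_text, transcript, *, min_match=0.9,
                   noise_db=0.0, max_noise_db=40.0, blocked_words=frozenset()):
    """Return 'store', 'repeat' (prompt for a new reading), or 'discard'."""
    added = set(words(transcript)) - set(words(section_text))
    if added & set(blocked_words):
        return "discard"   # an added inappropriate word was spoken
    if noise_db > max_noise_db:
        return "repeat"    # too much background noise
    if match_fraction(section_text, transcript) < min_match:
        return "repeat"    # too little of the text was read
    return "store"


section = "It was the best of times, it was the worst of times."
full = "it was the best of times it was the worst of times"
partial = "it was the best of times"
```

With these inputs, `review_reading(section, full)` accepts the reading, while the half-length `partial` transcript falls below the default threshold and triggers a repeat prompt.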

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Crosley, Jay A.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Le, Thuykhanh
Application Number: US13/247,863
Time in Patent Office: 1,287 Days
Field of Search: None
US Class Current: 704/207
CPC Class Codes:
G06Q 10/101 Collaborative creation, e.g...
G10L 15/26 Speech to text systems G10L...
