Providing text to speech from digital content on an electronic device

US 8,990,087 B1
Filed: 09/30/2008
Issued: 03/24/2015
Est. Priority Date: 09/30/2008
Status: Active Grant

Abstract

A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for the word using the digital content. Audio or speech is played for the word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.

24 Claims

1. A method for providing audio relating to digital content in an electronic device, comprising:
- receiving digital content comprising a plurality of words and a supplemental pronunciation database of specified pronunciations for a portion of the plurality of words;
  
  determining supplemental pronunciation instructions for a word of the plurality of words based at least in part on the supplemental pronunciation database;
  
  determining default pronunciation instructions for another word of the plurality of words based at least in part on default pronunciation instructions in a default pronunciation database accessible by the electronic device;
  
  determining that specified voice information used for synthesizing speech in a specified voice is specified for one or more of the plurality of words, wherein default voice information is used for synthesizing speech in a default voice in the absence of specified voice information; and
  
  synthesizing speech for the plurality of words using the supplemental pronunciation instructions, the default pronunciation instructions, and at least one of the specified voice or the default voice.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the specified voice information used to generate the specified voice is appended to the digital content and is included in the data structure with the digital content and the supplemental pronunciation database.
  - 3. The method of claim 2, wherein the specified voice information comprises parameters within hypertext markup language tags (HTML) in the digital content.
  - 4. The method of claim 1, further comprising determining that the specified voice is not specified for one or more of the plurality of words and synthesizing speech based at least in part on the default voice information.
  - 5. The method of claim 1, wherein the supplemental pronunciation database is used with the digital content received together with the supplemental pronunciation database and not with other digital content.
  - 6. The method of claim 1, wherein the default pronunciation database is stored in local memory of the electronic device.
  - 7. The method of claim 1, wherein the default voice information is stored in local memory of the electronic device.
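The lookup order recited in claim 1 can be illustrated with a minimal sketch. It is not taken from the patent: the phoneme strings, the `Word` type, the `plan_speech` function, and the voice names are all hypothetical, and the sketch stops short of real speech synthesis by returning a plan that a synthesizer could consume.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical device-wide defaults; the entries are illustrative only.
DEFAULT_PRONUNCIATIONS = {
    "hermione": "HH ER M IY OW N",
    "reads": "R IY D Z",
}
DEFAULT_VOICE = "default-voice"


@dataclass
class Word:
    text: str
    voice: Optional[str] = None  # specified voice, if the content names one


def plan_speech(words, supplemental):
    """Return (text, pronunciation, voice) tuples for a synthesizer to consume."""
    plan = []
    for word in words:
        key = word.text.lower()
        if key in supplemental:
            # A pronunciation shipped with the content takes precedence.
            pronunciation = supplemental[key]
        else:
            # Otherwise use the device's default database; as a last resort
            # in this sketch, pass the raw spelling through.
            pronunciation = DEFAULT_PRONUNCIATIONS.get(key, key)
        # The default voice applies when no voice is specified for the word.
        voice = word.voice if word.voice is not None else DEFAULT_VOICE
        plan.append((word.text, pronunciation, voice))
    return plan
```

Calling `plan_speech([Word("Hermione", voice="narrator-f"), Word("reads")], {"hermione": "HH ER M AY AH N IY"})` would pair the first word with the supplemental pronunciation and the specified voice, and the second word with the default pronunciation and the default voice.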

8. An electronic device that is configured to provide audio relating to digital content, the electronic device comprising:
- a default pronunciation database; and
  
  instructions stored in memory, the instructions being executable to:
  
  receive digital content comprising a plurality of words and a supplemental pronunciation database that provides pronunciations for one or more of the plurality of words, wherein the supplemental pronunciation database is used with the digital content received in a same data structure as the supplemental pronunciation database and not with other digital content;
  
  for a first word for which the supplemental pronunciation database includes pronunciation instructions, synthesize a first speech for the first word based at least in part on the pronunciation instructions in the supplemental pronunciation database;
  
  for a second word for which the supplemental pronunciation database lacks pronunciation instructions, synthesize a second speech for the second word based at least in part on pronunciation instructions in the default pronunciation database;
  
  for a third word for which a specified voice is specified, synthesize a third speech for the third word based at least in part on the specified voice; and
  
  for a fourth word for which a specified voice is not specified, synthesize a fourth speech for the fourth word based at least in part on a default voice.
- Dependent claims (9, 10):
  - 9. The electronic device of claim 8, wherein the electronic device comprises an electronic book (eBook) reader device including wireless communication functionality.
  - 10. The electronic device of claim 8, wherein the digital content and the supplemental pronunciation database are included within a single data structure.
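One way a voice could be specified per word is through parameters in HTML tags, as claim 3 mentions. The sketch below is only an assumption about how that might look: the `data-voice` attribute, the `VoiceTagParser` class, and the voice names are hypothetical, and the parser assumes well-formed markup in which every tag is closed.

```python
from html.parser import HTMLParser

DEFAULT_VOICE = "default-voice"  # hypothetical name for the device's default


class VoiceTagParser(HTMLParser):
    """Collects (word, voice) pairs; `data-voice` is an assumed attribute."""

    def __init__(self):
        super().__init__()
        self.voice_stack = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Push the voice named on this tag, or None if the tag names none.
        self.voice_stack.append(dict(attrs).get("data-voice"))

    def handle_endtag(self, tag):
        if self.voice_stack:
            self.voice_stack.pop()

    def handle_data(self, data):
        # The innermost enclosing tag that names a voice wins;
        # text outside any such tag gets the default voice.
        voice = next((v for v in reversed(self.voice_stack) if v), DEFAULT_VOICE)
        self.words.extend((word, voice) for word in data.split())


def words_with_voices(markup):
    parser = VoiceTagParser()
    parser.feed(markup)
    parser.close()
    return parser.words
```

For example, `words_with_voices('<p>She said <span data-voice="alice">hello</span></p>')` would tag "hello" with the voice "alice" and the other words with the default voice.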

11. A server configured to enhance digital content, comprising:
- a database of digital content, wherein the digital content comprises a digital content item having a plurality of words;
  
  a default pronunciation database comprising default pronunciation instructions for synthesizing speech;
  
  specified voice information for synthesizing speech based at least in part on a specified voice;
  
  a supplemental pronunciation database comprising pronunciation instructions for synthesizing speech for one or more of the plurality of words, wherein the pronunciation instructions are different from the default pronunciation instructions; and
  
  a digital content enhancement module configured to generate enhanced digital content by appending the supplemental pronunciation database and the specified voice information to the digital content in a same data structure, such that sending of the enhanced digital content to a computing device causes the computing device to:
  
  synthesize a first speech based at least in part on the supplemental pronunciation database for a first one of the one or more of the plurality of words which have pronunciations in the supplemental pronunciation database;
  
  synthesize a second speech based at least in part on a default pronunciation database for a second one of the one or more of the plurality of words which do not have pronunciations in the supplemental pronunciation database;
  
  synthesize a third speech based at least in part on the specified voice for a third one of the one or more of the plurality of words which are specified to be synthesized with the specified voice; and
  
  synthesize a fourth speech based at least in part on a default voice for a fourth one of the one or more of the plurality of words for which a voice is not specified.
- Dependent claims (12):
  - 12. The server of claim 11, wherein the enhanced digital content comprises a single digital content data structure.
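The enhancement step of claim 11 can be pictured as bundling three things into one structure before delivery. The following is a rough sketch under stated assumptions: the field names, the `enhance_content` function, and the choice of JSON as a serialization are all hypothetical and are not described in the claims.

```python
import json


def enhance_content(item_id, text, supplemental_db, voice_info, default_db):
    """Bundle text, supplemental pronunciations, and voice info in one structure."""
    # Keep only entries that differ from the default pronunciations, since an
    # entry identical to the default would add nothing for the receiving device.
    overrides = {
        word: pron
        for word, pron in supplemental_db.items()
        if default_db.get(word) != pron
    }
    return {
        "id": item_id,
        "text": text,
        "supplemental_pronunciations": overrides,
        "voice_info": dict(voice_info),
    }


def serialize(enhanced):
    # JSON is an arbitrary wire format chosen for this sketch.
    return json.dumps(enhanced, sort_keys=True)
```

A receiving device would then only need to unpack one payload to obtain the text, the content-specific pronunciations, and the voice information together.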

13. A non-transitory computer-readable medium comprising executable instructions for:
- receiving an electronic book comprising a plurality of words, a supplemental pronunciation database, and a specified voice;
  
  for a first word in the plurality of words that has pronunciation instructions included in the supplemental pronunciation database, synthesizing a first speech for the first word based at least in part on the pronunciation instructions from the supplemental pronunciation database;
  
  for a second word in the plurality of words that does not have pronunciation instructions included in the supplemental pronunciation database, synthesizing a second speech for the second word based at least in part on a default pronunciation database;
  
  for a third word in the plurality of words that is specified to be synthesized with the specified voice, synthesizing a third speech for the third word based at least in part on the specified voice; and
  
  for a fourth word in the plurality of words that is not specified to be synthesized with the specified voice, synthesizing a fourth speech for the fourth word based at least in part on a default voice.
- Dependent claims (14, 15, 16):
  - 14. The non-transitory computer-readable medium of claim 13, wherein the supplemental pronunciation database, the specified voice, and the eBook are included in a single digital content data structure.
  - 15. The non-transitory computer-readable medium of claim 13, wherein the executable instructions further comprise instructions for:
    - limiting use of the supplemental pronunciation database to the eBook to which the supplemental pronunciation database is appended.
  - 16. The non-transitory computer-readable medium of claim 13, wherein the supplemental pronunciation database and the specified voice are appended to the eBook.

17. A method for obtaining and rendering audio based on text in an electronic book (eBook), the method comprising:
- sending, from an eBook reader device, a request to download the eBook;
  
  receiving, at the eBook reader device, the eBook, a supplemental pronunciation database, and specified voice information for synthesizing speech in a specified voice;
  
  synthesizing a first speech for a first portion of text in the eBook based at least in part on a pronunciation from the supplemental pronunciation database for portions of text which have pronunciations in the supplemental pronunciation database;
  
  synthesizing a second speech for a second portion of text in the eBook based at least in part on a pronunciation from a default pronunciation database for portions of text which do not have pronunciations in the supplemental pronunciation database;
  
  synthesizing a third speech for a third portion of text in the eBook based at least in part on the specified voice for portions of text which are specified to be synthesized with the specified voice; and
  
  synthesizing a fourth speech for a fourth portion of text based at least in part on a default voice for portions of text which do not have any specified voice.
- Dependent claims (18, 19, 20, 21, 22, 23, 24):
  - 18. The method of claim 17, wherein the supplemental pronunciation database is restricted to be used with the eBook and not with at least one other eBook.
  - 19. The method of claim 17, wherein the supplemental pronunciation database is exclusive to at least one of the eBook, a category of eBooks to which the eBook belongs to, or a publisher associated with the eBook.
  - 20. The method of claim 17, wherein the supplemental pronunciation database is appended to the eBook in a same data structure.
  - 21. The method of claim 17, wherein the default pronunciation database is stored on the eBook reader device.
  - 22. The method of claim 20, wherein the supplemental pronunciation database is used by the eBook received in the same data structure as the supplemental pronunciation database and not with other eBooks.
  - 23. The method of claim 17, wherein the supplemental pronunciation database is generated based at least in part on content of the eBook.
  - 24. The method of claim 17, further comprising storing the eBook, the supplemental pronunciation database, and the specified voice information on the eBook reader device.
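Claims 18, 19, and 22 restrict which eBooks a supplemental pronunciation database may be used with. A minimal sketch of such a check follows; the `Scope` and `EBook` types and the `may_use` function are hypothetical, and treating an unset field as "no restriction" is an assumption of this sketch rather than something the claims state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Restrictions on a supplemental database; None means the field is unrestricted."""
    ebook_id: Optional[str] = None
    category: Optional[str] = None
    publisher: Optional[str] = None


@dataclass(frozen=True)
class EBook:
    ebook_id: str
    category: str
    publisher: str


def may_use(scope, ebook):
    """True if every restriction set on the scope matches the eBook."""
    checks = [
        (scope.ebook_id, ebook.ebook_id),
        (scope.category, ebook.category),
        (scope.publisher, ebook.publisher),
    ]
    return all(required is None or required == actual for required, actual in checks)
```

A reader device could call `may_use` before consulting a supplemental database, falling back to the default pronunciation database alone when the check fails.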

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Lattyak, John; Kim, John T.; Chu, Robert Wai-Chi; Nguyen, Laurent An Minh
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US12/242,394
Time in Patent Office: 2,366 Days
Field of Search: 704/270, 704/272, 704/258-260, 704/201, 704/266, 704/3, 706/11, 715/201, 715/203
US Class Current: 704/251
CPC Class Codes: G10L 13/08 Text analysis or generation...