Intelligent text-to-speech conversion

US 9,305,543 B2
Filed: 02/25/2015
Issued: 04/05/2016
Est. Priority Date: 04/05/2008
Status: Active Grant

Abstract

Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.

244 Citations

25 Claims

1. A method for converting text to speech, the method comprising:
- at an electronic device with a processor and memory storing one or more programs for execution by the processor:
  
  parsing a document to identify a plurality of text elements in the document to be converted to speech, wherein in the document, a first text element of the plurality of text elements is positioned before a second text element of the plurality of text elements;
  
  determining, by the processor, an order in which the plurality of text elements are to be spoken, wherein the determined order comprises speaking the second text element before the first text element; and
  
  converting the plurality of text elements to speech, wherein the speech is spoken in the determined order.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the second text element includes at least a portion of a footnote of the document.
  - 3. The method of claim 1, wherein the second text element includes a page number of the document.
  - 4. The method of claim 1, wherein the first text element includes at least a portion of a table of contents of the document.
  - 5. The method of claim 1, further comprising:
    - generating a text-to-speech processing script after determining the order, wherein the text-to-speech processing script includes the first text element and the second text element, and wherein the text-to-speech processing script further includes an annotation disposed before the first text element to indicate that the second text element is to be spoken before the first text element.
  - 6. The method of claim 5, wherein the text-to-speech processing script further includes a second annotation at the second text element to indicate that the second text element is not to be re-spoken.
  - 7. The method of claim 1, wherein the document includes a plurality of embedded text-to-speech markup tags that were inserted during creation of the document, and wherein the plurality of text elements are identified based on the plurality of text-to-speech markup tags.
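
The reordering in claim 1 and the annotated script in claims 5 and 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TextElement` type, the `SPEAK_FIRST` and `ALREADY_SPOKEN` annotation names, and the `speak_early` mapping are all hypothetical, chosen only to show one way a script could keep document order while still causing a later element (here a footnote) to be spoken before an earlier one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextElement:
    elem_id: str  # hypothetical identifier assigned while parsing
    kind: str     # hypothetical label, e.g. "body" or "footnote"
    text: str


def build_script(elements, speak_early):
    """Build an annotated script that stays in document order.

    speak_early maps the id of a "first" element to the id of a later
    "second" element that should be spoken before it (assumed input).
    """
    moved = set(speak_early.values())
    script = []
    for el in elements:
        if el.elem_id in speak_early:
            # Annotation placed before the first element (compare claim 5).
            script.append(("SPEAK_FIRST", speak_early[el.elem_id]))
        if el.elem_id in moved:
            # Annotation at the second element (compare claim 6).
            script.append(("ALREADY_SPOKEN", el.elem_id))
        script.append(("TEXT", el))
    return script


def speaking_order(script):
    """Interpret the script and return elements in the order to speak them."""
    by_id = {p.elem_id: p for tag, p in script if tag == "TEXT"}
    order = []
    skip_next = False
    for tag, payload in script:
        if tag == "SPEAK_FIRST":
            order.append(by_id[payload])
        elif tag == "ALREADY_SPOKEN":
            skip_next = True
        else:  # "TEXT"
            if not skip_next:
                order.append(payload)
            skip_next = False
    return order


doc = [
    TextElement("p1", "body", "The result held in all trials [1]."),
    TextElement("p2", "body", "We now turn to the second experiment."),
    TextElement("fn1", "footnote", "[1] See the appendix for raw data."),
]
script = build_script(doc, speak_early={"p2": "fn1"})
spoken_ids = [el.elem_id for el in speaking_order(script)]
```

In this sketch the footnote `fn1` sits after `p2` in the document but is spoken before it, and the second annotation prevents it from being spoken again at its original position. A real system would pass each element's text to a speech synthesizer instead of collecting ids.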

8. A method for converting text to speech, the method comprising:
- at an electronic device with a processor and memory storing one or more programs for execution by the processor:
  
  parsing a document to identify a subset of text to be converted to speech, the subset of text having a context;
  
  creating an announcement comprising a spoken description of the context;
  
  determining, by the processor, an order in which the announcement and a spoken form of the subset of text are to be spoken, wherein the determined order comprises speaking the announcement prior to the spoken form of the subset of text; and
  
  generating audio that includes the spoken form of the subset of text and the announcement, wherein the announcement is spoken prior to the spoken form of the subset of text.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16):
  - 9. The method of claim 8, wherein the context is a footnote.
  - 10. The method of claim 8, wherein the context is a title.
  - 11. The method of claim 8, further comprising:
    - identifying a second subset of text while parsing the document, the second subset of text having a second context that is different from the context; and
      
      creating a second announcement comprising a spoken description of the second context;
      
      wherein the generated audio includes a spoken form of the second subset of text and the second announcement, wherein the second announcement is spoken prior to the spoken form of the second subset of text.
  - 12. The method of claim 8, wherein the document does not include text corresponding to the announcement.
  - 13. The method of claim 8, further comprising:
    - identifying a non-text element of the document while parsing the document; and
      
      creating an audio cue that represents the non-text element in the document, wherein the generated audio includes the audio cue.
  - 14. The method of claim 13, wherein the non-text element is an image.
  - 15. The method of claim 13, wherein the non-text element is a hyperlink.
  - 16. The method of claim 8, further comprising:
    - generating a text-to-speech processing script that includes the subset of text and the announcement, wherein the text-to-speech processing script is processed to generate the audio.
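
As a rough illustration of claims 8 through 16, the following Python sketch builds an ordered audio plan in which each announced context is spoken before its text and non-text elements are represented by audio cues. Everything here is an assumption made for the example: the `Element` type, the kind labels, the announcement wording, and the cue file names do not come from the patent, and the plan is a plain list rather than real audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str       # hypothetical labels: "title", "body", "footnote", "image", "hyperlink"
    text: str = ""  # empty for elements with no readable text


# Hypothetical announcement wording; this text is not present in the document itself.
ANNOUNCEMENTS = {"title": "Title", "footnote": "Footnote"}

# Hypothetical cue sounds for non-text elements.
AUDIO_CUES = {"image": "cue_image.wav", "hyperlink": "cue_link.wav"}


def build_audio_plan(elements):
    """Return an ordered list of (action, payload) steps for a synthesizer."""
    plan = []
    for el in elements:
        if el.kind in AUDIO_CUES:
            # A non-text element is represented by an audio cue.
            plan.append(("cue", AUDIO_CUES[el.kind]))
        elif el.kind in ANNOUNCEMENTS:
            # The announcement is placed before the text it describes.
            plan.append(("announce", ANNOUNCEMENTS[el.kind]))
        if el.text:
            plan.append(("speak", el.text))
    return plan


doc = [
    Element("title", "Quarterly Report"),
    Element("body", "Revenue grew this quarter."),
    Element("image"),
    Element("footnote", "Figures are unaudited."),
]
plan = build_audio_plan(doc)
```

The resulting `plan` plays the role of the text-to-speech processing script in claim 16: a later stage would walk it in order, synthesizing the `announce` and `speak` steps and mixing in the cue sounds.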

17. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to:
- parsing a document to identify a plurality of text elements in the document to be converted to speech, wherein in the document, a first text element of the plurality of text elements is positioned before a second text element of the plurality of text elements;
  
  determining, by the one or more processors, an order in which the plurality of text elements are to be spoken, wherein the determined order comprises speaking the second text element before the first text element; and
  
  converting the plurality of text elements to speech, wherein the speech is spoken in the determined order.

18. An electronic device comprising:
- one or more processors;
  
  memory;
  
  one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
  
  parsing a document to identify a plurality of text elements in the document to be converted to speech, wherein in the document, a first text element of the plurality of text elements is positioned before a second text element of the plurality of text elements;
  
  determining, by the one or more processors, an order in which the plurality of text elements are to be spoken, wherein the determined order comprises speaking the second text element before the first text element; and
  
  converting the plurality of text elements to speech, wherein the speech is spoken in the determined order.
- Dependent claims (19, 20, 21):
  - 19. The device of claim 18, wherein the second text element includes at least a portion of a footnote of the document.
  - 20. The device of claim 18, wherein the second text element includes a page number of the document.
  - 21. The device of claim 18, wherein the one or more programs further include instructions for:
    - generating a text-to-speech processing script after determining the order, wherein the text-to-speech processing script includes the first text element and the second text element, and wherein the text-to-speech processing script further includes an annotation disposed before the first text element to indicate that the second text element is to be spoken before the first text element.

22. An electronic device comprising:
- one or more processors;
  
  memory;
  
  one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
  
  parsing a document to identify a subset of text to be converted to speech, the subset of text having a context;
  
  creating an announcement comprising a spoken description of the context;
  
  determining, by the one or more processors, an order in which the announcement and a spoken form of the subset of text are to be spoken, wherein the determined order comprises speaking the announcement prior to the spoken form of the subset of text; and
  
  generating audio that includes the spoken form of the subset of text and the announcement, wherein the announcement is spoken prior to the spoken form of the subset of text.
- Dependent claims (23, 24, 25):
  - 23. The device of claim 22, wherein the context is a footnote.
  - 24. The device of claim 22, wherein the document does not include text corresponding to the announcement.
  - 25. The device of claim 22, wherein the one or more programs further include instructions for:
    - identifying a non-text element of the document while parsing the document; and
      
      creating an audio cue that represents the non-text element in the document, wherein the generated audio includes the audio cue.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Fleizach, Christopher Brian; Hudson, Reginald Dean
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US14/631,526
Publication Number: US 20150170635A1
Time in Patent Office: 405 Days
Field of Search: 381/56, 381/320, 715/239, 715/854, 715/763, 715/716, 715/200, 704/260, 704/9, 704/270.1, 704/243, 704/235, 704/208, 707/999.003, 709/206, 709/219, 700/94, 379/257, 365/232, 365/231, 713/219, 705/17, 434/317, 370/311, 345/173
US Class Current: 1/1
CPC Class Codes:
G06F 40/205   Parsing
G10L 13/00   Speech synthesis; Text to s...
G10L 13/027   Concept to speech synthesis...
G10L 13/08   Text analysis or generation...
G10L 19/018   Audio watermarking, i.e. em...
