INTELLIGENT TEXT-TO-SPEECH CONVERSION

US 20170178620A1
Filed: 03/06/2017
Published: 06/22/2017
Est. Priority Date: 04/05/2008
Status: Active Grant

Abstract

Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
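
The audio-summary aspect described above can be illustrated with a minimal sketch. It assumes the summary audio is produced once and stored, so that later playback needs no further text-to-speech pass over the document. Everything named here is hypothetical and not taken from the patent: `summarize` (a naive first-sentence heuristic), `fake_synthesize` (a stand-in that returns bytes rather than real audio), and the `AudioSummaryStore` class.

```python
from typing import Callable, Dict


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real TTS engine (assumption): returns UTF-8 bytes, not audio."""
    return text.encode("utf-8")


def summarize(text: str, max_sentences: int = 1) -> str:
    """Hypothetical summarizer: keep only the first few sentences of the text."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."


class AudioSummaryStore:
    """Produces a summary's audio once per file and replays it from storage."""

    def __init__(self, synthesize: Callable[[str], bytes] = fake_synthesize):
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}
        self.synth_calls = 0  # counts how often the TTS stand-in is invoked

    def prepare(self, name: str, text: str) -> None:
        """Summarize the document and synthesize the summary ahead of time."""
        self.synth_calls += 1
        self._cache[name] = self._synthesize(summarize(text))

    def play(self, name: str) -> bytes:
        """Return the stored summary audio without touching the document."""
        return self._cache[name]


store = AudioSummaryStore()
store.prepare("report.txt", "Sales rose. Costs fell.")
first = store.play("report.txt")
second = store.play("report.txt")
```

In this sketch, repeated calls to `play` reuse the stored bytes, which mirrors the idea that a listener can hear a summary without the full document being converted to speech each time.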

53 Claims

1-33. (canceled)

34. A method for converting text to speech, the method comprising:
- at an electronic device with a processor and memory storing one or more programs for execution by the processor:
  
  parsing a document to identify a plurality of elements in the document;
  
  associating a first element of the plurality of elements with a first markup tag and a second element of the plurality of elements with a second markup tag;
  
  creating an announcement comprising a spoken description of context for the first element; and
  
  based on the first markup tag and the second markup tag, generating audio that includes the announcement and a spoken form of text of the second element, wherein the announcement is spoken prior to the spoken form of the text of the second element.
- Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43)
  - 35. The method of claim 34, wherein the context is a footnote.
  - 36. The method of claim 34, wherein the context is a title.
  - 37. The method of claim 34, wherein the document does not include text corresponding to the announcement.
  - 38. The method of claim 34, further comprising:
    - identifying a non-text element of the plurality of elements in the document while parsing the document; and
      
      creating an audio cue that represents the non-text element in the document, wherein the generated audio includes the audio cue.
  - 39. The method of claim 38, wherein the non-text element is an image.
  - 40. The method of claim 38, wherein the non-text element is a hyperlink.
  - 41. The method of claim 34, further comprising:
    - generating a text-to-speech processing script that includes the text of the second element and the announcement, wherein the text-to-speech processing script is processed to generate the audio.
  - 42. The method of claim 34, wherein parsing the document includes determining that the first element is a non-spoken element and the second element is a spoken element.
  - 43. The method of claim 34, wherein in the document, the first element is positioned after the second element.
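
The steps recited in claims 34, 38, and 41 (parse elements, associate them with markup tags, create announcements and audio cues, and emit a text-to-speech processing script) can be loosely sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the input is assumed to be a flat HTML-like string, the `footnote` tag is an invented example tag, and the `ANNOUNCEMENTS` and `AUDIO_CUES` tables, the `Element` class, and the script entry kinds (`"announce"`, `"speak"`, `"cue"`) are all hypothetical names.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional, Tuple


@dataclass
class Element:
    tag: str   # markup tag associated with the element
    text: str  # text content; empty for non-text elements such as images


# Hypothetical lookup tables: spoken context per tag, and cues for non-text elements.
ANNOUNCEMENTS = {"h1": "Title.", "footnote": "Footnote."}
AUDIO_CUES = {"img": "chime"}


class ElementParser(HTMLParser):
    """Collects flat (tag, text) elements; nested markup is not handled."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []
        self._open: Optional[Element] = None

    def handle_starttag(self, tag, attrs):
        element = Element(tag, "")
        self.elements.append(element)
        if tag != "img":  # img has no closing tag and carries no text
            self._open = element

    def handle_endtag(self, tag):
        self._open = None

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self._open.text += data.strip()


def build_script(document: str) -> List[Tuple[str, str]]:
    """Build an ordered script in which each announcement precedes the spoken text."""
    parser = ElementParser()
    parser.feed(document)
    parser.close()
    script: List[Tuple[str, str]] = []
    for element in parser.elements:
        if element.tag in AUDIO_CUES:
            script.append(("cue", AUDIO_CUES[element.tag]))
        if element.tag in ANNOUNCEMENTS:
            script.append(("announce", ANNOUNCEMENTS[element.tag]))
        if element.text:
            script.append(("speak", element.text))
    return script


doc = ('<h1>Annual Report</h1><p>Sales rose.</p>'
       '<img src="chart.png"><footnote>Unaudited.</footnote>')
script = build_script(doc)
```

Note that the announcement strings ("Title.", "Footnote.") do not appear in the document itself, which corresponds to the situation described in claim 37. A real system would hand each script entry to a speech synthesizer or audio mixer; that stage is omitted here.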

44. A non-transitory computer-readable storage medium comprising instructions, which when executed by an electronic device, cause the electronic device to:
- parse a document to identify a plurality of elements in the document;
  
  associate a first element of the plurality of elements with a first markup tag and a second element of the plurality of elements with a second markup tag;
  
  create an announcement comprising a spoken description of context for the first element; and
  
  based on the first markup tag and the second markup tag, generate audio that includes the announcement and a spoken form of text of the second element, wherein the announcement is spoken prior to the spoken form of the text of the second element.
- Dependent Claims (45, 46, 47, 48)
  - 45. The computer-readable storage medium of claim 44, wherein the document does not include text corresponding to the announcement.
  - 46. The computer-readable storage medium of claim 44, wherein the first element is a non-text element.
  - 47. The computer-readable storage medium of claim 44, wherein parsing the document includes determining that the first element is a non-spoken element and the second element is a spoken element.
  - 48. The computer-readable storage medium of claim 44, wherein in the document, the first element is positioned after the second element.

49. An electronic device, comprising:
- one or more processors; and
  
  memory storing one or more programs, the one or more programs including instructions, which when executed by the one or more processors, cause the one or more processors to:
  
  parse a document to identify a plurality of elements in the document;
  
  associate a first element of the plurality of elements with a first markup tag and a second element of the plurality of elements with a second markup tag;
  
  create an announcement comprising a spoken description of context for the first element; and
  
  based on the first markup tag and the second markup tag, generate audio that includes the announcement and a spoken form of text of the second element, wherein the announcement is spoken prior to the spoken form of the text of the second element.
- Dependent Claims (50, 51, 52, 53)
  - 50. The device of claim 49, wherein the document does not include text corresponding to the announcement.
  - 51. The device of claim 49, wherein the first element is a non-text element.
  - 52. The device of claim 49, wherein parsing the document includes determining that the first element is a non-spoken element and the second element is a spoken element.
  - 53. The device of claim 49, wherein in the document, the first element is positioned after the second element.

Current Assignee
Apple Inc.
Original Assignee
Apple Inc.
Inventors
FLEIZACH, Christopher Brian; HUDSON, Reginald Dean

Granted Patent

US 9,865,248 B2

CPC Class Codes

G06F 40/205   Parsing

G10L 13/00   Speech synthesis; Text to s...

G10L 13/027   Concept to speech synthesis...

G10L 13/08   Text analysis or generation...

G10L 19/018   Audio watermarking, i.e. em...
