MANAGING PLAYBACK OF SYNCHRONIZED CONTENT
Abstract
A computing device may provide a control interface that enables the user to manage the synchronized output of companion content (e.g., textual content and corresponding audio content generated by a text-to-speech component). For example, the computing device may display a visual cue to identify a current location in textual content corresponding to a current output position of companion audio content. As the audio content is presented, the visual cue may be advanced to maintain synchronization between the output position within the audio content and a corresponding position in the textual content. The user may control the synchronized output by dragging her finger across the textual content displayed on the touch screen. Accordingly, the control interface may provide a highlight or other visual indication of the distance between the advancing position in the textual content and the location of a pointer to the textual content indicated by the current position of the user's finger.
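The highlighted distance described in the abstract can be pictured as the span of words between the advancing playback position and the pointer. The following is a minimal sketch under stated assumptions, not the disclosed implementation: positions are modeled as word indices, and the function name `highlight_span` is hypothetical.

```python
def highlight_span(playback_index: int, pointer_index: int) -> tuple[int, int]:
    """Return (start, end) word indices to highlight, with end exclusive.

    Assumption: both the playback position and the pointer location have
    already been resolved to word indices in the displayed text.
    """
    start, end = sorted((playback_index, pointer_index))
    return start, end


words = "the quick brown fox jumps over the lazy dog".split()
start, end = highlight_span(playback_index=2, pointer_index=6)
highlighted = words[start:end]  # words lying between playback and pointer
```

Sorting the two indices lets the same function cover a pointer that is ahead of the playback position and one that is behind it.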
39 Claims
1. A system for synchronizing output of audio content with textual content, the system comprising:
a data store that stores textual content;
an input device that obtains a location of a pointer referencing the textual content;
a display device that presents the textual content;
an output device that outputs audio content; and
a processor in communication with the data store, the input device and the output device, the processor operative to:
generate audio content based at least in part on the textual content;
cause output of the generated audio content via the output device;
cause presentation of the textual content on the display device;
maintain synchronization of a position of the textual content presented on the display device with an associated position of the generated audio content output via the output device, wherein the associated position advances during output of the generated audio content;
obtain a current location of a pointer referencing the textual content presented on the display device from the input device;
determine a segment of textual content based at least in part on a difference between the current location of the pointer referencing the textual content and the position of the textual content;
determine a length of time required to output audio content corresponding to the determined segment of textual content; and
modify a speed at which the generated audio content is output via the output device based at least in part on the determined length of time.
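The last three processor steps of claim 1 (determine a segment, determine its output time, modify the speed) can be sketched as follows. This is an illustration only. The claim does not say how the length of time maps to a speed, so the constants `SECONDS_PER_WORD`, `CATCH_UP_WINDOW`, `MIN_SPEED` and `MAX_SPEED`, the `PlaybackState` class, and the scaling rule in `adjust_speed` are all assumptions introduced here.

```python
from dataclasses import dataclass

SECONDS_PER_WORD = 0.5   # assumed average speaking time per word
CATCH_UP_WINDOW = 5.0    # assumed time scale for the speed change, in seconds
MIN_SPEED, MAX_SPEED = 0.5, 3.0  # assumed limits on the playback speed


@dataclass
class PlaybackState:
    words: list[str]
    playback_index: int   # word whose audio is currently being output
    speed: float = 1.0


def adjust_speed(state: PlaybackState, pointer_index: int) -> float:
    # Segment: the text between the playback position and the pointer.
    lo, hi = sorted((state.playback_index, pointer_index))
    segment = state.words[lo:hi]
    # Length of time required to output audio for that segment (estimated).
    seconds = len(segment) * SECONDS_PER_WORD
    # Assumed rule: speed up when the pointer is ahead, slow down when behind.
    ratio = 1.0 + seconds / CATCH_UP_WINDOW
    factor = ratio if pointer_index >= state.playback_index else 1.0 / ratio
    state.speed = min(MAX_SPEED, max(MIN_SPEED, factor))
    return state.speed
```

With these assumed constants, a pointer ten words ahead of the playback position corresponds to five seconds of audio and doubles the speed, while a pointer at the playback position leaves the speed at 1.0.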
2. The system of claim 1, wherein modifying the speed at which the generated audio content is output includes at least one of increasing and decreasing the speed at which the generated audio content is output.
3. The system of claim 1, wherein obtaining the current location of the pointer referencing the textual content comprises obtaining a location of the pointer corresponding to at least one of a word, syllable, letter, sentence, line, paragraph, chapter, stanza, section, and column in the textual content.
4. The system of claim 1, wherein the input device comprises at least one of a touchscreen, a mouse, a stylus, a remote control, a video game controller, and a motion detector.
5. The system of claim 1, wherein modifying the speed at which the generated audio content is output includes modifying the speed if a difference between the current location of the pointer referencing the textual content and the position of the textual content satisfies a threshold value.
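The threshold condition of claim 5 can be sketched as a simple gate in front of any speed change. The sketch assumes that positions are word indices and that "satisfies a threshold value" means "is at least the threshold"; the claim does not fix either choice, and the name `should_modify_speed` and the default of three words are hypothetical.

```python
def should_modify_speed(playback_index: int, pointer_index: int,
                        threshold_words: int = 3) -> bool:
    """Return True when the pointer is far enough from the playback position.

    Assumption: the difference is measured in words and the threshold is
    satisfied when the difference is greater than or equal to it.
    """
    return abs(pointer_index - playback_index) >= threshold_words
```

A gate of this kind keeps small, possibly accidental pointer movements from changing the playback speed.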
6. The system of claim 1, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content includes generating audio content corresponding to the determined segment of textual content.
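One way to read claim 6 is that the segment is synthesized first and the duration is then measured from the resulting audio. The sketch below assumes this reading. The `synthesize` function is a hypothetical stand-in for a text-to-speech engine: it returns a quarter second of silent samples per word, so that the example runs without any audio library.

```python
SAMPLE_RATE = 16_000  # assumed samples per second of the generated audio


def synthesize(text: str) -> list[int]:
    """Hypothetical stand-in for a text-to-speech engine.

    Returns silent PCM samples, a quarter second per word, purely so the
    duration measurement below has something to measure.
    """
    return [0] * (len(text.split()) * SAMPLE_RATE // 4)


def measured_seconds(segment: str) -> float:
    # Generate the audio for the segment, then derive its duration
    # from the number of samples produced.
    samples = synthesize(segment)
    return len(samples) / SAMPLE_RATE
```

Measuring generated audio gives the actual duration for the chosen voice and rate, at the cost of running synthesis before the speed can be adjusted.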
7. The system of claim 1, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content includes determining a length of time required to output audio content corresponding to the determined segment of textual content without generating audio content corresponding to the determined segment of textual content.
8. The system of claim 7, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content without generating audio content corresponding to the determined segment of textual content includes estimating a length of time required to output audio content based at least in part on at least one of a number of words in the segment of textual content, a number of syllables in the segment of textual content, a number of phonemes in the segment of textual content, a number of letters in the segment of textual content, a number of spaces in the segment of textual content, and a length of words in the segment of textual content.
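The estimate described in claims 7 and 8 can be sketched without any synthesis step, using two of the listed features: the number of syllables and the number of spaces. The vowel-group syllable heuristic, the rate of four syllables per second, and the per-space pause are assumptions chosen for illustration; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def estimate_seconds(segment: str,
                     syllables_per_second: float = 4.0,
                     pause_per_space: float = 0.05) -> float:
    """Estimate the output time of a text segment without generating audio."""
    words = segment.split()
    syllables = sum(count_syllables(w) for w in words)
    spaces = max(0, len(words) - 1)
    return syllables / syllables_per_second + spaces * pause_per_space
```

An estimate of this kind is cheap enough to recompute on every pointer movement, although the heuristic miscounts some words and would need tuning for a given voice and language.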
9. A computer-implemented method comprising:
under control of one or more computing devices,
generating audio content based at least in part on textual content;
causing output of the generated audio content;
causing presentation of the textual content;
maintaining synchronization of a position of the textual content being presented with an associated position of the generated audio content being output, wherein the associated position of the generated audio content advances during output of the generated audio content;
obtaining a current location of a pointer referencing the textual content being presented from an input device;
determining a segment of textual content based at least in part on a difference between the current location of the pointer referencing the textual content being presented and the position of the textual content being presented;
determining a length of time required to output audio content corresponding to the determined segment of textual content; and
modifying an attribute associated with the output of the generated audio content based at least in part on the determined length of time.
10. The computer-implemented method of claim 9, wherein modifying an attribute associated with the output of the generated audio content comprises increasing or decreasing the speed at which the generated audio content is output.
11. The computer-implemented method of claim 9, wherein obtaining the current location of the pointer referencing the textual content comprises obtaining a location of the pointer corresponding to at least one of a natural feature and a predefined feature of the textual content.
12. The computer-implemented method of claim 9, wherein the input device comprises at least one of a touchscreen, a mouse, a stylus, a remote control, a video game controller, and a motion detector.
13. The computer-implemented method of claim 9, wherein modifying an attribute associated with the output of the generated audio content comprises modifying the attribute if the length of time required to output audio content corresponding to the determined segment of textual content satisfies a threshold value.
14. The computer-implemented method of claim 9, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content comprises generating audio content corresponding to the determined segment of textual content.
15. The computer-implemented method of claim 9, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content comprises determining a length of time required to output audio content without generating audio content corresponding to the determined segment of textual content.
16. The computer-implemented method of claim 15, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content without generating audio content corresponding to the determined segment of textual content comprises estimating a length of time required to output audio content based at least in part on at least one of a number of words in the segment of textual content, a number of syllables in the segment of textual content, a number of phonemes in the segment of textual content, a number of letters in the segment of textual content, a number of spaces in the segment of textual content, and a length of words in the segment of textual content.
17. The computer-implemented method of claim 9 further comprising:
generating synchronization information based at least in part on the synchronization of the position of the textual content with the advancing associated position of the generated audio content; and
storing the synchronization information.
18. The computer-implemented method of claim 17, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content includes determining a length of time based at least in part on the stored synchronization information.
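Claims 17 and 18 can be illustrated with stored synchronization information modeled as a list of word start times. This representation is an assumption: the claims do not specify the format of the synchronization information, and the sample times and the names `word_start_times`, `seconds_between` and `word_at` are hypothetical.

```python
from bisect import bisect_right

# Assumed synchronization information: the start time, in seconds, of the
# audio for each word, recorded while the audio was generated and output.
word_start_times = [0.0, 0.5, 1.0, 1.75, 2.25, 3.0]


def seconds_between(start_times: list[float], from_word: int, to_word: int) -> float:
    """Length of time needed to output the audio between two word indices."""
    lo, hi = sorted((from_word, to_word))
    return start_times[hi] - start_times[lo]


def word_at(start_times: list[float], t: float) -> int:
    """Index of the word being spoken at audio time t (in seconds)."""
    return max(0, bisect_right(start_times, t) - 1)
```

With stored start times, the duration of a segment is a subtraction rather than an estimate, and the same table maps an audio position back to the word to be highlighted.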
19. A system outputting audio content and displaying textual content, the system comprising:
a data store; and
a processor in communication with the data store, the processor operative to:
generate audio content based at least in part on textual content;
cause output of the generated audio content;
cause presentation of the textual content;
obtain a current position in the textual content being presented associated with a position of the generated audio content being output;
obtain a current location of a pointer referencing the textual content being presented;
determine a segment of textual content based at least in part on a difference between the current location of the pointer referencing the textual content being presented and the current position of the textual content being presented;
determine a length of the segment of textual content; and
modify an attribute associated with the output of the generated audio content based at least in part on the determined length of the segment of textual content.
20. The system of claim 19, wherein the current position of the textual content corresponds to a position of the textual content associated with a position in the generated audio content that advances during output of the generated audio content.
21. The system of claim 19, wherein modifying an attribute associated with the output of the generated audio content includes increasing or decreasing the speed at which the generated audio content is output.
22. The system of claim 19, wherein obtaining the current location of the pointer referencing the textual content comprises obtaining the location of the pointer from an input device.
23. The system of claim 19, wherein modifying an attribute associated with the output of the generated audio content based at least in part on the determined length of the segment of textual content includes modifying the attribute if the determined length satisfies a threshold value.
24. The system of claim 23, wherein the threshold value is indicated by at least one of a visual cue, an auditory cue, and a tactile cue.
25. The system of claim 19, wherein determining the length of the segment of textual content comprises determining a length based at least in part on at least one of a number of words in the segment of textual content, a number of syllables in the segment of textual content, a number of phonemes in the segment of textual content, a number of letters in the segment of textual content, and a number of spaces in the segment of textual content.
26. The system of claim 19, wherein determining the length of the segment of textual content comprises determining a length of time required to output audio content corresponding to the segment of textual content.
27. The system of claim 26, wherein determining the length of time required to output audio content corresponding to the segment of the textual content comprises generating audio content corresponding to the segment of textual content.
28. The system of claim 26, wherein determining the length of time required to output audio content corresponding to the segment of textual content comprises determining a length of time required to output audio content corresponding to the segment of textual content without generating audio content corresponding to the segment of textual content.
29. The system of claim 28, wherein determining the length of time required to output audio content corresponding to the segment of textual content without generating audio content corresponding to the segment of textual content comprises estimating a length of time required to output audio content based at least in part on at least one of a number of words in the segment of textual content, a number of syllables in the segment of textual content, a number of phonemes in the segment of textual content, a number of letters in the segment of textual content, a number of spaces in the segment of textual content, and a length of words in the segment of textual content.
30. A computer-readable, non-transitory storage medium having at least one computer-executable component for providing synchronized content, the at least one computer-executable component comprising:
a content synchronization module operative to:
generate audio content based at least in part on textual content;
cause output of the generated audio content;
cause presentation of the textual content;
maintain synchronization of a position of the textual content being presented with an associated position of the generated audio content being output, wherein the associated position advances during output of the generated audio content;
obtain a current location of a pointer referencing the textual content being presented from an input device;
determine a segment of textual content based at least in part on a difference between the current location of the pointer referencing the textual content being presented and the current position of the textual content being presented;
determine a length of time required to output audio content corresponding to the determined segment of textual content; and
modify an attribute associated with the output of the generated audio content based at least in part on the determined length of time.
31. The computer-readable, non-transitory storage medium of claim 30, wherein modifying an attribute associated with the output of the generated audio content comprises increasing or decreasing the speed at which the audio content is output.
32. The computer-readable, non-transitory storage medium of claim 30, wherein modifying an attribute associated with the output of the generated audio content comprises modifying the attribute if the length of time required to output audio content corresponding to the determined segment of textual content satisfies a threshold value.
33. The computer-readable, non-transitory storage medium of claim 30, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content comprises generating audio content corresponding to the determined segment of textual content.
34. The computer-readable, non-transitory storage medium of claim 30, wherein determining the length of time required to output audio content corresponding to the determined segment of textual content comprises determining a length of time required to output audio content without generating audio content corresponding to the determined segment of textual content.
35. A computer-implemented method comprising:
under control of one or more computing devices,
generating a first content based at least in part on a second content;
causing output of the generated first content and the second content;
maintaining synchronization of a position of the second content being output with an associated position of the generated first content being output, wherein the associated position of the generated first content advances during output of the generated first content;
obtaining a current location of a pointer referencing the second content being output from an input device;
determining a segment of the second content based at least in part on a difference between the current location of the pointer referencing the second content being output and the position of the second content being output;
determining a length of time required to output a segment of the first content corresponding to the determined segment of the second content; and
modifying an attribute associated with the output of the generated first content based at least in part on the determined length of time.
36. The computer-implemented method of claim 35, wherein modifying an attribute associated with the output of the generated first content comprises increasing or decreasing the speed at which the generated first content is output.
37. The computer-implemented method of claim 35, wherein the first content comprises at least one of audio content and visual content.
38. The computer-implemented method of claim 35, wherein the second content comprises at least one of a text, a music score, a picture or sequence of pictures, a diagram, a chart, or a presentation.
39. The computer-implemented method of claim 35, wherein modifying an attribute associated with the output of the generated first content comprises modifying the attribute if the length of time required to output a segment of the first content corresponding to the determined segment of the second content satisfies a threshold value.
Specification