Audio synchronization for document narration with user-selected playback

US 9,478,219 B2
Filed: 12/01/2014
Issued: 10/25/2016
Est. Priority Date: 05/18/2010
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text that provides an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text.
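
The abstract's timing file can be pictured as a table that maps each expected word to an elapsed time from the start of the recording. The Python sketch below assumes a hypothetical format: one entry per expected word, holding the word's character offset in the displayed text and its elapsed time in seconds. It shows how the offset where a user's selection begins could be resolved to a playback position. `TimingEntry`, `build_entries`, and `elapsed_time_for_offset` are illustrative names, not the patent's actual file layout or API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingEntry:
    """One entry of a hypothetical timing file for a single expected word."""
    char_offset: int   # where the word starts in the displayed text
    word: str
    elapsed_s: float   # seconds from the start (reference time) of the recording


def build_entries(text: str, times: list[float]) -> list[TimingEntry]:
    """Pair each whitespace-separated word of `text` with its elapsed time."""
    words = text.split()
    if len(words) != len(times):
        raise ValueError("need exactly one elapsed time per word")
    entries = []
    offset = 0
    for word, t in zip(words, times):
        offset = text.index(word, offset)  # locate this occurrence of the word
        entries.append(TimingEntry(offset, word, t))
        offset += len(word)
    return entries


def elapsed_time_for_offset(entries: list[TimingEntry], selection_start: int) -> float:
    """Elapsed time of the word that contains, or most closely precedes,
    the character offset where the user's selection begins."""
    if not entries:
        raise ValueError("timing file is empty")
    offsets = [e.char_offset for e in entries]
    i = bisect_right(offsets, selection_start) - 1
    return entries[max(i, 0)].elapsed_s  # clamp selections before the first word


text = "The quick brown fox jumps"
entries = build_entries(text, [0.00, 0.31, 0.62, 1.05, 1.40])
print(elapsed_time_for_offset(entries, text.index("brown") + 2))  # → 0.62
```

A binary search over the sorted word offsets keeps each lookup logarithmic in the number of words, which matters little for one page but helps for a whole book.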

147 Citations

22 Claims

1. A method comprises:
- applying speech recognition by a hand-held device that includes a processor device and memory to an audio recording to generate a text file including as text in the file recognized speech from the audio recording and determine an elapsed time period from a reference time in the audio recording to text in the file of recognized speech;
  
  comparing by the hand-held device text of recognized speech to expected text;
  
  generating by the hand-held device a timing file that is stored on a computer-readable storage medium, the timing file comprising the elapsed time information for the expected text;
  
  rendering on a display device associated with the hand-held device, text;
  
  rendering on the display device, a menu that displays graphics of multiple characters each of which is associated with a different audio recording;
  
  receiving by the hand-held device an indication of user-selected text, the user-selected text corresponding to portions of the text that are rendered aloud from corresponding portions of the audio recording for a first user selected character;
  
  determining by the hand-held device an elapsed time in the audio recording by referencing the timing file associated with the user-selected text; and
  
  providing by the mobile device an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording for the user selected portions of text for narration with the corresponding portions of the audio recording.

2. The method of claim 1 wherein the recognized or expected text are words.

3. The method of claim 2 wherein generating the timing file comprises:
- storing the elapsed time information for a recognized word in the timing file if the recognized word matches the corresponding expected word; and

  computing elapsed time information for the expected word; and

  storing the computed elapsed time information into the timing file if the recognized word does not match the corresponding expected word.

4. The method of claim 2 further comprises:
- receiving by the hand-held device a second indication of user-selected text that corresponds to text that is rendered aloud from corresponding portions of a second, different audio recording for a second user selected character; and

  providing by the hand-held device an audible output corresponding to the audio in the second audio recording for the second user selected portions of text for narration with the corresponding portions of the second audio recording.

5. The method of claim 2 wherein computing elapsed time further comprises:
- determining the elapsed time for an expected word based on a metric associated with an expected length of time to speak the expected portion of text.

6. The method of claim 1 further comprises:
- providing visual indicia for the displayed text that corresponds to one or more words which do not match the recognized one or more words, if the recognized one or more words do not match the corresponding expected one or more words.

7. The method of claim 1 wherein when a user selects text, the one or more computer systems begin playback at the first word in the user-selected text and read continuously from that point.

8. The method of claim 7 wherein the one or more computer systems stop playback according to at least one of when the user inputs a command to the system to stop, when the system reaches the end of the document, when the system reaches the end of the user selected portion of the document, and when the system reaches a preset configuration selected from playback of a single paragraph, a single syllable, word, sentence, page, or other part of speech or reading unit.

9. The method of claim 1 wherein the display device allows the user to indicate a point in the text through one or more of a cursor, a stylus, and a finger on a touch screen.
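
Claims 3 and 5 keep the recognizer's time for words that match the expected text and compute a time, from a speaking-length metric, for words that do not. A minimal Python sketch of that idea follows. It assumes the recognizer returns (word, elapsed seconds) pairs, aligns the two word sequences with the standard library's `difflib.SequenceMatcher`, and uses a hypothetical fixed rate of 0.06 seconds per character as the metric. The alignment method, the normalization, and the rate are stand-ins chosen for illustration, not details taken from the patent.

```python
import string
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical speaking-rate metric: assumed seconds needed per character.
SECONDS_PER_CHAR = 0.06


def normalize(word: str) -> str:
    """Case-fold and trim punctuation so 'Time.' can match 'time'."""
    return word.lower().strip(string.punctuation)


def estimated_duration(word: str) -> float:
    """Rough time to speak `word`, based on the assumed per-character rate."""
    return len(word) * SECONDS_PER_CHAR


def generate_timing(expected: list[str],
                    recognized: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (expected_word, elapsed_seconds) for every expected word.

    Expected words aligned with an identical recognized word reuse the
    recognizer's time; every other expected word gets the previous word's
    time plus that previous word's estimated duration.
    """
    exp_norm = [normalize(w) for w in expected]
    rec_norm = [normalize(w) for w, _ in recognized]
    times: list[Optional[float]] = [None] * len(expected)

    # Align the two word sequences; only "equal" runs count as matches.
    matcher = SequenceMatcher(a=exp_norm, b=rec_norm, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                times[i1 + k] = recognized[j1 + k][1]

    timing: list[tuple[str, float]] = []
    for i, word in enumerate(expected):
        t = times[i]
        if t is None:  # no matching recognized word, so compute a time
            if i == 0:
                t = 0.0  # nothing earlier to extrapolate from
            else:
                prev_word, prev_t = timing[-1]
                t = prev_t + estimated_duration(prev_word)
        timing.append((word, round(t, 3)))
    return timing


expected = ["Once", "upon", "a", "time"]
recognized = [("once", 0.0), ("upon", 0.4), ("at", 0.7), ("time", 0.9)]
print(generate_timing(expected, recognized))
# → [('Once', 0.0), ('upon', 0.4), ('a', 0.64), ('time', 0.9)]
```

Because the computed estimate only looks backward, it can overshoot the next recognized word's time; a fuller implementation might clamp it or interpolate between neighboring matches instead.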

10. A system comprises:
- one or more processor devices;
  
  memory coupled to the one or more processor devices; and
  
  a computer readable hardware storage device storing a computer program product comprising instructions for causing the one or more processors to:
  
  determine an elapsed time period from a reference time in an audio recording to text in a file of recognized speech from the audio recording;
  
  compare text of recognized speech to expected text;
  
  generate a timing file comprising the elapsed time information for the expected text;
  
  render on a display device associated with the system, text;
  
  render on the display device, a menu that displays graphics of multiple characters each of which is associated with a different audio recording;
  
  receive an indication of user-selected text, the user-selected text corresponding to portions of the text that are rendered aloud from corresponding portions of the audio recording for a first user selected character;
  
  determine an elapsed time in the audio recording by referencing the timing file associated with the user-selected text; and
  
  provide an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording.

11. The system of claim 10 wherein the recognized or expected text are words.

12. The system of claim 11 further comprising instructions to:
- apply speech recognition to an audio recording to generate a text file including as text in the file recognized speech from the audio recording.

13. The system of claim 11 wherein instructions to generate the timing file comprises instructions to:
- store the elapsed time information for a recognized word in the timing file if the recognized word matches the corresponding expected word; and

  compute elapsed time information for an expected word and store the computed elapsed time information into the timing file if the recognized word does not match the corresponding expected word.

14. The system of claim 10 further comprises instructions to:
- receive a second indication of user-selected text that corresponds to text that is rendered aloud from corresponding portions of a second, different audio recording for a second user selected character; and

  provide an audible output corresponding to the audio in the second audio recording for the second user selected portions of text for narration with the corresponding portions of the second audio recording.

15. The system of claim 10 wherein when a user selects text, the system begins playback at the first word in the user-selected text and reads continuously from that point.

16. The system of claim 15 wherein the system stops playback according to at least one of when the user inputs a command to the system to stop, when the system reaches the end of the document, when the system reaches the end of the user selected portion of the document, and when the system reaches a preset configuration selected from playback of a single paragraph, a single syllable, word, sentence, page, or other part of speech or reading unit.
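
Claims 15 and 16 start playback at the first selected word and stop at one of several boundaries. The sketch below assumes a timing file held as a list of (word, elapsed seconds) pairs in reading order and turns a selection into a start and stop time. It covers three of the listed stop conditions: end of document, end of the user's selection, and end of a sentence (as one example of a preset reading unit). A user's stop command would be handled by the audio player itself. `playback_window` and its mode names are hypothetical, and detecting sentences by trailing punctuation is a simplification.

```python
from typing import Optional

SENTENCE_END = (".", "!", "?")


def playback_window(timing: list[tuple[str, float]],
                    first: int,
                    last: Optional[int] = None,
                    mode: str = "document") -> tuple[float, Optional[float]]:
    """Return (start, stop) in seconds for playback beginning at word `first`.

    stop is None when playback should run to the end of the recording.
    Modes (hypothetical names):
      "document"  - read continuously to the end of the document
      "selection" - stop where the word after `last` begins
      "sentence"  - stop after the next word ending in . ! or ?
    """
    start = timing[first][1]
    if mode == "document":
        return start, None
    if mode == "selection":
        if last is None:
            raise ValueError("selection mode needs the index of the last selected word")
        end_index = last + 1
    elif mode == "sentence":
        end_index = first
        while (end_index < len(timing)
               and not timing[end_index][0].endswith(SENTENCE_END)):
            end_index += 1
        end_index += 1  # include the word that closes the sentence
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if end_index >= len(timing):
        return start, None  # the unit runs to the end of the document
    return start, timing[end_index][1]


timing = [("It", 0.0), ("rained.", 0.3), ("Then", 1.0),
          ("it", 1.3), ("cleared.", 1.5), ("Done.", 2.4)]
print(playback_window(timing, 2, mode="sentence"))  # → (1.0, 2.4)
```

Returning a time window rather than driving audio directly keeps the sketch independent of any particular playback library.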

17. A computer program product tangibly stored on a computer readable hardware storage device, the computer program product comprising instructions for causing a processor to:
- determine an elapsed time period from a reference time in an audio recording to text in a file of recognized speech;
  
  compare text of recognized speech to expected text;
  
  generate a timing file comprising the elapsed time information for the expected text;
  
  render on a display device, text;
  
  render on the display device, a menu that displays graphics of multiple characters each of which is associated with a different audio recording;
  
  receive an indication of user-selected text, the user-selected text corresponding to portions of the text that are rendered aloud from corresponding portions of the audio recording for a first user selected character;
  
  determine an elapsed time in the audio recording by referencing the timing file associated with the user-selected text; and
  
  provide an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording.

18. The product of claim 17 wherein the recognized or expected text are words.

19. The product of claim 18 wherein instructions to generate the timing file comprises instructions to:
- store the elapsed time information for a recognized word in the timing file if the recognized word matches the corresponding expected word; and

  compute elapsed time information for an expected word and store the computed elapsed time information into the timing file if the recognized word does not match the corresponding expected word.

20. The product of claim 17 further comprising instructions to:
- apply speech recognition to an audio recording to generate a text file including as text in the file recognized speech from the audio recording.

21. The product of claim 17, further comprises instructions to:
- receive a second indication of user-selected text that corresponds to text that is rendered aloud from corresponding portions of a second, different audio recording for a second user selected character; and

  provide an audible output corresponding to the audio in the second audio recording for the second user selected portions of text for narration with the corresponding portions of the second audio recording.

22. The product of claim 18 wherein instructions to compute elapsed time further comprise instructions to:
- determine the elapsed time for an expected portion of text based on a metric associated with an expected length of time to speak the expected word.
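
Claims 1, 10, and 17 pair a menu of characters with separate audio recordings, and claims 4, 14, and 21 add playback from a second character's recording. One way to model that, sketched below under stated assumptions, is a registry that maps each character name to its own recording and timing data. The timing data is keyed by word index here because a character may voice only some of the text. `Narration`, `CharacterMenu`, and the file names in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Narration:
    """Hypothetical narration: one character's recording and its timing data."""
    audio_path: str
    # Elapsed seconds keyed by word index, only for words this character voices.
    timing: dict[int, float] = field(default_factory=dict)


class CharacterMenu:
    """Maps each character shown in the menu to a separate recording."""

    def __init__(self) -> None:
        self._narrations: dict[str, Narration] = {}

    def add(self, character: str, narration: Narration) -> None:
        self._narrations[character] = narration

    def characters(self) -> list[str]:
        """Names to display in the character menu, alphabetically."""
        return sorted(self._narrations)

    def cue(self, character: str, word_index: int) -> Optional[tuple[str, float]]:
        """Return (recording, elapsed seconds) for a selected word, or None
        if the chosen character's recording does not voice that word."""
        narration = self._narrations[character]
        elapsed = narration.timing.get(word_index)
        if elapsed is None:
            return None
        return narration.audio_path, elapsed


menu = CharacterMenu()
menu.add("Narrator", Narration("narrator.mp3", {0: 0.0, 1: 0.5, 2: 1.1}))
menu.add("Wolf", Narration("wolf.mp3", {3: 0.0, 4: 0.8}))
print(menu.characters())    # → ['Narrator', 'Wolf']
print(menu.cue("Wolf", 4))  # → ('wolf.mp3', 0.8)
```

Keeping one timing table per recording means switching characters only changes which table is consulted; the selection handling stays the same.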

Current Assignee: T Play Holdings LLC (SoftBank Group Corp.)
Original Assignee: K-NFB Reading Technology Inc.
Inventors: Kurzweil, Raymond C.; Albrecht, Paul; Chapman, Peter; Gibson, Lucy
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US14/556,420
Publication Number: US 20150088505A1
Time in Patent Office: 694 Days
Field of Search: 704/231-257, 704/270-275
US Class Current: 1/1

CPC Class Codes:
- G09B 5/06 with both visual and audibl...
- G09B 5/062 Combinations of audio and p...
- G10L 15/26 Speech to text systems G10L...
