Audio synchronization for document narration with user-selected playback

US 8,903,723 B2
Filed: 03/04/2013
Issued: 12/02/2014
Est. Priority Date: 05/18/2010
Status: Expired due to Fees

Abstract

Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text that provides an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text.
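
The timing file described in the abstract can be pictured with a short sketch. This is only an illustration under stated assumptions: the JSON layout, the `TimedWord` type, the field names, and the shape of the recognizer output (word plus absolute start time in seconds) are all hypothetical and are not taken from the patent.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TimedWord:
    # Hypothetical record: one portion of text and its elapsed time.
    text: str
    elapsed_s: float  # seconds from the reference time to the start of the word


def build_timing_entries(recognized, reference_time_s=0.0):
    """Convert (word, absolute_start_seconds) pairs into elapsed-time entries.

    `recognized` is an assumed recognizer output format, not a real API.
    """
    return [
        TimedWord(word, round(start - reference_time_s, 3))
        for word, start in recognized
    ]


def dump_timing_file(entries):
    """Serialize the entries as JSON text (the file format is an assumption)."""
    return json.dumps([asdict(e) for e in entries], indent=2)


entries = build_timing_entries([("the", 1.5), ("cat", 1.9)], reference_time_s=1.0)
timing_json = dump_timing_file(entries)
```

Storing elapsed times relative to a single reference point, rather than per-word durations, lets a player look up the current word directly from the playback position.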

20 Claims

1. A computer implemented method comprising:
- applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
- providing an audible output corresponding to the audio recording;
- displaying, on a user interface rendered on a display device, an expected portion of text that corresponds to the words in the audio recording, the displayed expected portion of text including at least a portion of the expected portion of text that is currently being provided on the audible output;
- providing visual indicia for the displayed text that corresponds to:
  - the audio that is currently being provided on the audible output, if the recognized portion of text matches the corresponding expected portion of text; and
  - otherwise one or more portions of text which does not match the recognized portion of text, if the recognized portion of text does not match the corresponding expected portion of text.
- Dependent claims 2 to 9:
  - 2. The computer implemented method of claim 1 wherein the visual indicia is highlighting applied on the portion of expected text that is computed to be currently being spoken on the audio output.
  - 3. The computer implemented method of claim 1 wherein on the user interface when expected text is different from recognized text, a comparison result is classified into one of three types:
    - a number of linguistic units is the same between recognized and expected portions, more expected text in the mismatched portion than recognized text, and more recognized text than expected text.
  - 4. The computer implemented method of claim 3 wherein if the number of linguistic units is the same, the method further comprises:
    - generating timing information from an associated recognized word for an expected word.
  - 5. The computer implemented method of claim 3 wherein if there is more expected text in the mismatched portion than recognized text, the method visual indicia is not provided for extra expected text.
  - 6. The computer implemented method of claim 3 wherein if there is more expected text in the mismatched portion than recognized text, the method generates audio output with text to speech instead of playing the audio recording.
  - 7. The computer implemented method of claim 3 wherein if there is more recognized text than expected text, the method further comprises:
    - maintaining highlighting active on a previous word until it is time to speak the next expected word.
  - 8. The computer implemented method of claim 3 wherein if there is more recognized text than expected text, the method further comprises:
    - turning off all visual indicia while the audible output is providing the extra recognized text.
  - 9. The computer implemented method of claim 3 wherein if the recognized and expected text match the method uses the timing information to make the visual indicia for the displayed text correspond to the text that is currently audible.
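
The three comparison types of claim 3, and the handling options in claims 4, 5, and 7, can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Mismatch`, `classify_mismatch`, and `timings_for_expected` are hypothetical, words stand in for "linguistic units", and pairing expected and recognized words by position is a simplifying assumption that the claims do not specify.

```python
from enum import Enum


class Mismatch(Enum):
    SAME_COUNT = "same number of linguistic units"
    MORE_EXPECTED = "more expected text than recognized text"
    MORE_RECOGNIZED = "more recognized text than expected text"


def classify_mismatch(expected_words, recognized_words):
    """Classify a mismatched span by comparing word counts (claim 3)."""
    if len(expected_words) == len(recognized_words):
        return Mismatch.SAME_COUNT
    if len(expected_words) > len(recognized_words):
        return Mismatch.MORE_EXPECTED
    return Mismatch.MORE_RECOGNIZED


def timings_for_expected(expected_words, recognized):
    """Assign a highlight time to each expected word in a mismatched span.

    `recognized` is a list of (word, elapsed_seconds) pairs. Pairing is by
    position, which is an assumption made for this sketch.
    """
    # Claim 4: an expected word borrows the timing of its recognized word.
    paired = [(exp, t) for exp, (_, t) in zip(expected_words, recognized)]
    # Claim 5: extra expected words get no visual indicia (None).
    extras = [(exp, None) for exp in expected_words[len(recognized):]]
    # Claim 7: extra recognized words are simply not paired, so a player
    # would keep the previous highlight until the next expected word.
    return paired + extras
```

For example, with expected words `["the", "quick", "fox"]` and recognized pairs `[("the", 0.0), ("fox", 0.4)]`, the span is classified as `MORE_EXPECTED`, and the third expected word receives `None`, meaning no highlight is scheduled for it.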

10. A computer implemented method comprising:
- applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
- comparing by the one or more computer systems the recognized portion of text to an expected portion of text;
- providing an audible output corresponding to the audio recording;
- determining by the one or more computer systems a recognized portion of text corresponding to a currently audible portion of the audio recording;
- displaying an expected portion of text on a user interface rendered on a display device such that the displayed expected portion of text includes at least an expected portion of text previous to the determined currently audible portion of the audio recording; and
- providing visual indicia for the displayed expected portion of text according to whether there is a match between expected and recognized text.
- Dependent claims 11 and 12:
  - 11. The computer implemented method of claim 10 wherein on the user interface when expected text is different from recognized text, a comparison result is classified into one of three types:
    - a number of linguistic units is the same between recognized and expected portions, more expected text in the mismatched portion than recognized text, and more recognized text than expected text.
  - 12. The computer implemented method of claim 11 wherein if the number of linguistic units is the same the instructions generate timing information from an associated recognized word for an expected word;
    - if there is more expected text in the mismatched portion than recognized text, the instructions do not provide visual indicia for extra expected text or the instructions generate audio output with text to speech instead of playing the audio recording;
    - if there is more recognized text than expected text, the instructions maintain highlighting active on a previous word until it is time to speak the next expected word or turn off all visual indicia while the audio output is providing the extra recognized text.
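
The "determining" and "displaying" steps of claim 10 can be illustrated with a small lookup sketch. It assumes that per-word start times (in seconds, sorted ascending) are already available; the function names, the window sizes, and the list-based data layout are hypothetical choices for illustration only.

```python
from bisect import bisect_right


def current_index(start_times, playback_s):
    """Return the index of the word being spoken at `playback_s`.

    Picks the last word whose start time is <= the playback position,
    or None if playback has not reached the first word yet.
    """
    i = bisect_right(start_times, playback_s) - 1
    return i if i >= 0 else None


def visible_window(expected_words, idx, before=3, after=3):
    """Return the words to display and the position of the current word.

    The window always reaches back `before` words when they exist, so the
    display includes text previous to the currently audible portion.
    """
    lo = max(0, idx - before)
    return expected_words[lo: idx + after + 1], idx - lo


starts = [0.0, 0.4, 0.9]
idx = current_index(starts, 0.5)
```

A binary search keeps the lookup cheap enough to run on every playback tick, and returning the highlight position relative to the window lets the display code mark the current word without knowing the absolute index.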

13. A computer program product tangibly stored on a computer readable hardware storage device, the computer program product comprising instructions to cause a processor to:
- apply speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
- provide an audible output corresponding to the audio recording;
- display, on a user interface rendered on a display device, an expected portion of text that corresponds to the words in the audio recording, the displayed expected portion of text including at least a portion of the expected portion of text that is currently being provided on the audible output;
- provide visual indicia for the displayed text that corresponds to:
  - the audio that is currently being provided on the audible output, if the recognized portion of text matches the corresponding expected portion of text; and
  - otherwise one or more portions of text which does not match the recognized portion of text, if the recognized portion of text does not match the corresponding expected portion of text.
- Dependent claims 14 to 16:
  - 14. The computer program product of claim 13 wherein on the user interface when expected text is different from recognized text, a comparison result is classified into one of three types:
    - a number of linguistic units is the same between recognized and expected portions, more expected text in the mismatched portion than recognized text, and more recognized text than expected text.
  - 15. The computer program product of claim 13 wherein if the number of linguistic units is the same the instructions generate timing information from an associated recognized word for an expected word;
    - if there is more expected text in the mismatched portion than recognized text, the instructions do not provide visual indicia for extra expected text or the instructions generate audio output with text to speech instead of playing the audio recording;
    - if there is more recognized text than expected text, the instructions maintain highlighting active on a previous word until it is time to speak the next expected word or turn off all visual indicia while the audible output is providing the extra recognized text.
  - 16. The computer program product of claim 13 wherein if the recognized and expected text match, the method uses timing information to make the visual indicia for the displayed text correspond to the text that is currently audible.

17. A device comprises:
- a processor;
- a display in communication with the processor;
- a memory in communication with the processor; and
- a computer readable hardware storage device storing a computer program product to configure the processor to:
  - apply speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
  - provide an audible output corresponding to the audio recording;
  - display, on a user interface rendered on a display device, an expected portion of text that corresponds to the words in the audio recording, the displayed expected portion of text including at least a portion of the expected portion of text that is currently being provided on the audible output;
  - provide visual indicia for the displayed text that corresponds to:
    - the audio that is currently being provided on the audible output, if the recognized portion of text matches the corresponding expected portion of text; and
    - otherwise one or more portions of text which does not match the recognized portion of text, if the recognized portion of text does not match the corresponding expected portion of text.
- Dependent claims 18 to 20:
  - 18. The device of claim 17 wherein on the user interface when expected text is different from recognized text, a comparison result is classified into one of three types:
    - a number of linguistic units is the same between recognized and expected portions, more expected text in the mismatched portion than recognized text, and more recognized text than expected text.
  - 19. The device of claim 17 wherein if the number of linguistic units is the same the instructions generate timing information from an associated recognized word for an expected word;
    - if there is more expected text in the mismatched portion than recognized text, the instructions do not provide visual indicia for extra expected text or the instructions generate audio output with text to speech instead of playing the audio recording;
    - if there is more recognized text than expected text, the instructions maintain highlighting active on a previous word until it is time to speak the next expected word or turn off all visual indicia while the audible output is providing the extra recognized text.
  - 20. The device of claim 17 wherein if the recognized and expected text match, the method uses timing information to make the visual indicia for the displayed text correspond to the text that is currently audible.

Current Assignee: T Play Holdings LLC (FIG LLC (d/b/a Fortress Investment Group LLC))
Original Assignee: K-NFB Reading Technology Inc.
Inventors: Kurzweil, Raymond C.; Albrecht, Paul; Chapman, Peter; Gibson, Lucy
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/783,616
Publication Number: US 20130262108A1
Time in Patent Office: 638 Days
Field of Search: None
US Class Current: 704/235
CPC Class Codes:
G09B 5/06   with both visual and audibl...
G09B 5/062   Combinations of audio and p...
G10L 15/26   Speech to text systems G10L...
