Audio synchronization for document narration with user-selected playback
Abstract
Disclosed are techniques and systems for providing a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text, the elapsed time information providing an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text.
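The timing file described above stores, for each expected portion of text, the elapsed time from a reference time in the audio recording, but no file format is specified. The sketch below assumes a JSON array of records and takes the reference time to be the start of the recording; `TimingEntry`, `save_timing_file`, and `load_timing_file` are hypothetical names, not part of the patent.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class TimingEntry:
    """One expected portion of text and its elapsed time (hypothetical schema)."""
    portion_index: int      # position of the expected portion within the text
    text: str               # the expected portion itself, e.g. a single word
    elapsed_seconds: float  # elapsed time from the reference time (assumed: start of recording)


def save_timing_file(entries: List[TimingEntry], path: Path) -> None:
    """Store the timing entries on disk as a JSON array."""
    path.write_text(json.dumps([asdict(e) for e in entries], indent=2), encoding="utf-8")


def load_timing_file(path: Path) -> List[TimingEntry]:
    """Read the timing entries back from a JSON timing file."""
    records = json.loads(path.read_text(encoding="utf-8"))
    return [TimingEntry(**record) for record in records]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        timing_path = Path(tmp) / "narration.timing.json"
        save_timing_file([TimingEntry(0, "Once", 0.0), TimingEntry(1, "upon", 0.42)], timing_path)
        print(load_timing_file(timing_path)[1].elapsed_seconds)  # prints 0.42
```

A plain JSON array keeps the file human-readable; a larger narration might favor a binary or indexed format instead.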
18 Claims
1. A computer implemented method comprising:
applying speech recognition by one or more computer systems to an audio recording to generate a text version of recognized portions of text;
determining by the one or more computer systems an elapsed time period from a reference time in the audio recording to each portion of text in the recognized portions of text;
comparing by the one or more computer systems a recognized portion of text to an expected portion of text;
determining by the one or more computer systems a number of syllables or phonemes in a sequence of expected words that are part of the expected portion of text;
determining by the one or more computer systems a corresponding recognized portion comprising a sequence of recognized words, the sequence of expected words and sequence of recognized words having a same number of syllables or phonemes and a different number of words;
determining by the one or more computer systems an elapsed time for the corresponding recognized portion;
storing the determined elapsed time in a timing file that is stored on a computer-readable storage device, the timing file further comprising the elapsed time information for each expected portion of text;
receiving from a user an indication of a user-selected portion of text;
determining by the one or more computers an elapsed time in the audio recording by referencing the timing file associated with the user-selected portion of text; and
providing an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording.
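The comparing step can be approximated with a standard sequence alignment. The sketch below substitutes Python's `difflib.SequenceMatcher` over normalized words; the claims do not name any particular alignment technique. Recognized words that match expected words exactly inherit the recognizer's timestamp, and every non-matching region is returned for further handling (for example, the syllable-count step recited in the claims). Recognized words are assumed to arrive as `(word, elapsed_seconds)` pairs; `normalize` and `align_words` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Recognized = List[Tuple[str, float]]  # (recognized word, elapsed seconds from reference time)


def normalize(word: str) -> str:
    """Lowercase and strip surrounding punctuation so 'Home.' matches 'home'."""
    return word.lower().strip(".,;:!?\"'")


def align_words(expected: List[str], recognized: Recognized):
    """Compare recognized words against expected words.

    Returns (matched, mismatches):
      matched    -- expected-word index -> elapsed seconds of the identical recognized word
      mismatches -- (exp_start, exp_end, rec_start, rec_end) index ranges that differ
    """
    matcher = SequenceMatcher(
        a=[normalize(w) for w in expected],
        b=[normalize(w) for w, _ in recognized],
        autojunk=False,
    )
    matched: Dict[int, float] = {}
    mismatches: List[Tuple[int, int, int, int]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Word-for-word match: copy each recognized timestamp to its expected word.
            for k in range(i2 - i1):
                matched[i1 + k] = recognized[j1 + k][1]
        else:  # 'replace', 'delete' or 'insert'
            mismatches.append((i1, i2, j1, j2))
    return matched, mismatches
```

For `expected = ["She", "cannot", "go", "home."]` and recognized words `[("she", 0.0), ("can", 0.3), ("not", 0.5), ("go", 0.8), ("home", 1.0)]`, the exact matches time "She", "go" and "home.", while "cannot" versus "can not" comes back as the single mismatch `(1, 2, 1, 3)`.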
8. A computer program product tangibly stored on a computer readable storage device, the computer program product comprising instructions for causing a processor to:
apply speech recognition to an audio recording to generate a text version of recognized portions of text;
determine an elapsed time period from a reference time in the audio recording to each portion of text in the recognized portions of text;
compare a recognized portion of text to an expected portion of text;
determine a number of syllables or phonemes in a sequence of expected words that are part of the expected portion of text;
determine a corresponding recognized portion comprising a sequence of recognized words, the sequence of expected words and sequence of recognized words having a same number of syllables or phonemes and a different number of words;
determine an elapsed time for the corresponding recognized portion;
store the determined elapsed time in a timing file that is stored on a computer-readable storage device, the timing file further comprising the elapsed time information for each expected portion of text;
receive an indication of a user-selected portion of text;
determine an elapsed time in the audio recording by referencing the timing file associated with the user-selected portion of text; and
provide an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording.
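Where word-for-word comparison fails, the claims match a sequence of expected words to a sequence of recognized words with the same number of syllables or phonemes but a different number of words, as when "cannot" is recognized as "can not". A minimal sketch follows, assuming a crude vowel-group syllable counter (a pronunciation dictionary giving phoneme counts would be more faithful) and assuming the elapsed time of the matched portion is that of its first recognized word; `count_syllables` and `match_by_syllables` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not phonetic)."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    count = len(re.findall(r"[aeiouy]+", letters))
    if letters.endswith("e") and count > 1:
        count -= 1  # treat a trailing 'e' as silent, e.g. "home"
    return max(count, 1)


def match_by_syllables(
    expected_words: List[str],
    recognized: List[Tuple[str, float]],
    start: int,
) -> Optional[Tuple[int, float]]:
    """Starting at recognized[start], extend a run of recognized words until its
    syllable total equals that of expected_words.

    Returns (end_index_exclusive, elapsed_seconds) if the run reaches the same
    syllable total with a different number of words, else None.
    """
    target = sum(count_syllables(w) for w in expected_words)
    total = 0
    for end in range(start, len(recognized)):
        total += count_syllables(recognized[end][0])
        if total == target:
            if end - start + 1 != len(expected_words):
                # Assumed: elapsed time of the portion = timestamp of its first recognized word.
                return end + 1, recognized[start][1]
            return None  # same word count: not the case this step covers
        if total > target:
            return None  # overshot: no run with an equal syllable count
    return None
```

With recognized words `[("she", 0.0), ("can", 0.3), ("not", 0.5), ("go", 0.8)]`, `match_by_syllables(["cannot"], rec, 1)` returns `(3, 0.3)`: "cannot" is timed at 0.3 s and alignment can resume at recognized index 3.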
14. A system comprising:
a memory; and
a computing device configured to:
apply speech recognition to an audio recording to generate a text version of recognized portions of text;
determine an elapsed time period from a reference time in the audio recording to each portion of text in the recognized portions of text;
compare a recognized portion of text to an expected portion of text;
determine a number of syllables or phonemes in a sequence of expected words that are part of the expected portion of text;
determine a corresponding recognized portion comprising a sequence of recognized words, the sequence of expected words and sequence of recognized words having a same number of syllables or phonemes and a different number of words;
determine an elapsed time for the corresponding recognized portion;
store the determined elapsed time in a timing file that is stored on a computer-readable storage device, the timing file further comprising the elapsed time information for each expected portion of text;
receive an indication of a user-selected portion of text;
determine an elapsed time in the audio recording by referencing the timing file associated with the user-selected portion of text; and
provide an audible output corresponding to the audio in the audio recording at the determined elapsed time in the audio recording.
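The final steps look up a user-selected portion in the timing file and start audible output at the stored elapsed time. The sketch below assumes the timing file has been loaded into a dict from expected-portion index to elapsed seconds and that the recording is available as a list of PCM samples at a known sample rate. As a design choice not stated in the claims, a portion with no timing entry falls back to the nearest earlier timed portion. `seek_seconds` and `audio_from` are hypothetical names, and handing the samples to an actual audio device is left out.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def seek_seconds(timing: Dict[int, float], selected_portion: int) -> Optional[float]:
    """Elapsed time at which playback should start for the selected portion."""
    if selected_portion in timing:
        return timing[selected_portion]
    timed = sorted(timing)
    pos = bisect_right(timed, selected_portion)
    if pos == 0:
        return None  # no timed portion at or before the selection
    return timing[timed[pos - 1]]  # fall back to the nearest earlier timed portion


def audio_from(samples: List[int], seconds: float, sample_rate: int) -> List[int]:
    """Return the samples from the given elapsed time to the end of the recording."""
    start = round(seconds * sample_rate)
    return samples[start:]
```

With `timing = {0: 0.0, 2: 0.8, 3: 1.0}`, selecting portion 2 starts playback at 0.8 s, while selecting portion 1 (no entry) falls back to 0.0 s.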
Specification