System and method for synchronized text display and audio playback
Abstract
An audio processing system and method for providing synchronized display of recognized text from an original audio file and playback of the original audio file. The system includes a speech recognition module, a silence insertion module, and a silence detection module. The speech recognition module generates text and audio pieces. The silence insertion module aggregates the audio pieces into an aggregated audio file. The silence detection module converts the original audio file and the aggregated audio file into silence detected versions. Silent and non-silent blocks are identified using a threshold volume. The silence insertion module compares the silence detected original and aggregated audio files, determines the differences in position of non-silence elements, and inserts silence within the audio pieces accordingly. The characteristics of the silence inserted audio pieces are used to synchronize the display of recognized text from the original audio file and playback of the original audio file.
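As an illustration of the silence detection step, the following Python sketch groups consecutive samples into silent and non-silent runs by comparing each sample's absolute amplitude against a threshold. The function name `detect_groups`, the sample values, and the threshold of 10 are assumptions made for this example; the text above does not say how volume is measured or how the threshold is chosen.

```python
from itertools import groupby


def detect_groups(samples, threshold):
    """Return (is_silent, start, length) for each run of samples.

    Assumption: a sample counts as silent when its absolute value
    is below the threshold volume.
    """
    groups = []
    position = 0
    for is_silent, run in groupby(samples, key=lambda s: abs(s) < threshold):
        length = len(list(run))
        groups.append((is_silent, position, length))
        position += length
    return groups


# Hypothetical integer samples: quiet, speech, quiet, speech.
samples = [0, 1, 0, 40, 55, 38, 0, 2, 61, 47]
for is_silent, start, length in detect_groups(samples, threshold=10):
    label = "silent" if is_silent else "non-silent"
    print(f"{label:10s} start={start} length={length}")
```

A practical detector would probably measure volume over short windows rather than single samples; the per-sample comparison only keeps the sketch short.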
20 Claims
1. An audio processing system for providing synchronized display of recognized text from an original audio file containing speech spoken by a user and playback of the original audio file, said system comprising:
(a) a speech recognition module for generating recognized text pieces and associated audio pieces from the original audio file;
(b) a silence insertion module for aggregating the audio pieces into an aggregated audio file;
(c) a silence detection module for converting the original audio file and the aggregated audio file into a silence detected original audio file and a silence detected aggregated audio file, wherein silent and non-silent groups are identified using a threshold volume;
(d) said silence insertion module further being adapted to;
(i) compare the silence detected original audio file with the silence detected aggregated audio file and determine the differences in position of the non-silence group within the respective files;
(ii) insert silence within the audio pieces according to the differences in position determined in (i) to create silence inserted audio pieces, such that aggregation of the silence inserted audio pieces results in an aggregated silence inserted audio pieces file that substantially corresponds to the original audio file; and
(iii) utilize the characteristics of the silence inserted audio pieces and the associated recognized text pieces to synchronize the display of the recognized text pieces from the original audio file and the playback of the associated audio pieces from the original audio file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
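The comparison and silence insertion recited in (d)(i) and (d)(ii) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that audio is a list of integer samples, that each audio piece contains exactly one non-silent group, and that the missing silence is inserted at the start of each piece. The names `non_silent_starts` and `insert_silence` are hypothetical.

```python
def non_silent_starts(samples, threshold):
    """Return the start index of every non-silent group."""
    starts = []
    previous_silent = True
    for index, sample in enumerate(samples):
        silent = abs(sample) < threshold
        if previous_silent and not silent:
            starts.append(index)
        previous_silent = silent
    return starts


def insert_silence(original, pieces, threshold):
    """Pad each piece with leading silence so the pieces line up with the original.

    Assumption: one non-silent group per piece, in the same order as
    in the original audio.
    """
    aggregated = [sample for piece in pieces for sample in piece]
    original_starts = non_silent_starts(original, threshold)
    aggregated_starts = non_silent_starts(aggregated, threshold)
    if not (len(original_starts) == len(aggregated_starts) == len(pieces)):
        raise ValueError("expected exactly one non-silent group per piece")

    padded = []
    inserted = 0  # total silence already inserted into earlier pieces
    for piece, orig_pos, agg_pos in zip(pieces, original_starts, aggregated_starts):
        gap = orig_pos - (agg_pos + inserted)  # difference in position
        if gap < 0:
            raise ValueError("piece contains more silence than the original")
        padded.append([0] * gap + piece)
        inserted += gap
    return padded


# Hypothetical data: the recognizer dropped most of the silence.
original = [0, 0, 0, 40, 55, 0, 0, 0, 0, 61, 47]
pieces = [[0, 40, 55], [0, 61, 47]]
padded = insert_silence(original, pieces, threshold=10)
rebuilt = [sample for piece in padded for sample in piece]
print(rebuilt == original)
```

Tracking the running total of inserted silence matters because every insertion shifts all later groups in the aggregated audio by the same amount.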
11. An audio processing method for providing synchronized display of recognized text from an original audio file containing speech spoken by a user and playback of the original audio file, said method comprising:
(a) recognizing the spoken speech within the original audio file and generating recognized text pieces and associated audio pieces;
(b) aggregating the audio pieces into an aggregated audio file;
(c) applying silence detection to convert the original audio file and the aggregated audio file into a silence detected original audio file and a silence detected aggregated audio file, wherein silent and non-silent groups are identified using a threshold volume;
(d) comparing the silence detected original audio file with the silence detected aggregated audio file and determining the differences in position of corresponding non-silence groups within the silence detected original audio file and the silence detected aggregated audio file;
(e) inserting silence within the audio pieces according to the differences in position of corresponding non-silence groups within the silence detected original audio file and the silence detected aggregated audio file to create silence inserted audio pieces, such that aggregation of the silence inserted audio pieces results in an aggregated silence inserted audio pieces file that substantially corresponds to the original audio file; and
(f) utilizing the characteristics of the silence inserted audio pieces and the associated recognized text pieces to synchronize the display of recognized text from an original audio file and playback of original audio file.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
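The synchronization in step (f) can be sketched under the assumption that the characteristic being used is the duration of each silence inserted audio piece. In that reading, the cumulative lengths of the pieces give the playback interval during which each recognized text piece should be displayed. The names `build_timeline` and `text_at`, the sample rate, and the example data are hypothetical.

```python
def build_timeline(text_pieces, padded_pieces, sample_rate):
    """Return (text, start_seconds, end_seconds) for each text piece.

    Assumption: padded_pieces are the silence inserted audio pieces,
    so their cumulative lengths match positions in the original audio.
    """
    timeline = []
    offset = 0  # position in samples within the original audio
    for text, piece in zip(text_pieces, padded_pieces):
        start = offset / sample_rate
        offset += len(piece)
        end = offset / sample_rate
        timeline.append((text, start, end))
    return timeline


def text_at(timeline, playback_seconds):
    """Return the text piece to display at the given playback time, or None."""
    for text, start, end in timeline:
        if start <= playback_seconds < end:
            return text
    return None


# Hypothetical pieces of 5 and 6 samples at 10 samples per second.
padded_pieces = [[0, 0, 0, 40, 55], [0, 0, 0, 0, 61, 47]]
timeline = build_timeline(["hello", "world"], padded_pieces, sample_rate=10)
print(timeline)
print(text_at(timeline, 0.7))
```

Without the inserted silence the cumulative lengths would drift ahead of the original recording, and the displayed text would lead the audio by the amount of silence dropped so far.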
Specification