Mixing digitized speech and text using reliability indices
First Claim
1. A method of processing an original data stream of digitized speech samples, comprising:
- converting a stream of digitized speech samples to a stream of text and associated reliability measures, the reliability measures indicating a level of confidence in the correctness of the speech to text conversion of the associated portions of the stream of text; and
- creating a mixed-media data stream comprising the stream of text as a text component and selected portions of the digitized stream of speech as a speech component, each selected portion corresponding to a portion of the stream of text having a reliability measure below a threshold.
Abstract
Methods and apparatus for processing, storing, and transmitting an original data stream of digitized speech samples. The method converts a stream of digitized speech samples to a stream of text and associated reliability measures. A mixed-media data stream is created with the stream of text as a text component and selected portions of the digitized stream of speech as a speech component. The selected portions are those whose corresponding reliability measures fall below a threshold. The threshold can be adjusted to change the amount of storage or bandwidth used by the mixed-media data stream. The mixed-media data stream can be searched, and the results can be spoken as synthetic speech derived from the text component or as speech samples taken from the digitized speech component.
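The selection rule in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the names `RecognizedSegment`, `MixedSegment`, `create_mixed_media_stream`, and `speech_bytes`, the 0-to-1 reliability scale, and the sample data are all assumptions made for illustration. The sketch keeps the text of every segment and retains the digitized speech only for segments whose reliability measure falls below the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognizedSegment:
    """One recognizer output unit (hypothetical structure)."""
    text: str
    reliability: float  # assumed to lie in [0, 1]; the source does not fix a scale
    samples: bytes      # digitized speech samples for this segment


@dataclass
class MixedSegment:
    """One unit of the mixed-media stream (hypothetical structure)."""
    text: str
    doubtful: bool
    speech: Optional[bytes]  # present only for doubtful segments


def create_mixed_media_stream(
    segments: List[RecognizedSegment], threshold: float
) -> List[MixedSegment]:
    """Keep all text; keep speech only where reliability is below the threshold."""
    mixed = []
    for seg in segments:
        doubtful = seg.reliability < threshold
        mixed.append(
            MixedSegment(seg.text, doubtful, seg.samples if doubtful else None)
        )
    return mixed


def speech_bytes(stream: List[MixedSegment]) -> int:
    """Total size of the retained speech component, in bytes."""
    return sum(len(m.speech) for m in stream if m.speech is not None)


# Made-up example data: three segments with different reliability measures.
recognized = [
    RecognizedSegment("please call", 0.95, b"\x00" * 1600),
    RecognizedSegment("doctor nguyen", 0.40, b"\x01" * 2400),
    RecognizedSegment("tomorrow", 0.80, b"\x02" * 800),
]

low = create_mixed_media_stream(recognized, threshold=0.5)
high = create_mixed_media_stream(recognized, threshold=0.9)
print(speech_bytes(low), speech_bytes(high))
```

With this sample data, raising the threshold from 0.5 to 0.9 marks one more segment as doubtful, so the retained speech component grows. That is the storage and bandwidth trade-off the abstract describes.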
107 Citations
30 Claims
1. A method of processing an original data stream of digitized speech samples, comprising:
- converting a stream of digitized speech samples to a stream of text and associated reliability measures, the reliability measures indicating a level of confidence in the correctness of the speech to text conversion of the associated portions of the stream of text; and
- creating a mixed-media data stream comprising the stream of text as a text component and selected portions of the digitized stream of speech as a speech component, each selected portion corresponding to a portion of the stream of text having a reliability measure below a threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer program, residing on a computer-readable medium, comprising instructions for causing a computer to:
- convert a stream of digitized speech samples to a stream of text and associated reliability measures, the reliability measures indicating a level of confidence in the correctness of the speech to text conversion of the associated portions of the stream of text; and
- create a mixed-media data stream comprising the stream of text as a text component and selected portions of the digitized stream of speech as a speech component, each selected portion corresponding to a portion of the stream of text having a reliability measure below a threshold.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A method of processing a mixed-media data stream, comprising:
- reading a mixed-media data stream comprising a text component and a digitized speech component, the text component comprising text segments identified in the mixed-media data stream as doubtful and the digitized speech component comprising digitized speech corresponding to the text segments identified as doubtful;
- converting the text not identified as doubtful to synthetic speech; and
- generating an audio waveform from the mixed-media data stream by combining the digitized speech and the synthetic speech.
Dependent claims: 24, 25, 26
27. Apparatus comprising a computer-readable storage medium tangibly embodying program instructions for presenting information in spoken form, the program instructions including instructions operable for causing a programmable processor to:
- read a mixed-media data stream comprising a text component and a digitized speech component, the text component comprising text segments identified in the mixed-media data stream as doubtful and the digitized speech component comprising digitized speech corresponding to the text segments identified as doubtful;
- convert the text not identified as doubtful to synthetic speech; and
- generate an audio waveform from the mixed-media data stream by combining the digitized speech and the synthetic speech.
Dependent claims: 28, 29, 30
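The playback steps recited in claims 23 and 27 can be sketched as follows. This is an illustrative sketch only, not the claimed implementation: `MixedSegment`, `render_audio`, and `fake_synthesize` are hypothetical names, audio is modeled as raw bytes, and the synthesizer is a placeholder standing in for a real text-to-speech engine. Segments marked doubtful contribute their stored digitized speech, and all other segments are converted to synthetic speech.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MixedSegment:
    """One unit of a mixed-media stream (hypothetical structure)."""
    text: str
    doubtful: bool
    speech: Optional[bytes] = None  # digitized speech, expected when doubtful


def render_audio(
    stream: List[MixedSegment], synthesize: Callable[[str], bytes]
) -> bytes:
    """Combine stored speech (doubtful text) with synthetic speech (other text)."""
    parts = []
    for seg in stream:
        if seg.doubtful:
            if seg.speech is None:
                raise ValueError(f"doubtful segment {seg.text!r} has no speech")
            parts.append(seg.speech)            # play back the original samples
        else:
            parts.append(synthesize(seg.text))  # synthesize the reliable text
    return b"".join(parts)


def fake_synthesize(text: str) -> bytes:
    """Placeholder synthesizer (assumption): one zero byte per character."""
    return b"\x00" * len(text)


# Made-up example stream: one reliable segment, one doubtful segment.
stream = [
    MixedSegment("hi", doubtful=False),
    MixedSegment("bob", doubtful=True, speech=b"\x07\x07"),
]
waveform = render_audio(stream, fake_synthesize)
```

Passing the synthesizer in as a parameter keeps the combining logic independent of any particular speech engine, which the claims do not specify.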
Specification