Automatic Text-Speech Mapping Tool
First Claim
1. A text-speech mapping method comprising:
obtaining silence segments for incoming speech data;
preprocessing incoming transcript data, wherein the transcript data comprises a written document of the speech data;
finding possible candidate sentence endpoints based on the silence segments;
selecting a best match sentence endpoint based on a forced alignment score;
setting a next sentence to begin immediately after the sentence endpoint; and
repeating the finding, selecting and setting processes until all sentences for the incoming speech data are mapped.
Abstract
A text-speech mapping method. Silence segments for incoming speech data are obtained. Incoming transcript data is preprocessed. The incoming transcript data comprises a written document of the speech data. Possible candidate sentence endpoints based on the silence segments are found. A best match sentence endpoint is selected based on a forced alignment score. The next sentence is set to begin immediately after the current sentence endpoint, and the process of finding candidate sentence endpoints, selecting the best match sentence endpoint, and setting the next sentence is repeated until all sentences for the incoming speech data are mapped. The process is repeated for each mapped sentence to provide word level mapping.
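
The sentence-level loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `map_sentences`, `toy_score`, and `max_candidates` are hypothetical names, the scorer is a crude stand-in for a real forced alignment score, and treating the last sentence as running to the end of the audio is an assumption made here for simplicity.

```python
from typing import Callable, List, Sequence, Tuple

# (start_sec, end_sec) of one detected silence segment.
Silence = Tuple[float, float]
# Hypothetical scorer: (sentence, start_sec, end_sec) -> score, higher is better.
Scorer = Callable[[str, float, float], float]


def toy_score(sentence: str, start: float, end: float) -> float:
    """Stand-in for a forced alignment score (assumes about 0.5 s per word)."""
    expected = 0.5 * len(sentence.split())
    return -abs((end - start) - expected)


def map_sentences(
    sentences: Sequence[str],
    silences: Sequence[Silence],
    audio_end: float,
    score: Scorer = toy_score,
    max_candidates: int = 3,
) -> List[Tuple[str, float, float]]:
    """Greedily assign each sentence a (start, end) span in the audio."""
    mapping: List[Tuple[str, float, float]] = []
    start = 0.0
    for i, sentence in enumerate(sentences):
        # Candidate endpoints: the next few silences that begin after `start`.
        candidates = [s for s in silences if s[0] > start][:max_candidates]
        if i == len(sentences) - 1 or not candidates:
            # Assumption: the final sentence (or a sentence with no
            # remaining candidates) runs to the end of the audio.
            mapping.append((sentence, start, audio_end))
            break
        # Select the candidate endpoint with the best score.
        best = max(candidates, key=lambda s: score(sentence, start, s[0]))
        mapping.append((sentence, start, best[0]))
        # The next sentence begins immediately after the chosen endpoint.
        start = best[1]
    return mapping
```

Word level mapping could reuse the same loop by passing the words of one mapped sentence in place of `sentences` and restricting `silences` to that sentence's span.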
37 Citations
23 Claims
1. A text-speech mapping method comprising:
obtaining silence segments for incoming speech data;
preprocessing incoming transcript data, wherein the transcript data comprises a written document of the speech data;
finding possible candidate sentence endpoints based on the silence segments;
selecting a best match sentence endpoint based on a forced alignment score;
setting a next sentence to begin immediately after the sentence endpoint; and
repeating the finding, selecting and setting processes until all sentences for the incoming speech data are mapped.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
9. A text-speech mapping system comprising:
a front end receiver to receive speech data, the front end including an acoustic module to model the speech data, wherein the acoustic module to record features of each tri-phoneme of each word in the speech data; and
a voice engine to receive a transcription of the speech data and to obtain features of each tri-phoneme of each word in the transcription from a dictionary, the voice engine to determine candidate sentence and word endings for aligning the speech data with the transcription of the speech data when performing sentence level mapping and word level mapping, respectively.
(Dependent claims: 10)
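
As a rough illustration of the dictionary lookup in claim 9, the sketch below expands each transcript word into context-dependent phone labels. Everything here is an assumption for illustration: `TOY_DICT` is a hypothetical two-word pronunciation dictionary, `word_triphones` is a hypothetical helper, the `left-phone+right` label format is one common convention rather than anything the claim specifies, and the sketch produces labels only, not acoustic features.

```python
from typing import Dict, List

# Hypothetical toy pronunciation dictionary (ARPAbet-like symbols).
TOY_DICT: Dict[str, List[str]] = {
    "speech": ["s", "p", "iy", "ch"],
    "data": ["d", "ey", "t", "ah"],
}


def word_triphones(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand a word into left-context/phone/right-context labels.

    Word boundaries are padded with "sil" (an assumption of this sketch).
    """
    phones = lexicon[word.lower()]
    padded = ["sil"] + phones + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]
```

For example, `word_triphones("speech", TOY_DICT)` yields four labels, one per phone, each carrying its left and right neighbor.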
11. A text-speech mapping tool comprising:
a front end receiver to receive speech data;
a text preprocessor to receive a transcript of the speech data;
a voice activity detector to determine silence segments representative of candidate sentences for the speech data; and
a forced alignment mechanism to determine the best candidate sentence and to align the best candidate sentences from the speech data with sentences from the transcript of the speech data to provide sentence level mapping.
(Dependent claims: 12, 13)
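
The claim does not say how the voice activity detector finds silence segments. One simple possibility, shown here purely as an assumed sketch, is a short-time energy threshold: frames whose mean-square energy stays below a threshold for long enough are reported as a silence. The function name `find_silences` and all parameter defaults are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_silences(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 10,
    threshold: float = 0.01,
    min_silence_ms: int = 200,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of sustained low energy."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_silence_ms // frame_ms)
    n_frames = len(samples) // frame_len
    silences: List[Tuple[float, float]] = []
    run_start: Optional[int] = None  # first frame of the current quiet run
    for f in range(n_frames + 1):
        if f < n_frames:
            frame = samples[f * frame_len:(f + 1) * frame_len]
            quiet = sum(x * x for x in frame) / frame_len < threshold
        else:
            quiet = False  # sentinel frame closes a trailing quiet run
        if quiet and run_start is None:
            run_start = f
        elif not quiet and run_start is not None:
            # Keep the run only if it lasted long enough to count as silence.
            if f - run_start >= min_frames:
                silences.append((run_start * frame_len / sample_rate,
                                 f * frame_len / sample_rate))
            run_start = None
    return silences
```

The returned spans are the kind of input a forced alignment stage would treat as candidate sentence endpoints; a production detector would likely need noise-adaptive thresholds rather than a fixed one.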
14. An apparatus comprising:
an automatic text-speech mapping device, the automatic text-speech mapping device including a processor and a storage device; and
a machine-readable medium having stored thereon sequences of instructions, which when read by the processor via the storage device, cause the automatic text-speech mapping device to perform sentence level mapping, wherein the instructions to perform sentence level mapping include:
obtaining silence segments for incoming speech data;
separating incoming transcript data into sentences, wherein the transcript data comprises a written document of the speech data;
finding possible candidate sentence endpoints based on the silence segments;
selecting a best match sentence endpoint based on a forced alignment score;
setting a next sentence to begin immediately after the sentence endpoint; and
repeating the finding, selecting and setting processes until all sentences for the incoming speech data are mapped.
(Dependent claims: 15)
16. An article comprising:
a storage medium having a plurality of machine accessible instructions, wherein when the instructions are executed by a processor, the instructions provide for obtaining silence segments for incoming speech data;
preprocessing incoming transcript data, wherein the transcript data comprises a written document of the speech data;
finding possible candidate sentence endpoints based on the silence segments;
selecting a best match sentence endpoint based on a forced alignment score;
setting a next sentence to begin immediately after the sentence endpoint; and
repeating the finding, selecting and setting processes until all sentences for the incoming speech data are mapped.
(Dependent claims: 17, 18, 19, 20, 21, 22, 23)
Specification