Systems and methods for comparing speech elements
Abstract
A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken by a second person. The method includes performing a speech recognition function on the first audio data source to isolate at least one element of the first audio data source. The method also includes comparing the isolated element with a corresponding element in the plurality of audio data sources and determining whether the utterance spoken by the first person contained an error based on the comparison.
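The compare-and-decide steps described above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes forced alignment has already been run and has produced, for each recording, a list of `(element_label, start_seconds, end_seconds)` tuples (a hypothetical format), and it assumes element duration as the compared feature and a z-score threshold as the error test. The names `element_duration`, `contains_error`, and `max_z` are illustrative and do not come from the source.

```python
from statistics import mean, stdev

# Assumed alignment format (hypothetical): a list of
# (element_label, start_seconds, end_seconds) tuples per recording.

def element_duration(alignment, label):
    """Return the duration in seconds of the first element named `label`."""
    for name, start, end in alignment:
        if name == label:
            return end - start
    raise KeyError(f"element {label!r} not found in alignment")

def contains_error(learner, references, label, max_z=2.0):
    """Flag an error when the learner's element duration lies more than
    `max_z` sample standard deviations from the mean reference duration.
    Duration and the z-score test are assumptions made for illustration."""
    ref = [element_duration(a, label) for a in references]
    if len(ref) < 2:
        raise ValueError("need at least two reference recordings")
    mu, sigma = mean(ref), stdev(ref)
    value = element_duration(learner, label)
    if sigma == 0:
        # All references agree exactly; any deviation counts as an error.
        return value != mu
    return abs(value - mu) / sigma > max_z

# Three recordings of the same utterance by the second person (made-up values).
references = [
    [("k", 0.00, 0.08), ("ae", 0.08, 0.20), ("t", 0.20, 0.27)],
    [("k", 0.00, 0.07), ("ae", 0.07, 0.20), ("t", 0.20, 0.28)],
    [("k", 0.00, 0.09), ("ae", 0.09, 0.20), ("t", 0.20, 0.26)],
]
# One recording by the first person, with an overlong vowel (made-up values).
learner = [("k", 0.00, 0.08), ("ae", 0.08, 0.45), ("t", 0.45, 0.52)]

for label in ("k", "ae", "t"):
    print(label, contains_error(learner, references, label))
```

Using several reference recordings rather than one lets the sketch estimate how much the second person's own pronunciation varies, so that only deviations outside that range are flagged.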
30 Claims
1. A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken a plurality of different times by a second person, the method comprising:
performing a forced alignment speech recognition function on the first audio data source to isolate at least one element of the first audio data source, the forced alignment speech recognition function using an orthographic text corresponding to the first audio data source;
comparing the isolated element with a corresponding element in the plurality of audio data sources; and
determining whether the utterance spoken by the first person contained an error based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 21, 22.
7. An apparatus, comprising:
an audio capture module for receiving a first spoken utterance, wherein the first spoken utterance is spoken in a language that is not native to a speaker of the first spoken utterance;
a speech recognition module for performing a forced alignment speech recognition function on the first spoken utterance to isolate at least one element of the first spoken utterance, the forced alignment speech recognition function using an orthographic text corresponding to the first spoken utterance; and
a comparison module for comparing the isolated element of the first spoken utterance with a corresponding element in audio data from a plurality of audio data sources and for determining whether the first spoken utterance contained an error based on the comparison, wherein the audio data from a plurality of audio data sources are spoken a plurality of different times in the language, wherein the language is native to each speaker of the audio data from a plurality of audio data sources, and wherein the first spoken utterance and the audio data from a plurality of audio data sources are substantially similar in content.
Dependent claims: 8, 9, 10, 11, 23, 24.
12. A system, comprising:
an audio input device; and
a comparison device in communication with the audio input device, wherein the comparison device includes:
an audio capture module for receiving a first spoken utterance, wherein the first spoken utterance is spoken in a language that is not native to a speaker of the first spoken utterance;
a speech recognition module for performing a forced alignment speech recognition function on the first spoken utterance to isolate at least one element of the first spoken utterance, the forced alignment speech recognition function using an orthographic text corresponding to the first spoken utterance; and
a comparison module for comparing the isolated element of the first spoken utterance with a corresponding element in audio data from a plurality of audio data sources and for determining whether the first spoken utterance contained an error based on the comparison, wherein the audio data from a plurality of audio data sources are spoken a plurality of different times in the language, wherein the language is native to each speaker of the audio data from a plurality of audio data sources, and wherein the first spoken utterance and the audio data from a plurality of audio data sources are substantially similar in content.
Dependent claims: 13, 14, 15, 16, 17, 18, 25, 26.
19. An apparatus, comprising:
means for performing a forced alignment speech recognition function on a first audio data source to isolate at least one element of the first audio data source, wherein the first audio data source has an utterance spoken by a first person, the means for performing the forced alignment speech recognition function using an orthographic text corresponding to the utterance spoken by the first person;
means for comparing the isolated element with a corresponding element in a plurality of audio data sources, wherein the plurality of audio data sources have an utterance substantially similar in content to the utterance spoken by the first person spoken a plurality of different times by a second person; and
means for determining whether the utterance spoken by the first person contained an error based on the comparison.
Dependent claims: 27, 28.
20. A computer readable medium having stored thereon instructions which, when executed by a processor, cause the processor to:
perform a forced alignment speech recognition function on a first audio data source to isolate at least one element of the first audio file, wherein the first audio data source has an utterance spoken by a first person, the forced alignment speech recognition function using an orthographic text corresponding to the first audio data source;
compare the isolated element with a corresponding element in a plurality of audio data sources, wherein the plurality of audio data sources have an utterance substantially similar in content to the utterance spoken by the first person spoken a plurality of different times by a second person; and
determine whether the utterance spoken by the first person contained an error based on the comparison.
Dependent claims: 29, 30.
Specification