Methods and apparatus for identifying a non-target language in a speech recognition system
Abstract
Methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using a confidence score. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. The background models are created or trained on speech data from a set of languages, which may or may not include the target language itself. Several types of background language models may be employed for each modeled language, including one or more of (i) prosodic models, (ii) acoustic models, (iii) phonotactic models, and (iv) keyword spotting models. The engine score can be combined with the background model scores to normalize the engine score for non-target languages. The present invention identifies a non-target language utterance within an audio stream when the confidence score fails to meet predefined criteria. A language rejection mechanism can interrupt or modify the transcription process when speech in a non-target language is detected.
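The normalization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all scores are assumed to be per-frame average log-likelihoods, and the engine score is normalized by subtracting the best background model score, a log-likelihood-ratio style choice that the abstract does not specify. The class `UtteranceScores`, the functions `normalized_confidence` and `is_non_target`, the default threshold, and the example score values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceScores:
    """Hypothetical container for one utterance's scores (assumed to be
    per-frame average log-likelihoods)."""
    engine: float  # score from the recognizer trained on the target language
    background: dict[str, float] = field(default_factory=dict)  # score per background language


def normalized_confidence(scores: UtteranceScores) -> float:
    """Normalize the engine score against the best-matching background model.

    Assumed log-likelihood-ratio style normalization:
        confidence = engine - max(background)
    With no background scores, the raw engine score is returned.
    """
    if not scores.background:
        return scores.engine
    return scores.engine - max(scores.background.values())


def is_non_target(scores: UtteranceScores, threshold: float = 0.0) -> bool:
    """Flag the utterance as non-target when its confidence falls below the
    (hypothetical) threshold."""
    return normalized_confidence(scores) < threshold


# Illustrative values only: an English target with Spanish and German background models.
english = UtteranceScores(engine=-4.1, background={"es": -5.0, "de": -5.3})
spanish = UtteranceScores(engine=-6.2, background={"es": -4.0, "de": -5.8})
print(is_non_target(english))  # False
print(is_non_target(spanish))  # True
```

If the background set includes a model of the target language itself, a target-language utterance would score close to zero under this normalization, so the threshold would then need to sit somewhat below zero.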
17 Claims
1. A method for identifying a non-target language utterance in an audio stream, comprising the steps of:
transcribing each utterance in said audio stream using a transcription system trained on a target language;
generating a confidence score associated with each of said transcribed utterances; and
identifying a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
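The three steps of claim 1 can be sketched as a loop over the utterances of a stream. In this minimal sketch the target-language transcription system is replaced by a hypothetical callable `transcribe` that returns a transcript and a confidence score, utterances are represented by string identifiers, and the "predefined criteria" are assumed to be a single minimum threshold. The names `flag_non_target`, `Transcriber`, and `min_confidence` are illustrative, not taken from the patent.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stand-in for a transcription system trained on the target
# language: maps one utterance to (transcript, confidence score).
Transcriber = Callable[[str], tuple[str, float]]


def flag_non_target(
    utterances: Iterable[str],
    transcribe: Transcriber,
    min_confidence: float,
) -> Iterator[tuple[str, float, bool]]:
    """Yield (transcript, confidence, is_non_target) for each utterance.

    Step 1: transcribe the utterance with the target-language system.
    Step 2: take the confidence score associated with that transcription.
    Step 3: flag the utterance as non-target when the score fails the
            criterion, assumed here to be a simple minimum threshold.
    """
    for utterance in utterances:
        transcript, confidence = transcribe(utterance)
        yield transcript, confidence, confidence < min_confidence


# Fake engine output keyed by utterance ID (illustrative values only).
fake_engine = {"utt1": ("hello world", 0.92), "utt2": ("oh la la", 0.31)}
results = list(flag_non_target(["utt1", "utt2"], fake_engine.__getitem__, min_confidence=0.5))
print([flag for _, _, flag in results])  # [False, True]
```

Because the function is a generator, a caller could stop consuming it, or switch to a different recognizer, as soon as a flagged utterance appears, which is one way to realize the language rejection mechanism mentioned in the abstract.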
8. A method for identifying a non-target language utterance in an audio stream, comprising the steps of:
transcribing each utterance in said audio stream using a transcription system trained on a target language;
generating a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
identifying a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
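Claim 8 differs from claim 1 in deriving the confidence score from both the engine score and at least one background model score. One possible combination, assumed here rather than taken from the claim, is a weighted sum in which evidence from background models (for example the prosodic, acoustic, phonotactic, and keyword spotting models listed in the abstract) lowers the confidence. The dictionary `WEIGHTS` and the function `fused_confidence` are hypothetical, and real weights would have to be tuned on held-out data.

```python
# Hypothetical fusion weights, one per score source; untuned illustrative values.
WEIGHTS: dict[str, float] = {
    "engine": 1.0,
    "prosodic": 0.2,
    "acoustic": 0.5,
    "phonotactic": 0.3,
    "keyword": 0.4,
}


def fused_confidence(
    engine_score: float,
    background_scores: dict[str, float],
    weights: dict[str, float],
) -> float:
    """Combine the engine score with one or more background model scores.

    Assumption: each background score measures how strongly the utterance
    matches non-target-language models, so its weighted value is subtracted
    from the weighted engine score. A model type missing from `weights`
    raises KeyError.
    """
    if not background_scores:
        # Claim 8 requires at least one background model score.
        raise ValueError("at least one background model score is required")
    penalty = sum(weights[kind] * score for kind, score in background_scores.items())
    return weights["engine"] * engine_score - penalty


conf = fused_confidence(2.0, {"phonotactic": 1.5, "acoustic": 0.8}, WEIGHTS)
print(round(conf, 2))  # 1.15
```

The resulting value would then be compared against the predefined criteria exactly as in claim 1; only the way the confidence score is computed changes.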
14. A system for identifying a non-target language utterance in an audio stream, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
transcribe each utterance in said audio stream using a transcription system trained on a target language;
generate a confidence score associated with each of said transcribed utterances; and
identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
15. A system for identifying a non-target language utterance in an audio stream, comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
transcribe each utterance in said audio stream using a transcription system trained on a target language;
generate a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
16. An article of manufacture for identifying a non-target language utterance in an audio stream, comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to transcribe each utterance in said audio stream using a transcription system trained on a target language;
a step to generate a confidence score associated with each of said transcribed utterances; and
a step to identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
17. An article of manufacture for identifying a non-target language utterance in an audio stream, comprising:
a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
a step to transcribe each utterance in said audio stream using a transcription system trained on a target language;
a step to generate a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
a step to identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.
Specification