Methods and apparatus for identifying a non-target language in a speech recognition system

US 6,738,745 B1
Filed: 04/07/2000
Issued: 05/18/2004
Est. Priority Date: 04/07/2000
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

Methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using a confidence score. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. The background models are created or trained based on speech data in other languages, which may or may not include the target language itself. A number of types of background language models may be employed for each modeled language, including one or more of (i) prosodic models, (ii) acoustic models, (iii) phonotactic models, and (iv) keyword spotting models. The engine score can be combined with the background model scores to normalize the engine score for non-target languages. The present invention identifies a non-target language utterance within an audio stream when the confidence score falls below a predefined criterion. A language rejection mechanism can interrupt or modify the transcription process when speech in the non-target language is detected.
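The abstract names the inputs to the confidence score but gives no formula. The following is a minimal sketch of options (i) and (iii), assuming that all scores are log-likelihoods and that normalization is a plain subtraction of the best-scoring background language. Both assumptions, the function names `confidence_score` and `is_non_target`, and the threshold are hypothetical and are not taken from the patent.

```python
from typing import Mapping, Optional


def confidence_score(
    engine_score: float,
    background_scores: Optional[Mapping[str, float]] = None,
) -> float:
    """Return a confidence that an utterance is in the target language.

    Assumptions (not from the patent): scores are log-likelihoods, and
    background_scores maps a language name to that language's score.
    """
    if not background_scores:
        # Option (i): use the recognizer's engine score on its own.
        return engine_score
    # Option (iii): normalize the engine score by the strongest competing
    # background language (a log-likelihood difference).
    return engine_score - max(background_scores.values())


def is_non_target(confidence: float, threshold: float) -> bool:
    """Flag the utterance when the confidence falls below the threshold."""
    return confidence < threshold
```

With this choice, an utterance that a background language model explains better than the target-language recognizer gets a negative confidence, so a threshold near zero would separate the two cases. A real system would need to calibrate the threshold on held-out data.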

49 Citations

17 Claims

1. A method for identifying a non-target language utterance in an audio stream, comprising the steps of:
- transcribing each utterance in said audio stream using a transcription system trained on a target language;
- generating a confidence score associated with each of said transcribed utterances; and
- identifying a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

2. The method of claim 1, wherein said confidence score is an engine score generated by said transcription system.

3. The method of claim 1, further comprising the step of interrupting said transcription system when said non-target language is detected.

4. The method of claim 1, further comprising the step of modifying said transcription system when said non-target language is detected.

5. The method of claim 1, wherein said confidence score is based on one or more background models trained on at least one non-target language.

6. The method of claim 5, wherein said background models include one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models for each modeled language.

7. The method of claim 1, wherein said confidence score is based on an engine score provided by said transcription system combined with a background model score to normalize said engine score for said non-target language.
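The steps of claim 1, together with the interrupt and modify options of claims 3 and 4, can be sketched as a loop over utterances. This is an illustration under stated assumptions, not the patented implementation: `Transcriber` is a hypothetical stand-in for a recognizer trained on the target language, the "predefined criteria" are reduced to a single threshold, and blanking the text is only one possible reading of "modifying".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical stand-in for a recognizer trained on the target language:
# it takes the audio of one utterance and returns (text, confidence).
Transcriber = Callable[[bytes], Tuple[str, float]]


@dataclass
class Result:
    text: str
    confidence: float
    non_target: bool


def transcribe_stream(
    utterances: Iterable[bytes],
    transcribe: Transcriber,
    threshold: float,
    stop_on_non_target: bool = False,
) -> Iterator[Result]:
    """Transcribe utterances one by one and flag low-confidence ones."""
    for audio in utterances:
        text, confidence = transcribe(audio)
        # Assumed criterion: a score under the threshold fails the test.
        non_target = confidence < threshold
        if non_target:
            # One possible "modification" (claim 4): drop the unreliable text.
            text = ""
        yield Result(text, confidence, non_target)
        if non_target and stop_on_non_target:
            # One possible "interruption" (claim 3): stop consuming audio.
            break
```

Writing the loop as a generator keeps the decision per utterance, so a caller can stop pulling results as soon as one is flagged instead of relying on the `stop_on_non_target` switch.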

8. A method for identifying a non-target language utterance in an audio stream, comprising the steps of:
- transcribing each utterance in said audio stream using a transcription system trained on a target language;
- generating a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
- identifying a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

9. The method of claim 8, further comprising the step of interrupting said transcription system when said non-target language is detected.

10. The method of claim 8, further comprising the step of modifying said transcription system when said non-target language is detected.

11. The method of claim 8, wherein said at least one background model is trained on at least one non-target language.

12. The method of claim 11, wherein said at least one background model includes one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models for each modeled language.

13. The method of claim 8, wherein said confidence score normalizes said engine score for said non-target language.
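Claim 12 allows several background model types per modeled language but does not say how their outputs become a single background model score. One simple possibility, shown here purely as an assumption, is a weighted average over whichever model types are available for a language. The names `MODEL_TYPES`, `language_background_score`, `background_scores` and the weights are hypothetical.

```python
from typing import Dict, Mapping

# The four model types listed in claim 12, used here as dictionary keys.
MODEL_TYPES = ("prosodic", "acoustic", "phonotactic", "keyword")


def language_background_score(
    model_scores: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Weighted average of the model-type scores present for one language.

    Assumption (not from the patent): the scores are comparable
    log-likelihoods, so a weighted average is meaningful.
    """
    used = [m for m in MODEL_TYPES if m in model_scores]
    if not used:
        raise ValueError("no background model scores supplied")
    total_weight = sum(weights[m] for m in used)
    if total_weight <= 0:
        raise ValueError("weights of the used model types must sum to > 0")
    return sum(weights[m] * model_scores[m] for m in used) / total_weight


def background_scores(
    per_language: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
) -> Dict[str, float]:
    """Compute one background score per modeled language."""
    return {
        lang: language_background_score(scores, weights)
        for lang, scores in per_language.items()
    }
```

The resulting per-language scores are the kind of value that could be set against the engine score when forming the confidence score of claim 8.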

14. A system for identifying a non-target language utterance in an audio stream, comprising:
- a memory that stores computer-readable code; and
- a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  - transcribe each utterance in said audio stream using a transcription system trained on a target language;
  - generate a confidence score associated with each of said transcribed utterances; and
  - identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

15. A system for identifying a non-target language utterance in an audio stream, comprising:
- a memory that stores computer-readable code; and
- a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  - transcribe each utterance in said audio stream using a transcription system trained on a target language;
  - generate a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
  - identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

16. An article of manufacture for identifying a non-target language utterance in an audio stream, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  - a step to transcribe each utterance in said audio stream using a transcription system trained on a target language;
  - a step to generate a confidence score associated with each of said transcribed utterances; and
  - a step to identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

17. An article of manufacture for identifying a non-target language utterance in an audio stream, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  - a step to transcribe each utterance in said audio stream using a transcription system trained on a target language;
  - a step to generate a confidence score associated with each of said transcribed utterances based on an engine score provided by said transcription system trained on a target language and at least one background model score; and
  - a step to identify a transcribed utterance as being in a non-target language if said confidence score generated by said transcription system trained on a target language fails to meet predefined criteria.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Viswanathan, Mahesh; Navratil, Jiri
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/544,678
Time in Patent Office: 1,502 Days
Field of Search: 704/240, 704/277, 704/231, 704/257, 704/8, 704/3
US Class Current: 704/277
CPC Class Codes: G10L 15/20 Speech recognition techniqu...
