Method and system for automatically providing linguistic formulations that are outside a recognition domain of an automatic speech recognition system

US 9,224,391 B2
Filed: 02/17/2005
Issued: 12/29/2015
Est. Priority Date: 02/17/2005
Status: Active Grant

Abstract

A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
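The four steps of the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that both recognition passes return phonemes with frame-level start and end times, that the constrained pass attaches one confidence value per phoneme, and that "corresponding" means the unconstrained phoneme overlaps the low-confidence span for more than half its duration. The `Phoneme` type, the function names, the threshold of 0.5 and the overlap rule are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    label: str                # phoneme symbol (hypothetical labels)
    start: int                # first frame of the time segment (inclusive)
    end: int                  # last frame of the time segment (exclusive)
    confidence: float = 1.0   # only meaningful for the constrained pass


def low_confidence_span(constrained, threshold):
    """Return (start, end) frames of the first run of consecutive constrained
    phonemes whose confidence is below the threshold, or None if there is none."""
    span_start = span_end = None
    for ph in constrained:
        if ph.confidence < threshold:
            if span_start is None:
                span_start = ph.start
            span_end = ph.end
        elif span_start is not None:
            break  # the low-confidence run has ended
    if span_start is None:
        return None
    return span_start, span_end


def unconstrained_in_span(unconstrained, span):
    """Select unconstrained phonemes lying mostly inside the given span
    (assumed rule: more than half of the phoneme's frames overlap)."""
    start, end = span
    picked = []
    for ph in unconstrained:
        overlap = min(ph.end, end) - max(ph.start, start)
        if overlap * 2 > (ph.end - ph.start):
            picked.append(ph)
    return picked


def hypothesis(constrained, unconstrained, threshold=0.5):
    """Phoneme-level hypothesis for the out-of-domain part, or [] if none."""
    span = low_confidence_span(constrained, threshold)
    if span is None:
        return []
    return [ph.label for ph in unconstrained_in_span(unconstrained, span)]


# Made-up example: the constrained pass is unsure about frames 20 to 40.
constrained = [
    Phoneme("h", 0, 10, 0.9), Phoneme("e", 10, 20, 0.8),
    Phoneme("l", 20, 30, 0.2), Phoneme("o", 30, 40, 0.3),
    Phoneme("sil", 40, 50, 0.9),
]
unconstrained = [
    Phoneme("h", 0, 9), Phoneme("e", 9, 21), Phoneme("t", 21, 29),
    Phoneme("a", 29, 41), Phoneme("sil", 41, 50),
]
print(hypothesis(constrained, unconstrained))  # ['t', 'a']
```

The sketch handles only the first low-confidence run in an utterance; a fuller version would presumably collect every run.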

17 Claims

1. A method for automatically providing a hypothesis of a linguistic formulation that is uttered by a user, the method comprising:
- automatically providing a hypothesis of a linguistic formulation that is uttered by a user of an automatic voice service based on an automatic speech recognition system that is outside a recognition domain of said automatic speech recognition system by;
  
  providing a constrained speech recognition and an unconstrained speech recognition of a portion of a first input speech signal that is outside a recognition domain of said automatic speech recognition system, the constrained speech recognition includes constrained phonemes based on a sequence of time segments and the unconstrained speech recognition includes unconstrained phonemes based on the sequence of time segments;
  
  identifying and temporally segmenting a given constrained phoneme of said constrained speech recognition corresponding to a time segment of the given constrained phoneme in order to determine whether the given constrained phoneme is outside said recognition domain, including;
  
  computing confidence measures for the constrained phonemes of said constrained speech recognition, wherein the confidence measures include a discrete time quanta, and
  
  identifying said given constrained phoneme of said constrained speech recognition outside said recognition domain based on said confidence measures;
  
  identifying and temporally segmenting a given unconstrained phoneme of said unconstrained speech recognition corresponding to a time segment of the given unconstrained phoneme, the time segment of the given unconstrained phoneme being substantially the same as the time segment of the given constrained phoneme; and
  
  providing said linguistic formulation hypothesis based on said identified unconstrained phoneme of said unconstrained speech recognition.
- Dependent claims (2-14):
  - 2. The method as claimed in claim 1, wherein said confidence measures are computed for phonemes of said constrained speech recognition.
  - 3. The method as claimed in claim 1, wherein identifying said constrained phoneme of said constrained speech recognition outside said recognition domain based on said confidence measures comprises:
    - identifying phonemes of said constrained speech recognition with confidence measures that meet a given criterion.
  - 4. The method as claimed in claim 3, wherein identifying phonemes of said constrained speech recognition with confidence measures that meet a given criterion further include normalizing the constrained speech confidence measures by:
    - computing instantaneous confidence scores as a temporal average of said confidence measures within a moving window; and
      
      identifying phonemes of said constrained speech recognition with instantaneous confidence scores that meet a first relation.
  - 5. The method as claimed in claim 4, wherein each instantaneous confidence score (C_ist(t)) is computed according to the following formula:
  - 6. The method as claimed in claim 5, wherein said first relation is defined by said instantaneous confidence scores being lower than a given threshold.
  - 7. The method as claimed in claim 1, wherein identifying the unconstrained phoneme of said unconstrained speech recognition comprises:
    - identifying a phoneme of said input speech signal corresponding to said identified constrained phoneme of said constrained speech recognition; and
      
      identifying the unconstrained phoneme of said unconstrained speech recognition corresponding to said identified phoneme of said input speech signal.
  - 8. The method as claimed in claim 1, further comprising:
    - deleting any silences at the start or at the end of the said identified unconstrained phoneme of said unconstrained speech recognition.
  - 9. The method as claimed in claim 1, further comprising:
    - saving said identified unconstrained phoneme of said unconstrained speech recognition in a database of recognitions outside said recognition domain.
  - 10. The method as claimed in claim 9, wherein said identified unconstrained phoneme of said unconstrained speech recognition is saved in said database of recognitions outside said recognition domain if its length meets a second relation.
  - 11. The method as claimed in claim 10, wherein said second relation is defined by the length of said identified unconstrained phoneme of said unconstrained speech recognition being within a defined range.
  - 12. The method as claimed in claim 9, further comprising:
    - processing said database of recognitions outside said recognition domain to provide said hypothesis of linguistic formulation that is outside said recognition domain.
  - 13. The method as claimed in claim 1, wherein said recognition domain comprises a recognition grammar and/or a language model.
  - 14. The method as claimed in claim 1, further including:
    - processing one or more portions of other input speech signals outside said recognition domain to determine whether the linguistic formulation hypothesis of the first input signal applies; and
      
      clustering utterances uttered by one or more users of an automatic voice service based on an automatic speech recognition system that are outside a recognition domain of said automatic speech recognition system.
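
Dependent claims 4 to 6, 8, 10 and 11 describe smoothing, thresholding, silence trimming and length filtering. The sketch below illustrates them under stated assumptions. The formula referred to in claim 5 is not reproduced on this page, so the centered, edge-truncated moving average used here is an assumption and not the patent's formula. The silence label `"sil"`, the measurement of length as a phoneme count, and all function names are likewise hypothetical.

```python
def instantaneous_confidence(conf, half_window):
    """Centered moving average of per-frame confidence measures.

    Assumed form: mean over frames t-half_window .. t+half_window,
    with the window truncated at the edges of the signal.
    """
    n = len(conf)
    scores = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)
        scores.append(sum(conf[lo:hi]) / (hi - lo))
    return scores


def frames_below(scores, threshold):
    """Frames whose instantaneous confidence is lower than the threshold
    (the 'first relation' of claim 6)."""
    return [t for t, s in enumerate(scores) if s < threshold]


def trim_silence(labels, silence="sil"):
    """Drop silences at the start and end of a phoneme sequence (claim 8)."""
    start, end = 0, len(labels)
    while start < end and labels[start] == silence:
        start += 1
    while end > start and labels[end - 1] == silence:
        end -= 1
    return labels[start:end]


def length_in_range(labels, min_len, max_len):
    """Length test before saving to the database (claims 10 and 11);
    length is assumed to be the number of phonemes."""
    return min_len <= len(labels) <= max_len


# Made-up per-frame confidences with a dip in the middle.
conf = [0.9, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9]
scores = instantaneous_confidence(conf, half_window=1)
print(frames_below(scores, threshold=0.5))  # [2, 3, 4]

candidate = trim_silence(["sil", "t", "a", "sil", "sil"])
print(candidate, length_in_range(candidate, 2, 10))  # ['t', 'a'] True
```

Averaging over a window keeps a single noisy frame from flagging a region on its own, which is one plausible reason for normalizing the confidence measures before applying the threshold.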

15. A system for automatically providing hypotheses of linguistic formulations that are uttered by users of an automatic voice service based on an automatic speech recognition system and that are outside a recognition domain of the automatic speech recognition system, said system comprising a computer configured to:
- automatically provide a hypothesis of a linguistic formulation that is uttered by a user of an automatic voice service based on an automatic speech recognition system that is outside a recognition domain of said automatic speech recognition system by;
  
  providing a constrained speech recognition and an unconstrained speech recognition of a portion of a first input speech signal that is outside a recognition domain of said automatic speech recognition system, the constrained speech recognition includes constrained phonemes based on a sequence of time segments and the unconstrained speech recognition includes unconstrained phonemes based on the sequence of time segments;
  
  identifying and temporally segmenting a given constrained phoneme of said constrained speech recognition corresponding to a time segment of the given constrained phoneme in order to determine whether the identified constrained phoneme is outside said recognition domain, including;
  
  computing confidence measures for constrained phonemes of said constrained speech recognition, wherein the confidence measures include a discrete time quanta, and
  
  identifying said constrained phoneme of said constrained speech recognition outside said recognition domain based on said confidence measures;
  
  identifying and temporally segmenting a given unconstrained phoneme of said unconstrained speech recognition corresponding to a time segment of the given unconstrained phoneme, the time segment of the given unconstrained phoneme being substantially the same as the time segment of the given constrained phoneme; and
  
  providing said linguistic formulation hypothesis based on said identified unconstrained phoneme of said unconstrained speech recognition.

16. A non-transitory computer program medium encoded with a computer program comprising a computer program code, wherein the computer program code, when loaded in a computer, causes the computer to:
- automatically provide a hypothesis of a linguistic formulation that is uttered by a user of an automatic voice service based on an automatic speech recognition system that is outside a recognition domain of said automatic speech recognition system by;
  
  providing a constrained speech recognition and an unconstrained speech recognition of a portion of a first input speech signal that is outside a recognition domain of said automatic speech recognition system, the constrained speech recognition includes constrained phonemes based on a sequence of time segments and the unconstrained speech recognition includes unconstrained phonemes based on the sequence of time segments;
  
  identifying and temporally segmenting a given constrained phoneme of said constrained speech recognition corresponding to a constrained phoneme time segment in order to determine whether the identified constrained phoneme is outside said recognition domain, including;
  
  computing confidence measures for the constrained phonemes of said constrained speech recognition, wherein the confidence measures include a discrete time quanta, and
  
  identifying said constrained phoneme of said constrained speech recognition outside said recognition domain based on said confidence measures;
  
  identifying and temporally segmenting a given unconstrained phoneme of said unconstrained speech recognition corresponding to a time segment of the given unconstrained phoneme, the time segment of the given unconstrained phoneme being substantially the same as the time segment of the given constrained phoneme; and
  
  providing said linguistic formulation hypothesis based on said identified unconstrained phoneme of said unconstrained speech recognition.

17. A method for providing an automatic voice service based on an automatic speech recognition system, comprising:
- receiving an input speech signal;
  
  performing an automatic speech recognition based on said input speech signal; and
  
  providing a hypothesis of a linguistic formulation that is uttered by a user of said automatic voice service and that is outside a recognition domain of said automatic speech recognition system, wherein said hypothesis is automatically provided by;
  
  automatically providing a hypothesis of a linguistic formulation that is uttered by a user of an automatic voice service based on an automatic speech recognition system that is outside a recognition domain of said automatic speech recognition system by;
  
  providing a constrained speech recognition and an unconstrained speech recognition of a portion of a first input speech signal that is outside a recognition domain of said automatic speech recognition system, the constrained speech recognition includes constrained phonemes based on a sequence of time segments and the unconstrained speech recognition includes unconstrained phonemes based on the sequence of time segments;
  
  identifying and temporally segmenting a given constrained phoneme of said constrained speech recognition corresponding to a time segment of the given constrained phoneme in order to determine whether the identified constrained phoneme is outside said recognition domain, including;
  
  computing confidence measures for the constrained phonemes of said constrained speech recognition, wherein the confidence measures include a discrete time quanta, and
  
  identifying said constrained phoneme of said constrained speech recognition outside said recognition domain based on said confidence measures;
  
  identifying and temporally segmenting a given unconstrained phoneme of said unconstrained speech recognition corresponding to a time segment of the given unconstrained phoneme, the time segment of the given unconstrained phoneme being substantially the same as the time segment of the given constrained phoneme; and
  
  providing said linguistic formulation hypothesis based on said identified unconstrained phoneme of said unconstrained speech recognition.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Colibro, Daniele; Vair, Claudio; Fissore, Luciano; Popovici, Cosmin
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kovacek, David M
Application Number: US11/884,473
Publication Number: US 20080270129A1
Time in Patent Office: 3,967 Days
Field of Search: 704/1-5, 704/7, 704/9-10, 704/231, 704/270-271, 704/277, 704/E17.001-E17.016, 704/E15.001-E15.05
US Class Current: 1/1
CPC Class Codes:
G10L 15/00   Speech recognition G10L17/0...
G10L 15/06   Creation of reference templ...
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/19   Grammatical context, e.g. d...
