Turn-taking confidence

US 7,809,569 B2
Filed: 12/22/2005
Issued: 10/05/2010
Est. Priority Date: 12/22/2004
Status: Expired due to Fees

Abstract

A method for managing interactive dialog between a machine and a user is claimed. In one embodiment, an interaction between the machine and the user is managed by determining at least one likelihood value which is dependent upon a possible speech onset of the user. In another embodiment, the likelihood value can be dependent upon a model of a desire of the user for specific items, a model of an attention of the user to specific items, or a model of turn-taking cues. Further, the likelihood value can be utilized in a voice activity system.

6 Claims

1. A method for managing interactive dialog between a machine and a user comprising:
- providing audio output comprising speech to the user from the machine, said audio output comprising a sequence of one or more phrases, wherein each phrase is followed by a yield zone, said yield zone characterized by an absence of speech provided from the machine;
- receiving digitized audio data comprising speech audio input at the machine wherein said speech audio input is generated from the user or from an environment of the user;
- determining said audio input comprises speech audio input generated from the user;
- determining a time at which said speech audio input begins;
- determining an onset likelihood value based on the time wherein the onset likelihood has a first value if the time occurs during a given phrase associated with the one or more phrases and a second value if the time occurs during a given yield zone associated with the one or more yield zones;
- determining a confidence value from the audio input, wherein the confidence value is dependent upon the onset likelihood value and a recognition result from a speech recognition module; and
- providing an audio response from the machine to the user based on the confidence value.

2. The method of claim 1, wherein the onset likelihood value diminishes as a function of time for the given yield zone relative to a starting time of the given yield zone.
3. The method of claim 1, wherein the onset likelihood value increases as a function of time for the given phrase relative to an end of the given phrase.
4. The method of claim 1, wherein the given yield zone comprises a first portion and a second portion, and wherein the corresponding onset likelihood value is lower when said time occurs in said second portion than when said time occurs in said first portion.
5. The method of claim 1, wherein determining said time at which the voice activity detector detects said audio input comprises detecting speech generated from the user, wherein detection of said audio input is based on a minimum duration of said speech from said user.
6. The method of claim 1, wherein the time occurs during the given phrase, said given phrase comprises a pre-hold portion, a hold portion, and a post hold portion, and wherein the first value of the onset likelihood value is determined based on whether the time occurs during the pre-hold portion of the given phrase, the hold portion of the given phrase, or a post-hold portion of the given phrase.
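
As a rough illustration of how the steps of claim 1 might fit together, with the time-dependent shapes of claims 2 and 3 layered on top, the Python sketch below scores a speech onset against a prompt timeline. It is not the patented implementation. The names `Segment`, `onset_likelihood`, `turn_confidence`, and `choose_response` are hypothetical, as are all numeric values, the linear ramps, the reading of claim 3 as a likelihood that rises toward the end of the phrase, the choice to multiply the recognizer score by the onset likelihood, and the three-way response policy. The claims only require that the confidence depend on both quantities and that the response depend on the confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    kind: str     # "phrase" (machine is speaking) or "yield" (machine is silent)
    start: float  # seconds from the start of the prompt
    end: float


def onset_likelihood(t, segments, phrase_floor=0.2, yield_peak=1.0, yield_floor=0.4):
    """Likelihood that user speech beginning at time t is a genuine turn.

    All constants and the linear ramps are illustrative assumptions.
    """
    for seg in segments:
        if seg.start <= t < seg.end:
            frac = (t - seg.start) / (seg.end - seg.start)
            if seg.kind == "yield":
                # Claim 2: highest at the start of the yield zone, then diminishing.
                return yield_peak - (yield_peak - yield_floor) * frac
            # Claim 3 (as interpreted here): rises as the phrase nears its end.
            return phrase_floor + (yield_floor - phrase_floor) * frac
    raise ValueError(f"onset time {t} falls outside the prompt")


def turn_confidence(recognition_score, t, segments):
    # Assumed combination rule: a simple product of the two values.
    return recognition_score * onset_likelihood(t, segments)


def choose_response(confidence, accept=0.6, confirm=0.3):
    # Assumed policy: accept, ask the user to confirm, or reject the input.
    if confidence >= accept:
        return "accept"
    if confidence >= confirm:
        return "confirm"
    return "reject"


# One two-second phrase followed by a one-second yield zone.
prompt = [Segment("phrase", 0.0, 2.0), Segment("yield", 2.0, 3.0)]
for onset in (1.0, 2.0, 2.5):
    conf = turn_confidence(0.8, onset, prompt)
    print(onset, round(conf, 2), choose_response(conf))
```

With the same recognizer score, an onset in the middle of the phrase is scored lowest, an onset at the start of the yield zone is scored highest, and an onset late in the yield zone falls in between. The two-portion yield zone of claim 4 and the pre-hold, hold, and post-hold portions of claim 6 could be modelled by replacing the linear ramps with piecewise values.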

Current Assignee: Shadow Prompt Technology AG
Original Assignee: Enterprise Integration Group, Inc. (Perspecta, Inc.)
Inventors: Attwater, David; Balentine, Bruce
Primary Examiner(s): Han; Qi
Application Number: US11/317,391
Publication Number: US 20060206329A1
Time in Patent Office: 1,748 Days
Field of Search: 704/210, 704/215, 704/231, 704/232, 704/233, 704/235, 704/239, 704/240, 704/246, 704/245, 704/256, 704/256.6, 704/275, 704/276, 704/277, 704/E17.002, 704/E17.003, 704/E17.007, 704/E17.009, 704/E17.01, 704/E17.011, 704/E17.015, 704/E17.016, 704/E15.002, 704/E15.003, 704/E15.004, 704/E15.008, 704/E15.009, 704/E15.011, 704/E15.012, 704/E15.014, 704/E15.015, 704/E15.039, 704/E15.04, 704/E15.041, 704/E15.042, 704/257, 704/251, 704/270, 704/270.1
US Class Current: 704/257
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/22   Procedures used during a sp...
G10L 15/24   Speech recognition using no...
