SYSTEM AND METHOD FOR ADVANCED TURN-TAKING FOR INTERACTIVE SPOKEN DIALOG SYSTEMS

US 20130060570A1
Filed: 09/01/2011
Published: 03/07/2013
Est. Priority Date: 09/01/2011
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
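The pinch-node condition in the abstract can be illustrated with a small sketch. It assumes the partial recognition lattice is an acyclic graph stored as a dictionary that maps each node to a list of `(word, next_node)` edges; that representation, the function names, and the requirement that a pinch node have at least one outgoing edge are assumptions made for illustration and are not taken from the disclosure. A node is treated as a pinch node when every path from the start node passes through it, which is checked by counting paths.

```python
from collections import defaultdict


def topological_order(edges, start):
    """Return the nodes reachable from `start` in topological order."""
    seen, post_order = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for _word, nxt in edges.get(node, []):
            visit(nxt)
        post_order.append(node)

    visit(start)
    return post_order[::-1]


def find_pinch_nodes(edges, start):
    """Return nodes that every path from `start` passes through.

    Hypothetical helper: a node qualifies when
    (paths from start to node) * (paths from node to any leaf)
    equals the total number of start-to-leaf paths.
    """
    order = topological_order(edges, start)

    # Number of distinct paths from the start node to each node.
    paths_to = defaultdict(int)
    paths_to[start] = 1
    for node in order:
        for _word, nxt in edges.get(node, []):
            paths_to[nxt] += paths_to[node]

    # Number of distinct paths from each node to any leaf.
    paths_from = {}
    for node in reversed(order):
        successors = edges.get(node, [])
        if successors:
            paths_from[node] = sum(paths_from[nxt] for _word, nxt in successors)
        else:
            paths_from[node] = 1

    total = paths_from[start]
    return [
        node
        for node in order
        if node != start
        and edges.get(node)  # assumption: the lattice continues past a pinch node
        and paths_to[node] * paths_from[node] == total
    ]


# Toy lattice: two competing first words converge at node 3, then branch again.
lattice = {
    0: [("i", 1), ("eye", 2)],
    1: [("want", 3)],
    2: [("want", 3)],
    3: [("to", 4), ("two", 5)],
}
pinch_nodes = find_pinch_nodes(lattice, 0)
```

In this toy lattice all four hypotheses share node 3, so it is the only pinch node reported. The terminal-node condition, which depends on a statistical estimate of where the utterance ends, is not modeled here.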

20 Claims

1. A method comprising:
- receiving speech; and
  
  while continuing to receive the speech:
  
  identifying a starting point associated with the speech;
  
  identifying content of the speech received so far, to yield identified content;
  
  predicting a stability of the identified content;
  
  predicting a correctness of the identified content; and
  
  identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node in a content lattice; and
  
  returning, via a processor, a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
- - 2. The method of claim 1, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
  - 3. The method of claim 1, wherein the stability of the identified content and the correctness of the identified content are respectively determined using stability probability and correctness probability.
  - 4. The method of claim 3, wherein the stability probability and the correctness probability are determined using a machine learning algorithm on a corpus of speech utterances.
  - 5. The method of claim 4, wherein the machine learning algorithm is a logistic regression.
  - 6. The method of claim 1, the method further comprising:
    - determining turn order between a user and an interactive turn-taking spoken dialog system based on the result.
  - 7. The method of claim 1, wherein the result comprises partial speech recognition.
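Claims 3 to 5 describe stability and correctness probabilities obtained by logistic regression on a corpus of speech utterances. The following is a minimal sketch of that idea in plain Python. The two features (an ASR confidence score and a scaled word count), the toy training rows, the learning rate, and the epoch count are all hypothetical; the claims do not specify features or training details.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, features):
    """Probability that a partial result has the modeled property."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, labels, epochs=2000, learning_rate=0.1):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict(weights, bias, features) - label
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [
            w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical corpus: [asr_confidence, words_in_partial / 10] per partial result,
# labeled 1 if the partial later proved stable and 0 otherwise.
rows = [[0.9, 0.5], [0.8, 0.4], [0.85, 0.6], [0.3, 0.1], [0.2, 0.2], [0.4, 0.1]]
stable_labels = [1, 1, 1, 0, 0, 0]

stability_weights, stability_bias = fit_logistic(rows, stable_labels)
stability_probability = predict(stability_weights, stability_bias, [0.9, 0.5])
```

A correctness model could be fit the same way with a second set of labels, giving the two probabilities that claim 3 refers to.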

8. A system comprising:
- a processor;
  
  a memory storing instructions for controlling the processor to perform steps comprising:
  
  receiving speech; and
  
  while continuing to receive the speech:
  
  identifying a starting point associated with the speech;
  
  identifying content of the speech received so far, to yield identified content;
  
  predicting a stability of the identified content;
  
  predicting a correctness of the identified content;
  
  identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node; and
  
  returning a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
- - 9. The system of claim 8, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
  - 10. The system of claim 8, wherein the stability of the identified content and the correctness of the identified content are respectively determined using stability probability and correctness probability.
  - 11. The system of claim 10, wherein the stability probability and the correctness probability are determined using a machine learning algorithm on a corpus of speech utterances.
  - 12. The system of claim 11, wherein the machine learning algorithm is a logistic regression.
  - 13. The system of claim 8, the method further comprising:
    - determining turn order between a user and an interactive turn-taking spoken dialog system based on the result.
  - 14. The system of claim 8, wherein the result comprises partial speech recognition.

15. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to, the instructions comprising:
- receiving speech; and
  
  while continuing to receive the speech:
  
  identifying a starting point;
  
  identifying content of the speech received so far, to yield identified content;
  
  predicting a stability of the identified content;
  
  predicting a correctness of the identified content;
  
  identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node; and
  
  returning a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
- - 16. The non-transitory computer-readable storage medium of claim 15, wherein the starting point is one of a beginning of the speech and a previously marked pinch node.
  - 17. The non-transitory computer-readable storage medium of claim 15, wherein the stability of the identified content and the correctness of the identified content are respectively determined using stability probability and correctness probability.
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein the stability probability and the correctness probability are determined using a machine learning algorithm on a corpus of speech utterances.
  - 19. The non-transitory computer-readable storage medium of claim 18, wherein the machine learning algorithm is a logistic regression.
  - 20. The non-transitory computer-readable storage medium of claim 15, wherein the result comprises partial speech recognition.
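The relationship between starting points and end points in the independent claims, together with the dependent claims that let the starting point be a previously marked pinch node, can be sketched as a small emitter that returns the words between the two points. The class name, the word-index representation of end points, and the threshold gating on the stability and correctness probabilities are assumptions for illustration only; the claims say only that the result is based on the stability and the correctness.

```python
from dataclasses import dataclass, field


@dataclass
class PartialResultEmitter:
    """Hypothetical helper that returns partial results segment by segment."""

    min_stability: float = 0.5    # assumed threshold, not from the claims
    min_correctness: float = 0.5  # assumed threshold, not from the claims
    start: int = 0                # index of the first word not yet returned
    words: list = field(default_factory=list)

    def add_words(self, new_words):
        """Append words recognized since the last update."""
        self.words.extend(new_words)

    def on_end_point(self, end, stability, correctness):
        """Handle an end point (terminal node or pinch node).

        `end` is the index one past the last word before the end point.
        Returns the words between the starting point and the end point,
        or None if the segment is empty or the probabilities are too low.
        """
        if end <= self.start or end > len(self.words):
            return None
        if stability < self.min_stability or correctness < self.min_correctness:
            return None
        segment = self.words[self.start:end]
        # The end point just used becomes the next starting point.
        self.start = end
        return segment


emitter = PartialResultEmitter()
emitter.add_words(["i", "want"])
first = emitter.on_end_point(2, stability=0.9, correctness=0.8)
emitter.add_words(["to", "fly"])
held_back = emitter.on_end_point(4, stability=0.2, correctness=0.9)
second = emitter.on_end_point(4, stability=0.9, correctness=0.9)
```

Here the first segment is returned at the first end point, the second is held back while its stability probability is low, and it is returned once the probability rises above the assumed threshold.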

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Jason Williams, Ethan Selfridge

Granted Patent: US 8,914,288 B2
US Class Current: 704/251
CPC Class Codes:
- G10L 15/04: Segmentation; Word boundary...
- G10L 15/05: Word boundary detection
- G10L 15/063: Training
- G10L 15/083: Recognition networks G10L15...
- G10L 15/222: Barge in, i.e. overridable ...
