SYSTEM AND METHOD FOR ADVANCED TURN-TAKING FOR INTERACTIVE SPOKEN DIALOG SYSTEMS
First Claim
1. A method comprising:
receiving speech; and
while continuing to receive the speech:
identifying a starting point associated with the speech;
identifying content of the speech received so far, to yield identified content;
predicting a stability of the identified content;
predicting a correctness of the identified content; and
identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node in a content lattice; and
returning, via a processor, a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
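The claimed flow can be pictured as a filter over a stream of partial hypotheses. The sketch below is illustrative only: the `Partial` record, the `emit_results` function, and the threshold values are hypothetical names and numbers that do not come from the claims, and a simple threshold test is just one assumed way of basing the result on stability and correctness.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Partial:
    """One incremental hypothesis (hypothetical structure, not from the patent)."""
    text: str            # content identified so far
    stability: float     # predicted probability the text will not be revised
    correctness: float   # predicted probability the text is correct
    at_end_point: bool   # True when the latest node is a terminal or pinch node


def emit_results(
    partials: Iterable[Partial],
    min_stability: float = 0.8,    # assumed threshold
    min_correctness: float = 0.7,  # assumed threshold
) -> Iterator[str]:
    """Yield text only at an end point, and only if both predictions are high enough."""
    for p in partials:
        if (
            p.at_end_point
            and p.stability >= min_stability
            and p.correctness >= min_correctness
        ):
            yield p.text


stream = [
    Partial("i want", 0.4, 0.5, False),          # no end point yet
    Partial("i want to fly", 0.9, 0.8, True),    # end point, both scores pass
    Partial("i want to fly to", 0.5, 0.9, True)  # end point, but unstable
]
results = list(emit_results(stream))
```

Under these assumptions only the second hypothesis is returned, because it is the only one that reaches an end point with both predictions above their thresholds.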
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech before the utterance is complete, and can communicate partial speech recognition results when either of two conditions is met. The first condition is that the most recent word in the partial results is statistically likely to be the end of the utterance; such a word is known as a terminal node. The second condition is that all search paths within a speech lattice converge to a common node, known as a pinch node, before branching out again. Stability and correctness probabilities can also determine which partial results are communicated.
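As a rough illustration of the pinch-node condition, the sketch below finds interior nodes that every path from the start of a lattice must pass through. The adjacency-dictionary representation and the names `reachable` and `pinch_nodes` are assumptions made for this example; the disclosure does not specify a data structure or a search procedure, and a production recognizer would likely use something more efficient than repeated breadth-first search.

```python
from collections import deque


def reachable(edges, start, blocked=None):
    """Return all nodes reachable from start, never entering the blocked node."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt != blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def pinch_nodes(edges, start):
    """Interior nodes that lie on every path from start to every sink."""
    all_nodes = reachable(edges, start)
    sinks = {n for n in all_nodes if not edges.get(n)}
    pinches = []
    for node in all_nodes - sinks - {start}:
        # If blocking this node cuts off every sink, all paths converge on it.
        if not (reachable(edges, start, blocked=node) & sinks):
            pinches.append(node)
    return sorted(pinches)


# Hypothetical lattice: two competing paths converge at node 3, then branch again.
lattice = {0: [1, 2], 1: [3], 2: [3], 3: [4, 5], 4: [], 5: []}
converging = pinch_nodes(lattice, 0)

# Hypothetical lattice whose paths never converge: no pinch node exists.
diverging = pinch_nodes({0: [1, 2], 1: [3], 2: [4], 3: [], 4: []}, 0)
```

In the first lattice, node 3 is the only pinch node; in the second, the two paths stay separate and the list is empty.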
18 Citations
20 Claims
1. A method comprising:
receiving speech; and
while continuing to receive the speech:
identifying a starting point associated with the speech;
identifying content of the speech received so far, to yield identified content;
predicting a stability of the identified content;
predicting a correctness of the identified content; and
identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node in a content lattice; and
returning, via a processor, a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
a processor;
a memory storing instructions for controlling the processor to perform steps comprising:
receiving speech; and
while continuing to receive the speech:
identifying a starting point associated with the speech;
identifying content of the speech received so far, to yield identified content;
predicting a stability of the identified content;
predicting a correctness of the identified content;
identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node; and
returning a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to, the instructions comprising:
receiving speech; and
while continuing to receive the speech:
identifying a starting point;
identifying content of the speech received so far, to yield identified content;
predicting a stability of the identified content;
predicting a correctness of the identified content;
identifying an end point associated with the speech, wherein the end point is at least one of a terminal node and a pinch node; and
returning a result based on the stability and the correctness between the starting point and the end point upon identifying the end point.
Dependent claims: 16, 17, 18, 19, 20
Specification