Detecting the end of a user question
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying voice inputs. The methods, systems, and apparatus include actions of providing an answer to a first voice input from a user and receiving visual or audio data corresponding to a second voice input. Further actions include classifying the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data. Additionally, the actions include determining whether to provide a response to the second voice input based on the classification of the second voice input.
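The classify-then-decide flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract only says the classification is based on "the visual data or the audio data", so the two features used here (`facing_device` from video, `speech_level_db` from audio), the threshold value, and all function and class names are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputClass(Enum):
    FOLLOW_ON_REQUEST = "follow_on_request"  # directed to the system
    DELIBERATION = "deliberation"            # directed to other people nearby


@dataclass
class SecondInputSignals:
    # Hypothetical features; the source does not name specific signals.
    facing_device: Optional[bool] = None      # assumed to come from visual data
    speech_level_db: Optional[float] = None   # assumed to come from audio data


def classify_second_input(signals: SecondInputSignals,
                          level_threshold_db: float = 55.0) -> InputClass:
    """Classify the second voice input using visual data or audio data.

    Assumed heuristic: a user facing the device, or speaking at or above
    the threshold level, is treated as addressing the system.
    """
    if signals.facing_device is not None:
        if signals.facing_device:
            return InputClass.FOLLOW_ON_REQUEST
        return InputClass.DELIBERATION
    if signals.speech_level_db is not None:
        if signals.speech_level_db >= level_threshold_db:
            return InputClass.FOLLOW_ON_REQUEST
        return InputClass.DELIBERATION
    # No usable evidence: assume the utterance was not meant for the device.
    return InputClass.DELIBERATION


def should_respond(signals: SecondInputSignals) -> bool:
    """Decide whether to respond, based only on the classification."""
    return classify_second_input(signals) is InputClass.FOLLOW_ON_REQUEST
```

In this sketch the visual signal takes priority when both are present, and a missing signal defaults to deliberation so the device stays silent when unsure. Both choices are design assumptions, not statements from the patent.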
19 Claims
1. A computer-implemented method comprising:
providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
receiving, by the device, visual or audio data corresponding to a second voice input;
classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
receiving, by the device, visual or audio data corresponding to a second voice input;
classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
receiving, by the device, visual or audio data corresponding to a second voice input;
classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
Specification