Detecting the end of a user question

US 9,123,340 B2
Filed: 03/01/2013
Issued: 09/01/2015
Est. Priority Date: 03/01/2013
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying voice inputs. The methods, systems, and apparatus include actions of providing an answer to a first voice input from a user and receiving visual or audio data corresponding to a second voice input. Further actions include classifying the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data. Additionally, the actions include determining whether to provide a response to the second voice input based on the classification of the second voice input.
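
The flow described in the abstract can be sketched as a small session object. This is a minimal illustration under stated assumptions, not the patented implementation: `Session`, `Label`, and the injected `classify` and `answer` callables are hypothetical names, the classifier is a stub supplied by the caller, and entering active listen mode when the first input is answered is a simplification of the claimed behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Label(Enum):
    FOLLOW_ON = "follow_on"        # utterance directed at the system
    DELIBERATION = "deliberation"  # utterance directed at nearby people


@dataclass
class Session:
    """Hypothetical session state; all names here are illustrative."""
    active_listen: bool = False

    def handle_first_input(self, text: str, answer: Callable[[str], str]) -> str:
        # Assumption: answering the first voice input starts active listening.
        self.active_listen = True
        return answer(text)

    def handle_second_input(
        self,
        observation: Any,
        text: str,
        classify: Callable[[Any], Label],
        answer: Callable[[str], str],
    ) -> Optional[str]:
        # Outside active listen mode, the second input is ignored.
        if not self.active_listen:
            return None
        # The observation stands in for the visual or audio data.
        if classify(observation) is Label.DELIBERATION:
            self.active_listen = False  # stop listening, give no response
            return None
        return answer(text)
```

A caller would pass whatever visual or audio features it has as `observation`; a return value of `None` means the device stays silent.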

19 Claims

1. A computer-implemented method comprising:
- providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
  
  receiving, by the device, visual or audio data corresponding to a second voice input;
  
  classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
  
  determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, further comprising:
    - receiving visual or audio data corresponding to the first voice input; and
      
      entering an active listen mode based on the visual or audio data corresponding to the first voice input, wherein determining whether to provide the response to the second voice input comprises:
      
      exiting the entered active listen mode if the second voice input is classified as deliberation.
  - 3. The method of claim 1, further comprising:
    - determining an angle of the user's head, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined angle of the user's head.
  - 4. The method of claim 1, further comprising:
    - determining a number of people in an area that includes the user based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined number of people.
  - 5. The method of claim 1, further comprising:
    - determining whether the lips of the user are moving based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determination if lips of the user are moving.
  - 6. The method of claim 1, further comprising:
    - determining words corresponding to the second voice input based on the audio data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined words of the second voice input.
  - 7. The method of claim 1, wherein the response comprises an answer to the second voice input or a reaction to the second voice input perceivable by the user.
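
The cues named in dependent claims 3 through 6 (head angle, number of people, lip movement, and recognized words) could be combined in many ways. The sketch below uses a simple additive score purely for illustration; the `Cues` type, the `classify` function, the hint phrases, the 30-degree threshold, and the weights are all assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cues:
    head_angle_deg: float  # assumed convention: 0 means facing the device
    people_in_area: int
    lips_moving: bool
    words: str


# Hypothetical phrases and threshold, chosen only for illustration.
DELIBERATION_HINTS = ("what do you think", "should we", "let's")
MAX_FACING_ANGLE_DEG = 30.0


def classify(cues: Cues) -> str:
    """Return "follow_on" or "deliberation" from a toy additive score."""
    score = 0
    # Facing the device suggests the utterance is directed at it.
    score += 1 if abs(cues.head_angle_deg) <= MAX_FACING_ANGLE_DEG else -1
    # With nobody else nearby, there is no one to deliberate with.
    if cues.people_in_area <= 1:
        score += 1
    # Assumed heuristic: speech without lip movement is not from the user.
    if not cues.lips_moving:
        score -= 1
    # Phrases addressed to other people weigh toward deliberation.
    lowered = cues.words.lower()
    if any(hint in lowered for hint in DELIBERATION_HINTS):
        score -= 2
    return "follow_on" if score > 0 else "deliberation"
```

A real system would more likely learn such weights from labeled data than hard-code them; the fixed values here only make the role of each cue visible.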

8. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
  
  receiving, by the device, visual or audio data corresponding to a second voice input;
  
  classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
  
  determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
- View Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The system of claim 8, the operations further comprising:
    - receiving visual or audio data corresponding to the first voice input; and
      
      entering an active listen mode based on the visual or audio data corresponding to the first voice input, wherein determining whether to provide the response to the second voice input comprises:
      
      exiting the entered active listen mode if the second voice input is classified as deliberation.
  - 10. The system of claim 8, the operations further comprising:
    - determining an angle of the user's head, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined angle of the user's head.
  - 11. The system of claim 8, the operations further comprising:
    - determining a number of people in an area that includes the user based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined number of people.
  - 12. The system of claim 8, the operations further comprising:
    - determining whether the lips of the user are moving based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determination if lips of the user are moving.
  - 13. The system of claim 8, the operations further comprising:
    - determining words corresponding to the second voice input based on the audio data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined words of the second voice input.
  - 14. The system of claim 8, wherein the response comprises an answer to the second voice input or a reaction to the second voice input perceivable by the user.

15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- providing, by a device that includes an audio capture component that is configured to capture audio data, a video capture component that is configured to capture video data, an automated speech to text recognizer that is configured to transcribe voice inputs, and an automated natural language processing system that is configured to process natural language included in the transcriptions of the voice inputs, an answer to a first voice input from a user;
  
  receiving, by the device, visual or audio data corresponding to a second voice input;
  
  classifying, by the device, the second voice input as a follow on request to the first voice input or as deliberation on the answer, based on the visual data or the audio data, wherein a follow on request comprises an utterance that is directed to the natural language processing system, and wherein deliberation comprises an utterance that is not directed to the natural language processing system and is directed to one or more other people in proximity to the user; and
  
  determining, by the device, whether to provide a response to the second voice input based on the classification of the second voice input.
- View Dependent Claims (16, 17, 18, 19)
- - 16. The medium of claim 15, the operations further comprising:
    - receiving visual or audio data corresponding to the first voice input; and
      
      entering an active listen mode based on the visual or audio data corresponding to the first voice input, wherein determining whether to provide the response to the second voice input comprises:
      
      exiting the entered active listen mode if the second voice input is classified as deliberation.
  - 17. The medium of claim 15, the operations further comprising:
    - determining an angle of the user's head, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined angle of the user's head.
  - 18. The medium of claim 15, the operations further comprising:
    - determining a number of people in an area that includes the user based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determined number of people.
  - 19. The medium of claim 15, the operations further comprising:
    - determining whether the lips of the user are moving based on the visual data, wherein classifying the second voice input as a follow on request to the first voice input or deliberation on the answer is based on the determination if lips of the user are moving.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Doherty, Ryan P.; Johnston, Nicholas
Primary Examiner(s): SINGH, SATWANT K
Application Number: US13/781,853
Publication Number: US 20140249811A1
Time in Patent Office: 914 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06V 40/20   Movements or behaviour, e.g...
G10L 15/22   Procedures used during a sp...
G10L 2015/226   using non-speech characteri...
