Full-duplex utterance processing in a natural language virtual assistant

US 10,311,875 B2
Filed: 12/22/2016
Issued: 06/04/2019
Est. Priority Date: 12/22/2016
Status: Active Grant

Abstract

A query-processing system processes an input audio stream that represents a succession of queries spoken by a user. The query-processing system listens continuously to the input audio stream, parses queries and takes appropriate actions in mid-stream. In some embodiments, the system processes queries in parallel, limited by serial constraints. In some embodiments, the system parses and executes queries while a previous query's execution is still in progress. To accommodate users who tend to speak slowly and express a thought in separate parts, the query-processing system halts the outputting of results corresponding to a previous query if it detects that a new speech utterance modifies the meaning of the previous query.
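The idea of processing queries "in parallel, limited by serial constraints" can be illustrated with a minimal sketch. This is not the patented implementation; it assumes Python's `asyncio`, and the names `execute_query`, `run_session`, and the `after` parameter are hypothetical. A query that depends on an earlier one waits for that query's task to finish, while an independent query runs concurrently.

```python
import asyncio


async def execute_query(name, duration, log, after=None):
    """Execute one query; `after` is an optional task this query depends on."""
    if after is not None:
        # Serial dependency: delay execution until the earlier query completes.
        await after
    log.append(f"start {name}")
    await asyncio.sleep(duration)  # stand-in for real query execution
    log.append(f"end {name}")
    return f"response to {name}"


async def run_session():
    log = []
    q1 = asyncio.create_task(execute_query("q1", 0.05, log))
    # q2 is assumed to depend on q1, so it is held back until q1 is done.
    q2 = asyncio.create_task(execute_query("q2", 0.01, log, after=q1))
    # q3 is assumed to be independent, so it overlaps with q1.
    q3 = asyncio.create_task(execute_query("q3", 0.01, log))
    responses = await asyncio.gather(q1, q2, q3)
    return log, responses


if __name__ == "__main__":
    log, responses = asyncio.run(run_session())
    print(log)
    print(responses)
```

In this sketch the dependency is declared by the caller; how a real system would identify a serial dependency between two recognized queries is outside what the example covers.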

8 Claims

1. A computer-implemented method of generating a response to a spoken input, the method comprising:
- obtaining an audio input stream;
  
  detecting in the audio input stream a beginning of a first utterance;
  
  detecting in the audio input stream an end of the first utterance;
  
  responsive to detecting the end of the first utterance, initiating processing of the first utterance to recognize a first query;
  
  while processing the first utterance:
  - continuing to receive the audio input stream; and
    
    detecting a beginning of a second utterance in the audio input stream;
  
  executing the first query to determine a first response;
  
  detecting an end of the second utterance in the audio input stream;
  
  processing the second utterance to recognize a second query;
  
  identifying a serial dependency between the first query and the second query;
  
  responsive to the identification of the serial dependency, delaying execution of the second query until the executing of the first query has completed;
  
  executing the second query to determine a second response; and
  
  outputting the second response;
  
  wherein detecting ends of utterances comprises identifying at least one of: a pause in speech in the audio input stream, non-speech in the audio input stream, or a user input event performed by a user.
- Dependent claims (2, 3, 4):
  - 2. The computer-implemented method of claim 1, wherein processing the second utterance is performed concurrently with executing the first query.
  - 3. The computer-implemented method of claim 1, wherein outputting the first response is performed concurrently with executing the second query.
  - 4. The computer-implemented method of claim 1, wherein the first response is output visually, the method further comprising:
    - determining whether a display period has elapsed since the first response was output visually; and
      
      clearing the visual output of the response after the elapse of the display period.
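Claim 1 names a pause in speech as one way to detect the end of an utterance. The following is a minimal sketch of that single option, assuming the audio stream has already been reduced to one energy value per frame. The function name `segment_utterances` and the parameters `speech_threshold` and `pause_frames` are hypothetical and do not come from the patent.

```python
def segment_utterances(frame_energies, speech_threshold=0.1, pause_frames=3):
    """Split per-frame energies into utterances.

    Returns a list of (begin, end) frame indices, where `end` is exclusive
    and points just past the last speech frame of the utterance.
    """
    utterances = []
    begin = None      # index of the first speech frame of the open utterance
    silent_run = 0    # consecutive non-speech frames seen inside an utterance
    for i, energy in enumerate(frame_energies):
        if energy >= speech_threshold:
            if begin is None:
                begin = i  # beginning of an utterance detected
            silent_run = 0
        elif begin is not None:
            silent_run += 1
            if silent_run >= pause_frames:
                # The pause is long enough: end of the utterance detected.
                utterances.append((begin, i - silent_run + 1))
                begin = None
                silent_run = 0
    if begin is not None:
        # The stream ended while an utterance was open; close it at the
        # last speech frame.
        utterances.append((begin, len(frame_energies) - silent_run))
    return utterances


if __name__ == "__main__":
    energies = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0]
    print(segment_utterances(energies))  # [(2, 6), (9, 11)]
```

A single silent frame (index 4 in the usage example) is shorter than `pause_frames`, so it does not split the first utterance.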

5. A non-transitory computer-readable storage medium storing instructions for generating a response to a spoken input, the instructions when executed by a computer processor performing actions comprising:
- obtaining an audio input stream;
  
  detecting in the audio input stream a beginning of a first utterance;
  
  detecting in the audio input stream an end of the first utterance;
  
  responsive to detecting the end of the first utterance, initiating processing of the first utterance to recognize a first query;
  
  while processing the first utterance:
  - continuing to receive the audio input stream; and
    
    detecting a beginning of a second utterance in the audio input stream;
  
  executing the first query to determine a first response;
  
  detecting an end of the second utterance in the audio input stream;
  
  processing the second utterance to recognize a second query;
  
  identifying a serial dependency between the first query and the second query;
  
  responsive to the identification of the serial dependency, delaying execution of the second query until the executing of the first query has completed;
  
  executing the second query to determine a second response; and
  
  outputting the second response;
  
  wherein detecting ends of utterances comprises identifying at least one of: a pause in speech in the audio input stream, or non-speech in the audio input stream.
- Dependent claims (6, 7, 8):
  - 6. The non-transitory computer-readable storage medium of claim 5, wherein processing the second utterance is performed concurrently with executing the first query.
  - 7. The non-transitory computer-readable storage medium of claim 5, wherein outputting the first response is performed concurrently with executing the second query.
  - 8. The non-transitory computer-readable storage medium of claim 5, wherein the first response is output visually, the actions further comprising:
    - determining whether a display period has elapsed since the first response was output visually; and
      
      clearing the visual output of the response after the elapse of the display period.
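Claims 4 and 8 describe clearing a visually output response once a display period has elapsed. A minimal sketch of that timing check follows. The class name `VisualResponse`, its methods, and the injectable `clock` are assumptions made for illustration; the patent does not specify them.

```python
import time


class VisualResponse:
    """Holds one displayed response and clears it after a display period."""

    def __init__(self, display_period, clock=time.monotonic):
        self.display_period = display_period  # seconds the response stays up
        self._clock = clock                   # injectable for testing
        self.text = None
        self._shown_at = None

    def show(self, text):
        """Output the response visually and record when it was shown."""
        self.text = text
        self._shown_at = self._clock()

    def tick(self):
        """Clear the response if the display period has elapsed.

        Returns True only when this call cleared the response.
        """
        if self.text is None:
            return False
        if self._clock() - self._shown_at >= self.display_period:
            self.text = None
            self._shown_at = None
            return True
        return False
```

A caller would invoke `tick()` periodically, for example from a UI timer. Passing a fake `clock` makes the elapsed-time check testable without real delays.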

Current Assignee: Soundhound AI IP LLC (SoundHound AI, Inc. (f/k/a Archimedes Tech SPAC Partners Co.))
Original Assignee: SoundHound, Inc. (SoundHound AI, Inc. (f/k/a Archimedes Tech SPAC Partners Co.))
Inventors: Halstvedt, Scott; Mont-Reynaud, Bernard; Wadud, Kazi Asif
Primary Examiner(s): Saint Cyr, Leonard
Application Number: US15/389,122
Publication Number: US 20180182398A1
Time in Patent Office: 894 Days
Field of Search: 704 7- 10
CPC Class Codes:

G06F 16/2455   Query execution

G06F 16/3329   Natural language query form...

G06F 16/3331   Query processing

G06F 16/90335   Query processing

G06F 40/35   Discourse or dialogue repre...

G10L 15/22   Procedures used during a sp...

G10L 15/222   Barge in, i.e. overridable ...

G10L 15/30   Distributed recognition, e....

G10L 2015/223   Execution procedure of a sp...

G10L 25/87   Detection of discrete point...
