In-Call Virtual Assistants

US 20150088514A1
Filed: 09/25/2013
Published: 03/26/2015
Est. Priority Date: 09/25/2013
Status: Active Grant

Abstract

Techniques are described for providing virtual assistants that assist users during a voice communication between the users. For instance, a first user operating a device may establish a voice communication with respective devices of one or more additional users, such as a device of a second user; the first user may, for example, use her device to place a telephone call to the device of the second user. A virtual assistant may also join the call and, upon invocation by a user on the call, may identify voice commands from the call and perform corresponding tasks for the users in response.
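The flow summarized in the abstract (join the call, wait for invocation, identify a voice command, perform the task, respond on the call) can be sketched as a small state machine. This is only an illustrative sketch, not the patented implementation: text strings stand in for call audio, and the names `WAKE_PHRASE`, `InCallAssistant`, and `on_utterance` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical predefined utterance; the source does not name one.
WAKE_PHRASE = "assistant"


@dataclass
class InCallAssistant:
    """Toy in-call assistant: idle until invoked, then maps a command to a task."""

    tasks: Dict[str, Callable[[], str]]
    invoked: bool = False
    outputs: List[str] = field(default_factory=list)

    def on_utterance(self, speaker: str, text: str) -> None:
        """Handle one utterance from the call (text stands in for audio)."""
        normalized = text.lower().strip()
        if not self.invoked:
            # Before invocation, the assistant only looks for the wake phrase.
            if WAKE_PHRASE in normalized:
                self.invoked = True
            return
        # After invocation, any participant's utterance is a candidate command.
        task = self.tasks.get(normalized)
        if task is not None:
            # The result would be played back to the call as output audio.
            self.outputs.append(task())
            self.invoked = False


assistant = InCallAssistant(
    tasks={"what is the temperature": lambda: "It is 70 degrees"}
)
assistant.on_utterance("first user", "What is the temperature")  # ignored: not invoked
assistant.on_utterance("first user", "Assistant")                # invokes
assistant.on_utterance("second user", "What is the temperature") # command handled
```

In this sketch either participant may issue the command once the assistant has been invoked, which mirrors the "at least one of the first user or the second user" language in the claims.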

307 Citations

24 Claims

1. A system comprising:
- one or more processors; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving, from a first user during a voice communication established between a device of the first user and a device of a second user, a request to invoke a virtual assistant during the voice communication, the virtual assistant performing speech recognition on an audio signal representing audio of the voice communication between the first and second users upon invocation, the performing of the speech recognition for identifying a voice command from at least one of the first user or the second user;
  
  performing speech recognition on the audio signal representing the voice communication between the first and second users for identifying a voice command at least partly in response to receiving the request;
  
  identifying a voice command from the audio of the voice communication between the first and second users responsive to the performing of the speech recognition;
  
  performing a task corresponding to the voice command at least partly in response to the identifying of the voice command; and
  
  providing an output audio signal to at least one of the device of the first user or the device of the second user during the voice communication, the output audio signal configured to cause audible output associated with the performing of the task on at least one of the device of the user or the device of the second user.
- Dependent claims (2, 3, 4, 5):
  - 2. A system as recited in claim 1, wherein the request comprises the first user stating a predefined utterance, and the acts further comprise monitoring the audio signal representing the audio for the predefined utterance without performing speech recognition on the audio signal representing the audio of the voice communication between the first and second users prior to identifying the predefined utterance.
  - 3. A system as recited in claim 1, wherein:
    - the voice command comprises a request for information;
      
      the performing of the task comprises locating the requested information; and
      
      the providing of the output audio signal comprises providing an output audio signal configured to cause the virtual assistant to state, to the first user and the second user, the requested information.
  - 4. A system as recited in claim 1, wherein the voice communication comprises a conference call that connects the device of the first user with the device of the second user and with the system associated with the virtual assistant.
  - 5. A system as recited in claim 1, wherein the system that is associated with the virtual assistant comprises a telephony service that establishes the voice communication between the first and second users.
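The gating described in claim 2 (monitoring for a predefined utterance without running speech recognition on the call audio beforehand) could look roughly like the following. This is a hedged sketch under stated assumptions: `gate_recognition`, `detects_wake_utterance`, and `recognize` are hypothetical names, and both callables are stand-ins for a real keyword spotter and a real recognizer.

```python
from typing import Callable, Iterable, List

Frame = bytes  # one chunk of call audio (assumed representation)


def gate_recognition(
    frames: Iterable[Frame],
    detects_wake_utterance: Callable[[Frame], bool],
    recognize: Callable[[Frame], str],
) -> List[str]:
    """Run the recognizer only on frames that arrive after the wake utterance."""
    transcripts: List[str] = []
    invoked = False
    for frame in frames:
        if not invoked:
            # Only the lightweight detector sees pre-invocation audio;
            # the recognizer is never called on it.
            invoked = detects_wake_utterance(frame)
            continue
        transcripts.append(recognize(frame))
    return transcripts


# Usage with stand-in callables (bytes literals play the role of audio).
recognized_frames: List[Frame] = []


def fake_recognize(frame: Frame) -> str:
    recognized_frames.append(frame)
    return frame.decode()


result = gate_recognition(
    [b"hello", b"WAKE", b"weather", b"today"],
    detects_wake_utterance=lambda f: f == b"WAKE",
    recognize=fake_recognize,
)
```

Here the frame containing the wake utterance is consumed by the detector and is not itself passed to the recognizer; a real system might choose differently.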

6. A method comprising:
- receiving a request to invoke a virtual assistant during a voice communication between a device of a first user and a device of a second user;
  
  performing speech recognition on an audio signal representing audio of the voice communication at least partly in response to the receiving;
  
  responsive to the performing of the speech recognition, identifying a voice command from at least one of the first user or the second user; and
  
  providing, by the virtual assistant, an output audio signal to at least one of the device of the first user or the device of the second user, the output audio signal for outputting audible content during the voice communication.
- Dependent claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18):
  - 7. A method as recited in claim 6, wherein the request comprises the first user or the second user stating a predefined utterance, and further comprising identifying the predefined utterance from within the audio signal representing the audio of the voice communication, the performing of the speech recognition occurring at least partly in response to the identifying of the predefined utterance.
  - 8. A method as recited in claim 6, wherein the receiving of the request to invoke the virtual assistant comprises at least one of:
    - receiving an indication of an incoming phone call from the device of the first user or the device of the second user;
      
      or receiving an indication that the first user or the second user has activated a physical button or a soft button of a respective device.
  - 9. A method as recited in claim 6, the acts further comprising joining a device hosting the virtual assistant to the voice communication between the device of the first user and the device of the second user as part of establishing the voice communication between the device of the first user and the device of the second user.
  - 10. A method as recited in claim 6, the acts further comprising joining a device hosting the virtual assistant to the voice communication between the device of the first user and the device of the second user as part of establishing the voice communication between the device of the first user and the device of the second user, and wherein the virtual assistant is not invoked until the receiving of the request.
  - 11. A method as recited in claim 6, the acts further comprising:
    - receiving a request from the first user or the second user that the virtual assistant join the voice communication after establishing the voice communication between the device of the first user and the device of the second user; and
      
      joining the virtual assistant at least partly in response to the receiving of the request from the first user or the second user that the virtual assistant join the voice communication.
  - 12. A method as recited in claim 11, wherein the requesting that the virtual assistant join the voice communication comprises the first user or the second user dialing a telephone number associated with the virtual assistant.
  - 13. A method as recited in claim 6, the acts further comprising identifying a user that provided the voice command, and wherein the audible content that is output is based at least in part on the identifying of the user.
  - 14. A method as recited in claim 13, wherein the identifying of the user comprises:
    - referencing at least one of an automatic number identification (ANI) indicating a telephone number associated with a device that initiated the voice communication or a called party number (CPN) indicating a telephone number associated with a device that received a request to establish the voice communication; and
      
      mapping the at least one of the ANI or the CPN to an associated user.
  - 15. A method as recited in claim 13, wherein the identifying of the user comprises comparing at least one of frequency, amplitude, pitch, or another audio characteristic of speech of the first user or the second user to one or more pre-stored voice signatures.
  - 16. A method as recited in claim 13, further comprising requesting that the user that provided the voice command authenticate with the virtual assistant.
  - 17. A method as recited in claim 16, wherein the requesting that the user authenticate with the virtual assistant comprises at least one of:
    - communicating with the user via a communication channel other than the voice communication;
      
      or communicating with the user via the voice communication, the virtual assistant muting the voice communication at the device of the user that did not provide the voice command while the virtual assistant communicates, via the voice communication, with the user that provided the voice command.
  - 18. A method as recited in claim 6, wherein the virtual assistant resides at least partly on the device of the first user, the device of the second user, or a computing device that is remote from both the device of the first user and the device of the second user.
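Two of the dependent claims above lend themselves to a short illustration: mapping the ANI or CPN to an associated user (claim 14) and muting the other party while the assistant authenticates the commanding user over the call (claim 17). The sketch below is an assumption-laden example only; `DIRECTORY`, `map_call_parties`, and `devices_to_mute` are hypothetical names, and the telephone numbers are made up.

```python
from typing import Dict, Optional, Set

# Hypothetical directory mapping telephone numbers to known users.
DIRECTORY: Dict[str, str] = {"+15550100": "alice", "+15550101": "bob"}


def map_call_parties(
    ani: Optional[str], cpn: Optional[str], directory: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map the calling number (ANI) and called number (CPN) to users, if known."""
    return {
        "caller": directory.get(ani) if ani is not None else None,
        "callee": directory.get(cpn) if cpn is not None else None,
    }


def devices_to_mute(participants: Set[str], commanding_user: str) -> Set[str]:
    """Return everyone except the user who gave the command.

    Their devices would be muted while the assistant authenticates
    the commanding user over the voice communication.
    """
    return participants - {commanding_user}


parties = map_call_parties("+15550100", "+15550101", DIRECTORY)
muted = devices_to_mute({"alice", "bob"}, commanding_user="alice")
```

A number missing from the directory simply maps to `None` here; how an actual system handles unknown parties is not specified in the claims.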

19. One or more computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- joining a computing device to a voice communication between two user devices, the computing device being remote from the two user devices;
  
  upon a user of one of the two user devices invoking the computing device, performing speech recognition by the computing device on an audio signal representing audio of the voice communication; and
  
  identifying a voice command from a user of one of the two devices responsive to performing the speech recognition on the audio signal representing the audio.
- Dependent claims (20, 21, 22, 23, 24):
  - 20. One or more computer-readable media as recited in claim 19, the acts further comprising performing a task corresponding to the voice command at least partly in response to the identifying of the voice command.
  - 21. One or more computer-readable media as recited in claim 19, the acts further comprising providing an output audio signal effective to output audible content on the voice communication and to at least one of the two user devices at least partly in response to the identifying of the voice command or at least partly in response to performing a task corresponding to the voice command.
  - 22. One or more computer-readable media as recited in claim 19, wherein the joining occurs automatically upon one of the two user devices placing the voice communication to the other of the two user devices.
  - 23. One or more computer-readable media as recited in claim 19, wherein the joining occurs upon one of the two user devices initiating a conference call using a telephone number associated with the computing device after the two user devices establish the voice communication.
  - 24. One or more computer-readable media as recited in claim 19, wherein the voice communication comprises a communication over a public switched telephone network (PSTN), a cellular network, or a voice-over-internet-protocol (VoIP) network.

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Rawles, LLC (Amazon.com, Inc.)
Inventors
Typrin, Marcello

Granted Patent

US 10,134,395 B2
US Class Current

704/249
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/22   Procedures used during a sp...

G10L 17/00   Speaker identification or v...

G10L 2015/223   Execution procedure of a sp...

H04M 2203/355   Interactive dialogue design...

H04M 2203/357   Autocues for dialog assistance

H04M 3/493   Interactive information ser...
