In-call virtual assistants

US 10,134,395 B2
Filed: 09/25/2013
Issued: 11/20/2018
Est. Priority Date: 09/25/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques for providing virtual assistants to assist users during a voice communication between the users. For instance, a first user operating a device may establish a voice communication with respective devices of one or more additional users, such as with a device of a second user. For instance, the first user may utilize her device to place a telephone call to the device of the second user. A virtual assistant may also join the call and, upon invocation by a user on the call, may identify voice commands from the call and may perform corresponding tasks for the users in response.

107 Citations

20 Claims

1. A system comprising:
- one or more processors; and
- one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  - receiving an indication that a first device of a first user and a second device of a second user are exchanging first voice-communication data;
  - establishing a connection between the first device, the second device, and a computing device hosting at least a portion of a virtual assistant based at least in part on the indication;
  - receiving, by the computing device, the first voice-communication data exchanged between the first device and the second device, the computing device being remote from both the first device and the second device, wherein the computing device is configured to communicate with the first device over one or more networks and configured to communicate with the second device over the one or more networks;
  - performing speech recognition on a first part of a first audio signal to generate first text, the first audio signal representing first audio of the first voice-communication data exchanged between the first device and the second device;
  - identifying a predefined utterance in the first text;
  - invoking the virtual assistant based at least in part on identifying the predefined utterance;
  - receiving identity information associated with the first device;
  - transmitting, to the first device and at least partially in response to invoking the virtual assistant, information indicating that second voice-communication data will not be transmitted to the second device;
  - transmitting, to the first device, a request for a password, the request based at least in part on the identity information;
  - receiving the second voice-communication data from the first device including a representation of the password;
  - preventing, based at least in part on the request for the password, the second voice-communication data from being transmitted to the second device;
  - determining that the representation of the password is associated with an identity of the first user;
  - receiving third voice-communication data exchanged between the first device and the second device, the third voice-communication data received subsequent to the first voice-communication data and the second voice-communication data;
  - at least partly in response to identifying the predefined utterance, performing speech recognition on a second audio signal to generate second text, the second audio signal representing second audio of the third voice-communication data exchanged between the first device and the second device;
  - identifying a voice command in the second text, the voice command being separate from and occurring after the predefined utterance;
  - performing a task corresponding to the voice command at least partly in response to identifying the voice command; and
  - sending an output audio signal to at least one of the first device or the second device, the output audio signal configured to cause audible output associated with the performing of the task on at least one of the first device or the second device.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. A system as recited in claim 1, wherein:
    - the voice command comprises a request for information;
    - the performing of the task comprises locating, as located information, the information; and
    - the sending of the output audio signal comprises sending a signal configured to cause the virtual assistant to state, to the first user and the second user, the located information.
  - 3. A system as recited in claim 1, wherein the first device, the second device, and the computing device are connected via a conference call.
  - 4. A system as recited in claim 1, wherein the computing device is associated with a telephony service that establishes voice communication between the first user and the second user.
  - 5. A system as recited in claim 1, the second audio associated with the third voice-communication data including a first part and a second part, the second text corresponding to the first part of the second audio signal, the acts further comprising, after sending the output audio signal to the at least one of the first device or the second device, refraining from performing speech recognition on the second part of the second audio signal, the second part being subsequent to the first part.
  - 6. A system as recited in claim 1, wherein the information indicating that second voice-communication data will not be transmitted to the second device includes at least data generated using text to speech.
  - 7. A system as recited in claim 1, wherein the identity information associated with the first device is based at least in part on a telephone number associated with the first device, a uniform resource identifier associated with the first device, a voice over internet protocol (VoIP) identifier associated with the first device, or a session initiation protocol (SIP) identifier associated with the first device.
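
The sequence of acts recited in claim 1 can be read as a small state machine: listen for the predefined utterance, withhold the first user's password reply from the second device, then treat later speech as a voice command. The following Python sketch illustrates that reading only. It is not the patent's implementation, and every name in it (`CallSession`, `WAKE_PHRASE`, `PASSWORDS`, the log messages) is a hypothetical stand-in. It also assumes speech recognition has already produced text and simplifies password matching to a string comparison.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()          # only scanning for the predefined utterance
    AWAITING_PASSWORD = auto()  # first user's reply is withheld from the second device
    AWAITING_COMMAND = auto()   # assistant invoked and first user authenticated


WAKE_PHRASE = "hey assistant"             # hypothetical predefined utterance
PASSWORDS = {"+15550100": "open sesame"}  # hypothetical identity -> password store


@dataclass
class CallSession:
    first_device_id: str
    state: State = State.LISTENING
    log: list = field(default_factory=list)

    def on_text(self, speaker_id: str, text: str) -> bool:
        """Handle recognized text from one party.

        Returns True if the underlying audio may be forwarded to the
        other party, False if it must be withheld.
        """
        text = text.lower().strip()
        if self.state is State.LISTENING:
            if WAKE_PHRASE in text:
                self.state = State.AWAITING_PASSWORD
                self.log.append("notify first device: reply will not reach second device")
                self.log.append("request password from first device")
            return True
        if self.state is State.AWAITING_PASSWORD:
            if speaker_id != self.first_device_id:
                return True  # only the first device's reply is withheld
            if PASSWORDS.get(self.first_device_id) == text:
                self.state = State.AWAITING_COMMAND
                self.log.append("authenticated")
            else:
                self.state = State.LISTENING
                self.log.append("authentication failed")
            return False
        # AWAITING_COMMAND: treat the text as the voice command
        self.log.append(f"perform task: {text}")
        self.log.append("send output audio")
        self.state = State.LISTENING  # stop interpreting speech until re-invoked
        return True
```

Returning to `LISTENING` after the output is sent loosely mirrors claim 5, where recognition is not performed on the later part of the audio. A fuller design would also need timeouts and retry limits for the password step, which the claims do not specify.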

8. A method comprising:
- at a computing device hosting at least a portion of a virtual assistant:
  - receiving first voice-communication data exchanged between a first device of a first user and a second device of a second user, the computing device being remote from both the first device and the second device;
  - performing speech recognition on a first audio signal to generate first text, the first audio signal representing first audio of the first voice-communication data exchanged between the first device and the second device;
  - identifying a predefined utterance in the first text;
  - invoking the virtual assistant based at least in part on identifying the predefined utterance;
  - receiving identity information associated with the first device;
  - transmitting, to the first device and at least partially in response to invoking the virtual assistant, information indicating that second voice-communication data will not be transmitted to the second device;
  - transmitting a request for a password to the first device, the request based at least in part on the identity information;
  - receiving the second voice-communication data from the first device including a representation of the password;
  - preventing, based at least in part on the request for the password, the second voice-communication data from being transmitted to the second device;
  - determining that the representation of the password is associated with an identity of the first user;
  - receiving third voice-communication data exchanged between the first device and the second device, the third voice-communication data received subsequent to the first voice-communication data and the second voice-communication data;
  - at least partly in response to identifying the predefined utterance in the first text, performing speech recognition on a second audio signal to generate second text, the second audio signal representing second audio of the third voice-communication data exchanged between the first device and the second device;
  - identifying, from the second text, a voice command uttered by at least one of the first user or the second user, the voice command being separate from and occurring after the predefined utterance;
  - performing a task corresponding to the voice command at least partly in response to identifying the voice command; and
  - sending, over one or more networks, an output audio signal to the first device and the second device, wherein the output audio signal is associated with the performing the task corresponding to the voice command.
- Dependent claims (9, 10, 11, 12, 13, 14, 15):
  - 9. A method as recited in claim 8, further comprising identifying a user that provided the voice command, and wherein audible content associated with the output audio signal is based at least in part on the identifying the user.
  - 10. A method as recited in claim 9, wherein the identifying the user comprises:
    - referencing at least one of an automatic number identification (ANI) indicating a telephone number associated with the first device that initiated a voice communication including at least the first voice-communication data or a called party number (CPN) indicating a telephone number associated with the second device that received a request to establish the voice communication.
  - 11. A method as recited in claim 9, wherein the identifying the user comprises comparing at least one of frequency, amplitude, pitch, or another audio characteristic of speech of the first user or the second user to one or more pre-stored voice signatures.
  - 12. A method as recited in claim 9, wherein the transmitting the request for the password to the first device is based at least in part on determining that the first user uttered the predefined utterance to invoke the virtual assistant.
  - 13. A method as recited in claim 8, further comprising the virtual assistant communicating with the first user via the second voice-communication data while preventing fourth voice-communication data associated with the second device from being transmitted to the first device.
  - 14. A method as recited in claim 8, the second audio associated with the third voice-communication data including a first part and a second part, the second text corresponding to the first part of the second audio signal, the method further comprising, after sending the output audio signal to the first device and the second device, refraining from performing speech recognition on the second part of the second audio signal, the second part being subsequent to the first part.
  - 15. A method as recited in claim 8, further comprising converting, as the information indicating that the second voice-communication data will not be transmitted to the second device, text to speech, the text corresponding to the information.
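
Claim 11 describes identifying the speaker by comparing audio characteristics such as frequency, amplitude, or pitch to pre-stored voice signatures. One simple way to picture such a comparison is a nearest-signature match with a distance cutoff, sketched below in Python. This is an illustrative assumption rather than the method the patent discloses: the two-element feature tuple, the `SIGNATURES` values, the `identify_speaker` name, and the `max_distance` threshold are all hypothetical.

```python
import math

# Hypothetical pre-stored signatures: (mean pitch in Hz, mean amplitude in 0..1).
SIGNATURES = {
    "first_user": (120.0, 0.40),
    "second_user": (210.0, 0.55),
}


def identify_speaker(features, signatures=SIGNATURES, max_distance=25.0):
    """Return the user whose stored signature is closest to `features`,
    or None if no signature lies within `max_distance`."""
    best_user, best_dist = None, math.inf
    for user, signature in signatures.items():
        dist = math.dist(features, signature)  # Euclidean distance
        if dist < best_dist:
            best_user, best_dist = user, dist
    return best_user if best_dist <= max_distance else None
```

Because the features are not normalized here, pitch dominates the distance. A practical system would scale each characteristic or use a trained speaker model.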

16. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- receiving an indication that two user devices are exchanging first voice-communication data;
- receiving, by a computing device, the first voice-communication data exchanged between the two user devices, the computing device being remote from the two user devices, the two user devices including a first device and a second device;
- performing speech recognition by the computing device on a first audio signal representing first audio of the first voice-communication data to identify a predefined utterance;
- invoking a virtual assistant based at least in part on the predefined utterance;
- receiving identity information associated with the first device;
- transmitting, to the first device and at least partially in response to invoking the virtual assistant, information indicating that second voice-communication data will not be transmitted to the second device;
- transmitting, to the first device, a request for a password, the request based at least in part on the identity information;
- receiving the second voice-communication data from the first device including a representation of the password;
- preventing, based at least in part on the request for the password, the second voice-communication data from being transmitted to the second device;
- determining that the representation of the password is associated with an identity of a first user associated with the first device;
- receiving third voice-communication data exchanged between the first device and the second device, the third voice-communication data received subsequent to the first voice-communication data and the second voice-communication data;
- performing speech recognition on a second audio signal to generate text, the second audio signal representing second audio of the third voice-communication data exchanged between the first device and the second device;
- identifying, from the text, a voice command from a user of one of the two user devices, the voice command being separate from and occurring after the predefined utterance;
- performing a task corresponding to the voice command at least partly in response to identifying the voice command; and
- sending, over one or more networks, an output audio signal to the two user devices, wherein the output audio signal is associated with the performing the task corresponding to the voice command.
- Dependent claims (17, 18, 19, 20):
  - 17. One or more non-transitory computer-readable media as recited in claim 16, the acts further comprising performing the task corresponding to the voice command at least partly in response to the identifying of the voice command.
  - 18. One or more non-transitory computer-readable media as recited in claim 16, the acts further comprising sending, to at least one of the two user devices at least partly in response to the identifying of the voice command or at least partly in response to performing the task corresponding to the voice command, the output audio signal effective to output audible content.
  - 19. One or more non-transitory computer-readable media as recited in claim 16, wherein at least the first voice-communication data is exchanged over a public switched telephone network (PSTN), a cellular network, or a voice-over-internet-protocol (VoIP) network.
  - 20. One or more non-transitory computer-readable media as recited in claim 16, the acts further comprising converting, as the information indicating that the second voice-communication data will not be transmitted to the second device, text to speech, the text corresponding to the information.
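
The acts in claim 16 imply different delivery rules for different audio: ordinary call audio passes between the two devices, the password reply reaches only the assistant, the notice and password request go only to the first device, and the task output goes to both devices. The Python sketch below tabulates one possible reading of those rules. The `Kind` categories, the `route` function, and the endpoint labels are hypothetical names introduced for illustration and do not come from the patent.

```python
from enum import Enum


class Kind(Enum):
    CONVERSATION = "conversation"          # first or third voice-communication data
    PRIVATE_REPLY = "private_reply"        # second voice-communication data (password)
    ASSISTANT_NOTICE = "assistant_notice"  # notice and password request to the first device
    ASSISTANT_OUTPUT = "assistant_output"  # audible result of the performed task


def route(kind, source, devices=("first", "second")):
    """Return the endpoints that should receive audio of the given kind."""
    if kind is Kind.CONVERSATION:
        # The other party hears it, and the remote computing device receives it too.
        return tuple(d for d in devices if d != source) + ("assistant",)
    if kind is Kind.PRIVATE_REPLY:
        return ("assistant",)  # withheld from the second device
    if kind is Kind.ASSISTANT_NOTICE:
        return ("first",)
    if kind is Kind.ASSISTANT_OUTPUT:
        return tuple(devices)  # claim 16 sends the output to both devices
    raise ValueError(f"unknown audio kind: {kind!r}")
```

For example, `route(Kind.PRIVATE_REPLY, "first")` yields only the assistant endpoint, which corresponds to the "preventing" act in the claim.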

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Typrin, Marcello
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Blankenagel, Bryan S
Application Number: US14/037,077
Publication Number: US 20150088514A1
Time in Patent Office: 1,882 Days

CPC Class Codes
- G06F 3/167   Audio in a user interface, ...
- G10L 15/22   Procedures used during a sp...
- G10L 17/00   Speaker identification or v...
- G10L 2015/223   Execution procedure of a sp...
- H04M 2203/355   Interactive dialogue design...
- H04M 2203/357   Autocues for dialog assistance
- H04M 3/493   Interactive information ser...
