Use of a digital assistant in communications
Abstract
A digital assistant operating on a device is configured to be engaged as an active participant in communications between local and remote parties by listening to voice and video calls and participating in messaging sessions. The digital assistant typically can be initiated by voice using a key word or phrase and then be requested to perform tasks, provide information and services, etc. using voice or gestures. The digital assistant can respond to the request and take appropriate actions. In voice and video calls, the interactions with the digital assistant (i.e., the request, response, and actions) can be heard by both parties to the call as if the digital assistant was a third party on the call. In a messaging session, messages are generated and displayed to each participant so that they can see the interactions with the digital assistant as if it was a participant.
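The flow the abstract describes — passive listening, key-phrase activation, context-based action determination, and an announcement injected so both parties hear it — can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation; the wake phrase, the `CallSession` class, and the `determine_action` callback are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

WAKE_PHRASE = "hey assistant"  # hypothetical key phrase


@dataclass
class CallSession:
    """Minimal stand-in for a call in which the assistant is a third participant."""
    transcript: List[str] = field(default_factory=list)  # heard by BOTH parties

    def inject_audio(self, announcement: str) -> None:
        # A real device would mix synthesized speech into the call's audio
        # stream; here the shared transcript stands in for that stream.
        self.transcript.append(f"[assistant] {announcement}")


def assistant_loop(session: CallSession,
                   utterances: Iterable[str],
                   determine_action: Callable[[str], str]) -> None:
    """Stay passive until the key phrase is spoken, act on the next
    utterance, then return to the listening mode, as the claims describe."""
    armed = False
    for spoken in utterances:
        if not armed:
            armed = spoken.lower().startswith(WAKE_PHRASE)
            continue
        action = determine_action(spoken)          # uses located context
        session.inject_audio(f"I will {action}.")  # both parties hear this
        armed = False                              # back to listening mode
```

For example, running `assistant_loop(session, ["hello", "hey assistant", "schedule a meeting"], lambda s: s)` on a fresh session leaves one announcement in the shared transcript, while speech before the wake phrase is ignored.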
Claims
1. A device, comprising:
one or more processors;
a display that supports a user interface (UI) for interacting with a user of the device; and
a memory device storing computer-readable instructions which, when executed by the one or more processors, perform a method comprising the steps of:
listening in on an audio portion of a video call between local and remote parties,
entering a listening mode by which listening to speech of the local party in the audio portion subsequent to a key word or key phrase being spoken is enabled,
determining an action that is responsive to the speech, the determining including locating applicable context and utilizing the located applicable context,
making an announcement of the determined action by injecting the announcement into the audio portion of the video call so that both the local and remote parties can hear the announcement,
taking the determined action,
returning to the listening mode; and
generating an overlay that is included in an outgoing video stream from the device, the overlay including a representation of interactions between the local party and a digital assistant or the overlay including a representation of status of the digital assistant.
(Dependent claims 2 and 3 not shown.)
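Claim 1's overlay element — a representation of either the interactions or the assistant's status, composited into the outgoing video stream — might be sketched as below. The function name and plain-text layout are assumptions for illustration, not the patent's implementation.

```python
from typing import List, Tuple


def render_overlay(interactions: List[Tuple[str, str]], status: str) -> str:
    """Build the text block a device could composite onto outgoing video
    frames: recent local-party/assistant interactions when there are any,
    otherwise the assistant's current status (mirroring the claim's
    either/or phrasing)."""
    if interactions:
        return "\n".join(f"{speaker}: {text}" for speaker, text in interactions)
    return f"assistant: {status}"
```

With no interactions recorded, `render_overlay([], "listening")` yields just the status line, matching the claim's alternative of showing a representation of the assistant's status.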
4. A device, comprising:
one or more processors;
a display that supports a user interface (UI) for interacting with a user of the device; and
a memory device storing computer-readable instructions which, when executed by the one or more processors, perform a method comprising the steps of:
listening in on an audio portion of a video call between local and remote parties,
entering a listening mode by which listening to speech of the local party in the audio portion subsequent to a key word or key phrase being spoken is enabled,
determining an action that is responsive to the speech, the determining including locating applicable context and utilizing the located applicable context,
making an announcement of the determined action by injecting the announcement into the audio portion of the video call so that both the local and remote parties can hear the announcement,
taking the determined action,
returning to the listening mode, and
naming parties in the video call to whom the determined action is applicable.
5. A device, comprising:
one or more processors;
a display that supports a user interface (UI) for interacting with a user of the device; and
a memory device storing computer-readable instructions which, when executed by the one or more processors, perform a method comprising the steps of:
listening in on an audio portion of a video call between local and remote parties,
entering a listening mode by which listening to speech of the local party in the audio portion subsequent to a key word or key phrase being spoken is enabled,
determining an action that is responsive to the speech, the determining including locating applicable context and utilizing the located applicable context,
using data provided by a remote service when making the action determination, or the action determination being made at least in part by an external service that operates substantially remotely from the device,
making an announcement of the determined action by injecting the announcement into the audio portion of the video call so that both the local and remote parties can hear the announcement,
taking the determined action, and
returning to the listening mode.
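Claim 5's variation — an action determination that uses data from, or is made at least in part by, a service remote from the device — could look like this sketch. The `remote_lookup` parameter and the local-fallback behavior are assumptions introduced here, not elements recited by the patent.

```python
from typing import Callable, Optional


def determine_action(speech: str,
                     remote_lookup: Optional[Callable[[str], str]] = None) -> str:
    """Determine the action responsive to the captured speech, consulting an
    external service when one is supplied and degrading to trivial local
    handling otherwise."""
    if remote_lookup is not None:
        try:
            return remote_lookup(speech)  # determination made (in part) remotely
        except Exception:
            pass                          # assumed fallback if the service fails
    return f"repeat back '{speech}'"      # trivial local fallback
```

Passing a callable that wraps a network request would model the external service; omitting it models the purely on-device path.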
Specification