Proactive assistance based on dialog communication between devices
First Claim
1. A non-transitory computer-readable medium storing instructions for providing proactive assistance based on dialog communication between devices, the instructions, when executed by one or more processors, cause the one or more processors to:
while voice communication is established between an electronic device and a second electronic device:
receive a stream of audio data associated with the second electronic device;
identify, based on at least one sentence boundary, a plurality of portions of the stream of audio data;
store the plurality of portions of the stream of audio data;
detect a user input;
in response to detecting the user input, generate a text representation of speech contained in a first portion of the plurality of portions of the stored audio data;
determine whether the text representation contains information corresponding to one of a plurality of types of information;
in response to determining that the text representation contains information corresponding to one of a plurality of types of information, determine whether the information is complete;
in response to determining that the information is not complete:
generate a text representation of speech contained in a second portion of the plurality of portions of the stored audio data; and
obtain second information from the second portion of the plurality of portions of the stored audio data;
perform one or more tasks based on at least the information and the second information.
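The completeness step of claim 1 can be illustrated with a short Python sketch. It is not the patentee's implementation, and it rests on several assumptions. Speech recognition is replaced by a hypothetical `transcribe` stub. The only information type handled is a phone number, and "complete" is taken to mean ten digits. The "second portion" is taken to be the stored portion immediately after the first. The names `extract_phone_digits`, `is_complete`, `assist`, and the `create_contact` task are illustrative only.

```python
import re
from typing import Dict, List, Optional


def transcribe(portion: str) -> str:
    """Hypothetical speech-to-text stand-in: each stored portion is
    represented here by the text a recognizer would produce for it."""
    return portion


def extract_phone_digits(text: str) -> Optional[str]:
    """Detect the (assumed) 'phone number' information type by collecting
    every digit in the text; returns None if there are no digits."""
    digits = "".join(re.findall(r"\d", text))
    return digits or None


def is_complete(digits: str) -> bool:
    # Assumption: a 10-digit North American number counts as complete.
    return len(digits) >= 10


def assist(portions: List[str], first: int) -> Optional[Dict[str, str]]:
    """Sketch of the claimed flow for a single information type."""
    info = extract_phone_digits(transcribe(portions[first]))
    if info is None:
        return None  # no recognized information type in the first portion
    if not is_complete(info) and first + 1 < len(portions):
        # Information is incomplete: transcribe the next stored portion
        # (assumed to be the "second portion") and append its digits.
        second = extract_phone_digits(transcribe(portions[first + 1]))
        if second is not None:
            info += second
    # Hypothetical task: offer to save the (possibly merged) number.
    return {"task": "create_contact", "phone": info}


portions = ["You can reach me at 408 555", "0199, any time after noon."]
print(assist(portions, 0))  # {'task': 'create_contact', 'phone': '4085550199'}
```

Because the first portion yields only six digits, the sketch falls back to the second portion and performs the task on the merged number.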
Abstract
Systems and processes for proactive assistance based on dialog communication between devices are provided. In one example process, while voice communication between an electronic device and a second electronic device is established, a stream of audio data associated with the second electronic device can be received. In response to detecting a user input, a text representation of speech contained in a portion of the stream of audio data can be generated. The process can determine whether the text representation contains information corresponding to one of a plurality of types of information. In response to determining that the text representation contains information corresponding to one of a plurality of types of information, one or more tasks based on the information can be performed.
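As a rough illustration of checking a text representation against a plurality of types of information and choosing a task for each match, the Python sketch below uses regular expressions. The three types, their patterns, and the task names (`draft_email`, `open_link`, `create_contact`) are assumptions made for illustration; the abstract does not say how the types are recognized, and a real system would likely use a trained entity recognizer instead.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for a few "types of information".
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://\S+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}

# Hypothetical task associated with each information type.
TASKS = {
    "email": "draft_email",
    "url": "open_link",
    "phone": "create_contact",
}


def detect(text: str) -> List[Tuple[str, str]]:
    """Return (type, matched text) pairs found in a text representation."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group()))
    return found


def tasks_for(text: str) -> List[Tuple[str, str]]:
    """Map each detected piece of information to a task to perform."""
    return [(TASKS[kind], value) for kind, value in detect(text)]


print(tasks_for("Send it to ana@example.com or call 408-555-0199."))
```

Keeping the type-to-task mapping in a table means a new information type can be supported by adding one pattern and one task entry.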
16 Claims
15. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
while voice communication is established between the electronic device and a second electronic device:
receiving a stream of audio data associated with the second electronic device;
identifying, based on at least one sentence boundary, a plurality of portions of the stream of audio data;
storing the plurality of portions of the stream of audio data;
detecting a user input;
in response to detecting the user input, generating a text representation of speech contained in a first portion of the plurality of portions of the stored audio data;
determining whether the text representation contains information corresponding to one of a plurality of types of information;
in response to determining that the text representation contains information corresponding to one of a plurality of types of information, determining whether the information is complete;
in response to determining that the information is not complete:
generating a text representation of speech contained in a second portion of the plurality of portions of the stored audio data; and
obtaining second information from the second portion of the plurality of portions of the stored audio data;
performing one or more tasks based on at least the information and the second information.
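Claim 15 leaves open how sentence boundaries are found in the audio stream. One simple possibility, assumed here purely for illustration, is to treat a sufficiently long pause as a boundary. The hypothetical `split_on_pauses` function below works on per-frame energy values and returns the frame ranges of the portions to be stored; `threshold` and `min_pause_frames` are made-up tuning parameters, not values from the source.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_pauses(
    energies: Sequence[float],
    threshold: float = 0.1,
    min_pause_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Split per-frame energies into (start, end) frame ranges, end exclusive.

    Assumption: a run of at least `min_pause_frames` quiet frames
    (energy below `threshold`) marks a sentence boundary.
    """
    portions: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first voiced frame of the current portion
    quiet = 0                    # length of the current run of quiet frames
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i
            quiet = 0  # a short pause inside a sentence is ignored
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # Close the portion at the first quiet frame of the pause.
                portions.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Stream ended mid-utterance: keep the trailing portion,
        # trimming any quiet frames at the very end.
        portions.append((start, len(energies) - quiet))
    return portions


print(split_on_pauses([0, 0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0, 0.9]))
```

The returned ranges can then be used to slice and store the audio, for example `stored = [frames[s:e] for s, e in split_on_pauses(energies)]`. Pause length is only a proxy for a sentence boundary; a production system might instead rely on punctuation predicted by the recognizer.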
16. A method, comprising:
at an electronic device with one or more processors and memory:
while voice communication is established between the electronic device and a second electronic device:
receiving a stream of audio data associated with the second electronic device;
identifying, based on at least one sentence boundary, a plurality of portions of the stream of audio data;
storing the plurality of portions of the stream of audio data;
detecting a user input;
in response to detecting the user input, generating a text representation of speech contained in a first portion of the plurality of portions of the stored audio data;
determining whether the text representation contains information corresponding to one of a plurality of types of information;
in response to determining that the text representation contains information corresponding to one of a plurality of types of information, determining whether the information is complete;
in response to determining that the information is not complete:
generating a text representation of speech contained in a second portion of the plurality of portions of the stored audio data; and
obtaining second information from the second portion of the plurality of portions of the stored audio data;
performing one or more tasks based on at least the information and the second information.
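The storing and user-input steps of claim 16 suggest a buffer of recent portions from which a "first portion" is selected when the input arrives. The Python sketch below is one way this might look, under two assumptions not stated in the claims: the buffer is bounded, and the first portion is the most recent one that ended at or before the user input. The `PortionStore` class and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class PortionStore:
    """Hypothetical store for sentence-level portions of far-end audio.

    Keeps at most `capacity` portions, dropping the oldest first; each
    entry is (end_time_seconds, audio_bytes).
    """

    def __init__(self, capacity: int = 20) -> None:
        self._items: Deque[Tuple[float, bytes]] = deque(maxlen=capacity)

    def store(self, end_time: float, audio: bytes) -> None:
        """Append a portion identified at a sentence boundary."""
        self._items.append((end_time, audio))

    def portions(self) -> List[bytes]:
        """Stored audio portions, oldest first."""
        return [audio for _, audio in self._items]

    def first_portion_index(self, input_time: float) -> Optional[int]:
        """On user input, pick the most recent portion that ended at or
        before the input time; None if no such portion is stored."""
        for i in range(len(self._items) - 1, -1, -1):
            if self._items[i][0] <= input_time:
                return i
        return None


store = PortionStore(capacity=3)
for t in (1.0, 2.5, 4.0):
    store.store(t, b"...")
print(store.first_portion_index(3.0))  # 1
```

A bounded deque keeps memory use fixed during a long call, at the cost of discarding older sentences once the capacity is reached.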
Specification