DISTINGUISHABLE OPEN SOUNDS
Abstract
Systems for speech enabling devices perform methods of configuring distinct open sounds for different devices to indicate to users when each device is recognizing speech. Open sounds are stored both on computer-readable media within a device and on server systems to which devices interface over networks. Open sounds are a parameter of device personalities, and can be configured by system designers, users, or service providers. Devices detect the presence of others by spotting known open phrases, and provide distinctiveness by changing their selected open phrase. Server system providers analyze non-verbal and spoken phrase open sounds from developers using audio fingerprinting and speech recognition.
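The abstract's last behavior (a device notices another device's open sound and changes its own to stay distinct) can be sketched roughly as follows. This is only an illustrative sketch: the function name `choose_distinct_open_sound`, the string identifiers for open sounds, and the "first unused candidate" policy are assumptions, not details taken from the patent.

```python
from typing import Iterable, Sequence


def choose_distinct_open_sound(
    current: str,
    available: Sequence[str],
    heard_nearby: Iterable[str],
) -> str:
    """Pick an open sound that no nearby device was heard using.

    Hypothetical policy: keep the current selection when it is unique,
    otherwise switch to the first available sound not heard nearby.
    """
    taken = set(heard_nearby)
    if current not in taken:
        return current
    for candidate in available:
        if candidate not in taken:
            return candidate
    # Every option is already in use nearby; keep the current selection.
    return current


# A device using "chime" hears another device also using "chime".
print(choose_distinct_open_sound("chime", ["chime", "beep", "hello"], ["chime"]))  # prints: beep
```

How a device actually recognizes another device's open sound (the abstract mentions phrase spotting) is outside this sketch; here the spotted sounds are simply passed in as `heard_nearby`.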
14 Claims
1. A non-transitory computer readable medium storing code that, when executed by one or more processors would cause the one or more processors to:
- receive input indicative of a selection of one of a plurality of distinguishable open sounds to be used for indicating that a system is receptive to a user query;
- capture audio through a microphone;
- digitize the audio into audio samples;
- perform sound spotting using a neural network algorithm on the audio samples, the neural network trained for a specific wake-up phrase;
- in response to the neural network spotting the specific wake-up phrase, receive speech input through the microphone, the speech input including an audible user query;
- further in response to spotting the specific wake-up phrase, read an open sound audio segment, corresponding to the selection, from a storage device; and
- output, through a speaker, the open sound audio segment indicating that a system is receptive to capturing the user's speech, wherein the user is able to distinguish between at least two speech enabled devices within a shared audible environment.
(Dependent claims: 2, 3, 4, 5, 6, 7)
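The device-side flow recited in claim 1 could look roughly like the sketch below. It is a simplified illustration, not the claimed implementation: the `SpeechDevice` class, the `digitize` helper, and the callables standing in for the trained neural network, the storage device, and the speaker are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


def digitize(analog: Sequence[float]) -> List[int]:
    """Quantize samples in [-1.0, 1.0] to signed 16-bit integers."""
    return [max(-32768, min(32767, round(x * 32767))) for x in analog]


@dataclass
class SpeechDevice:
    open_sounds: Dict[str, bytes]          # stand-in for the storage device
    spotter: Callable[[List[int]], bool]   # stand-in for the trained neural network
    play: Callable[[bytes], None]          # stand-in for the speaker
    selection: Optional[str] = None
    receptive: bool = False

    def select_open_sound(self, name: str) -> None:
        """Record the selection of one of the stored open sounds."""
        if name not in self.open_sounds:
            raise KeyError(f"unknown open sound: {name}")
        self.selection = name

    def on_audio(self, analog: Sequence[float]) -> bool:
        """Digitize captured audio and react if the wake-up phrase is spotted."""
        samples = digitize(analog)
        if not self.spotter(samples):
            return False
        if self.selection is None:
            raise RuntimeError("no open sound has been selected")
        # Read the selected segment and play it to signal receptiveness.
        self.play(self.open_sounds[self.selection])
        self.receptive = True  # the device would now capture the user query
        return True
```

In a real device the spotter would be a trained model and `play` would drive audio hardware; plain callables keep the sketch runnable without either.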
8. A non-transitory computer readable medium storing code that, when executed by one or more processors would cause the one or more processors to:
- receive a client request for an open sound selected from a plurality of distinguishable open sounds, the open sound to be used as an indication that the client is receptive to a user's query;
- according to an indication of which of the plurality of open sounds, read a corresponding open sound audio segment; and
- transmit the open sound audio segment to the client;
- capture audio through a microphone;
- digitize the audio into audio samples;
- perform sound spotting on the audio samples to detect a specific wake-up phrase;
- in response to detecting the specific wake-up phrase, output the open sound audio segment, through a speaker, indicating that the client is receptive to capturing the user's speech, wherein the user is able to distinguish between at least two speech enabled devices within a shared audible environment.
(Dependent claims: 9, 10, 11, 12, 13)
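The request-and-transmit exchange in claim 8 might be sketched as below. This is an assumption-laden illustration: the network transport is replaced by a direct function call, and `handle_open_sound_request` and `Client` are hypothetical names that do not come from the patent.

```python
from typing import Callable, Dict, Optional


def handle_open_sound_request(library: Dict[str, bytes], requested: str) -> Optional[bytes]:
    """Server side: read the audio segment for the requested open sound, if any."""
    return library.get(requested)


class Client:
    """Client side: fetches an open sound, then plays it on wake-phrase detection."""

    def __init__(self, server_library: Dict[str, bytes]) -> None:
        # A direct reference to the server's library stands in for a network link.
        self._server_library = server_library
        self.open_sound: Optional[bytes] = None

    def fetch_open_sound(self, name: str) -> bool:
        segment = handle_open_sound_request(self._server_library, name)
        if segment is None:
            return False
        self.open_sound = segment
        return True

    def on_wake_phrase_detected(self, play: Callable[[bytes], None]) -> None:
        if self.open_sound is None:
            raise RuntimeError("no open sound has been received from the server")
        play(self.open_sound)
```

Storing the received segment on the client means the wake-phrase path does not need a server round trip, which is one plausible reading of the claim's ordering rather than something the claim states.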
14. A natural language virtual assistant server system enabled to:
- receive and store at least one domain-specific natural language grammar from a first developer;
- receive and store at least one open sound selected from a plurality of distinguishable open sounds from the first developer;
- receive and store at least one domain-specific natural language grammar from a second developer;
- receive and store at least one open sound selected from the plurality of distinguishable open sounds from the second developer, the at least one open sound of the first developer being distinguishably different from the at least one open sound of the second developer;
- read and transmit the first open sound to a first device, the first device having a first wake-up phrase; and
- read and transmit the second open sound to a second device;
- capture audio through a first microphone of the first device and through a second microphone of the second device;
- digitize the audio into an audio sample;
- perform sound spotting on the audio sample at the first device and the second device, to determine if there is a match between the audio sample and at least one of the first wake-up phrase and the second wake-up phrase; and
- in response to determining a match between the audio sample and at least one of the first wake-up phrase and the second wake-up phrase, activate one of the first device and the second device to output through that device's speaker the corresponding open sound indicating that the corresponding device is receptive to capturing speech, wherein a user is able to distinguish between the first device and the second device within a shared audible environment based on the device's corresponding open sound.
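The server-side bookkeeping in claim 14 (storing a grammar and an open sound per developer while keeping the open sounds distinct) could be sketched as follows. This is a deliberately crude illustration: `AssistantServer`, `DeveloperEntry`, and `crude_fingerprint` are hypothetical names, and an exact SHA-256 match is only a placeholder for whatever audio fingerprinting or speech recognition a real system would use to judge distinguishability.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def crude_fingerprint(audio: bytes) -> str:
    """Placeholder for audio fingerprinting: identical bytes give identical prints."""
    return hashlib.sha256(audio).hexdigest()


@dataclass
class DeveloperEntry:
    grammar: str       # domain-specific natural language grammar
    open_sound: bytes  # open sound audio segment


@dataclass
class AssistantServer:
    developers: Dict[str, DeveloperEntry] = field(default_factory=dict)

    def register(self, developer: str, grammar: str, open_sound: bytes) -> bool:
        """Store a developer's grammar and open sound unless the sound is already taken."""
        new_print = crude_fingerprint(open_sound)
        for name, entry in self.developers.items():
            if name != developer and crude_fingerprint(entry.open_sound) == new_print:
                return False  # not distinguishable from another developer's sound
        self.developers[developer] = DeveloperEntry(grammar, open_sound)
        return True

    def open_sound_for(self, developer: str) -> bytes:
        """Read the stored open sound so it can be transmitted to a device."""
        return self.developers[developer].open_sound
```

An exact-match check only rejects byte-identical audio; two sounds that merely sound alike would pass it, so a perceptual comparison would be needed in practice.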
Specification