User recognition for speech processing systems
First Claim
1. A computer-implemented method comprising:
receiving, from a speech-controlled device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to determine input text data;
performing natural language understanding (NLU) on the input text data to determine NLU results data;
determining a user profile associated with the speech-controlled device, the user profile associated with first training data representing how a first user's voice sounds and second training data representing how a second user's voice sounds;
determining, using at least the input audio data and the first training data, a first user recognition confidence score corresponding to a likelihood that the utterance was spoken by the first user;
determining a first device to receive first text data from, the first text data being responsive to a first portion of the NLU results data;
receiving a request from the first device for the first user recognition confidence score;
sending, to the first device, a speech session identifier and the first user recognition confidence score, wherein the speech session identifier corresponds to the utterance;
receiving the first text data from the first device; and
sending, to the speech-controlled device, output data corresponding to the first text data.
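The claim-1 flow (ASR, then NLU, then user recognition scoring, then exchanging the score and a session identifier with a first device) can be sketched in code. This is a minimal illustrative sketch only: every function and data shape below is a hypothetical stub I introduce for illustration, not an API from the patent or any real speech system.

```python
# Hypothetical sketch of the claim-1 method. All names here (asr, nlu,
# recognition_score, handle_utterance) are illustrative stubs.

def asr(audio: bytes) -> str:
    # Stub: a real ASR component would decode the audio into text.
    return "play my playlist"

def nlu(text: str) -> dict:
    # Stub: a real NLU component would produce intent/slot results.
    return {"intent": "PlayMusic", "slots": {"target": "my playlist"}}

def recognition_score(audio: bytes, training_data: bytes) -> float:
    # Stub: compare the utterance audio against a user's voice
    # training data; a real system would return a model-based score.
    return 0.87

def handle_utterance(audio: bytes, profile: dict, session_id: str) -> dict:
    text = asr(audio)        # perform ASR on the input audio data
    results = nlu(text)      # perform NLU on the input text data
    # Score the utterance against the first user's training data
    # from the device's user profile.
    score = recognition_score(audio, profile["users"][0]["training_data"])
    # When the first device requests the score, the response pairs it
    # with a speech session identifier corresponding to this utterance.
    return {
        "session_id": session_id,
        "user_recognition_score": score,
        "nlu": results,
    }
```

The session identifier lets the first device correlate the confidence score with the specific utterance it is generating responsive text for.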
1 Assignment
0 Petitions
Abstract
Systems, methods, and devices for recognizing a user are disclosed. A speech-controlled device captures a spoken utterance and sends corresponding audio data to a server. The server determines content sources storing, or having access to, content responsive to the spoken utterance. The server also determines multiple users associated with a profile of the speech-controlled device. Using the audio data, the server may determine user recognition data with respect to each user indicated in the speech-controlled device's profile. The server may also receive user recognition confidence threshold data from each of the content sources. The server may determine user recognition data that satisfies (i.e., meets or exceeds) the most stringent (i.e., highest) of the user recognition confidence thresholds. Thereafter, the server may send data indicating a user associated with that user recognition data to all of the content sources.
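The abstract's threshold logic (collect per-source confidence thresholds, take the most stringent one, and keep only users whose recognition score satisfies it) can be sketched as follows. The function names and data shapes are my own illustrative assumptions, not terminology from the patent.

```python
# Hypothetical sketch of the abstract's threshold selection.

def most_stringent_threshold(thresholds: list[float]) -> float:
    # The "most stringent" of the content sources' confidence
    # thresholds is the highest one.
    return max(thresholds)

def users_satisfying(scores: dict[str, float],
                     thresholds: list[float]) -> dict[str, float]:
    # scores maps user id -> recognition score for this utterance.
    # Keep users whose score meets or exceeds the strictest threshold;
    # the server can then report such a user to all content sources.
    bar = most_stringent_threshold(thresholds)
    return {user: s for user, s in scores.items() if s >= bar}
```

Because every reported user clears the highest threshold, a single result can be sent to all content sources regardless of their individual requirements.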
20 Claims
1. A computer-implemented method comprising:
receiving, from a speech-controlled device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to determine input text data;
performing natural language understanding (NLU) on the input text data to determine NLU results data;
determining a user profile associated with the speech-controlled device, the user profile associated with first training data representing how a first user's voice sounds and second training data representing how a second user's voice sounds;
determining, using at least the input audio data and the first training data, a first user recognition confidence score corresponding to a likelihood that the utterance was spoken by the first user;
determining a first device to receive first text data from, the first text data being responsive to a first portion of the NLU results data;
receiving a request from the first device for the first user recognition confidence score;
sending, to the first device, a speech session identifier and the first user recognition confidence score, wherein the speech session identifier corresponds to the utterance;
receiving the first text data from the first device; and
sending, to the speech-controlled device, output data corresponding to the first text data.
View Dependent Claims (2, 3)
4. A system comprising:
at least one processor; and
memory including instructions that, when executed, cause the at least one processor to:
receive, from a device, input audio data corresponding to an utterance;
perform automatic speech recognition (ASR) on the input audio data to create input text data;
perform natural language understanding (NLU) on the input text data to create NLU results data;
determine a user profile associated with the device, the user profile associated with first user-specific data corresponding to how a first user's voice sounds;
determine, using at least the first user-specific data, a first user recognition score corresponding to a likelihood that the utterance was spoken by the first user; and
send, to at least one remote device, first data corresponding to the first user recognition score.
View Dependent Claims (5, 6, 7, 8, 9, 10, 11, 12)
13. A computer-implemented method comprising:
receiving, from a device, input audio data corresponding to an utterance;
performing automatic speech recognition (ASR) on the input audio data to create input text data;
performing natural language understanding (NLU) on the input text data to create NLU results data;
determining a user profile associated with the device, the user profile associated with first user-specific data corresponding to a first user;
determining, using at least the first user-specific data, a first user recognition score corresponding to a likelihood that the utterance was spoken by the first user; and
sending, to at least one remote device, first data corresponding to the first user recognition score.
View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
Specification