Voice-assisted scanning
Abstract
In some cases, a handheld device that includes a microphone and a scanner may be used for voice-assisted scanning. For example, a user may provide a voice input via the microphone and may activate the scanner to scan an item identifier (e.g., a barcode). The handheld device may communicate voice data and item identifier information to a remote system for voice-assisted scanning. The remote system may perform automatic speech recognition (ASR) operations on the voice data and may perform item identification operations based on the scanned identifier. Natural language understanding (NLU) processing may be improved by combining ASR information with item information obtained based on the scanned identifier. An action may be executed based on the likely user intent.
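The abstract says NLU processing can be improved by combining ASR output with item information from the scanned identifier. A simple n-best rescoring step shows the idea. This is a minimal sketch, not the patented method. It assumes the recognizer returns (transcription, score) pairs where a higher score is better. It also swaps in plain word overlap with the scanned item's catalog fields as the combining rule. The functions `item_terms` and `rescore_with_item` and the `boost` weight are hypothetical.

```python
def item_terms(item_info: dict) -> set[str]:
    """Collect lowercase words from the catalog fields of the scanned item."""
    terms: set[str] = set()
    for value in item_info.values():
        terms.update(str(value).lower().split())
    return terms


def rescore_with_item(
    nbest: list[tuple[str, float]], item_info: dict, boost: float = 0.1
) -> list[tuple[str, float]]:
    """Re-rank ASR hypotheses, adding `boost` for each word shared with the item.

    Assumption: scores are comparable confidences (higher is better) and
    `boost` is a hypothetical per-matching-word bonus.
    """
    terms = item_terms(item_info)
    rescored = []
    for text, score in nbest:
        overlap = len(terms & set(text.lower().split()))
        rescored.append((text, score + boost * overlap))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    nbest = [("add two cans of soap", 0.48), ("add two cans of soup", 0.45)]
    scanned = {"name": "tomato soup", "brand": "Acme"}
    best_text, _ = rescore_with_item(nbest, scanned)[0]
    print(best_text)  # add two cans of soup
```

Here the acoustically preferred "soap" hypothesis loses to "soup" because the scanned item's name contains "soup". A production system would more likely feed item context into the language model or NLU features directly, but the effect is the same: item information disambiguates the recognition result.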
21 Claims
1. A system comprising:
- one or more processors;
- memory; and
- one or more computer-executable instructions stored in the memory and executable by the one or more processors to:
  - receive, from a handheld electronic device, voice data and item identifier information, wherein the handheld electronic device includes at least a microphone to receive a voice input from a user and a scanner to scan an identifier of an item;
  - determine, based at least in part on the item identifier information, information about the item;
  - generate one or more transcriptions of the voice data using a speech recognition model;
  - generate a semantic representation of the one or more transcriptions using a natural language understanding model;
  - identify a reference to the item in the semantic representation;
  - identify a user intent in the semantic representation;
  - determine an action based at least in part on the information about the item, the reference to the item in the semantic representation, and the user intent in the semantic representation; and
  - perform the action.
Dependent claims: 2, 3, 4, 5.
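The steps of claim 1 can be traced end to end in a small sketch. Everything here is hypothetical and assumed for illustration. The in-memory `CATALOG` stands in for an item database. `transcribe` treats the "audio" bytes as UTF-8 text so that no real recognizer is needed. The keyword-based `understand` function replaces a trained NLU model, and a shopping list is used as the action target. The sketch only shows how item information, the item reference, and the user intent jointly select an action.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical catalog keyed by the scanned identifier (e.g., a UPC barcode).
CATALOG = {
    "012345678905": {"name": "whole milk", "brand": "Acme", "unit": "gallon"},
}

# Words treated as referring to the scanned item (assumed, not exhaustive).
ITEM_REFERENCES = {"this", "these", "it", "that", "those"}


@dataclass
class SemanticRepresentation:
    intent: str                    # e.g., "add_to_list"
    item_reference: Optional[str]  # word referring to the scanned item
    slots: dict = field(default_factory=dict)


def lookup_item(item_id: str) -> dict:
    """Determine information about the item from its identifier."""
    return CATALOG.get(item_id, {})


def transcribe(voice_data: bytes) -> list[str]:
    """Stand-in for a speech recognition model returning n-best transcriptions."""
    return [voice_data.decode("utf-8")]


def understand(transcription: str) -> SemanticRepresentation:
    """Keyword-based stand-in for a natural language understanding model."""
    words = [w.strip(".,!?") for w in transcription.lower().split()]
    if "add" in words:
        intent = "add_to_list"
    elif "remove" in words:
        intent = "remove_from_list"
    else:
        intent = "unknown"
    reference = next((w for w in words if w in ITEM_REFERENCES), None)
    quantity = next((int(w) for w in words if w.isdigit()), 1)
    return SemanticRepresentation(intent, reference, {"quantity": quantity})


def determine_action(item: dict, sem: SemanticRepresentation) -> dict:
    """Choose an action from the item information, item reference, and intent."""
    if not item or sem.item_reference is None or sem.intent == "unknown":
        return {"action": "ask_clarification"}
    if sem.intent == "add_to_list":
        return {"action": "add", "item": item["name"], "quantity": sem.slots["quantity"]}
    return {"action": "remove", "item": item["name"]}


def perform(action: dict, shopping_list: dict) -> None:
    """Perform the action against a simple in-memory shopping list."""
    if action["action"] == "add":
        name = action["item"]
        shopping_list[name] = shopping_list.get(name, 0) + action["quantity"]
    elif action["action"] == "remove":
        shopping_list.pop(action["item"], None)


def handle(voice_data: bytes, item_id: str, shopping_list: dict) -> dict:
    """Run the claimed steps in order and return the chosen action."""
    item = lookup_item(item_id)
    transcriptions = transcribe(voice_data)
    sem = understand(transcriptions[0])  # top hypothesis only, for brevity
    action = determine_action(item, sem)
    perform(action, shopping_list)
    return action
```

For example, calling `handle(b"Add 2 of these to my list", "012345678905", cart)` with an empty `cart` returns an `add` action and leaves `cart` as `{"whole milk": 2}`. The spoken word "these" carries no product name. Only the scanned identifier tells the system which item to add.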
6. A computer-implemented method comprising:
- receiving, via a network, voice data and an identifier of an item, wherein the identifier of the item was obtained by scanning the item;
- determining, based at least in part on the identifier of the item, information about the item;
- generating a semantic representation of the voice data using at least one of a speech recognition model or a natural language understanding model;
- identifying a reference to the item in the semantic representation;
- identifying a user intent in the semantic representation; and
- performing an action that is determined based at least in part on the information about the item, the reference to the item in the semantic representation, and the user intent in the semantic representation.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 21.
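Claim 6 only requires that the voice data and the item identifier arrive via a network. It does not specify a message format. One possible encoding, assumed here purely for illustration, is a JSON body with the audio base64-encoded. The field names `item_identifier` and `audio_b64` and both helper functions are hypothetical.

```python
import base64
import json


def encode_request(voice_data: bytes, item_id: str) -> str:
    """Device side: bundle the captured audio and the scanned identifier."""
    return json.dumps({
        "item_identifier": item_id,
        # JSON cannot carry raw bytes, so the audio is sent as base64 text.
        "audio_b64": base64.b64encode(voice_data).decode("ascii"),
    })


def decode_request(message: str) -> tuple[bytes, str]:
    """Server side: recover the voice data and the identifier of the item."""
    payload = json.loads(message)
    return base64.b64decode(payload["audio_b64"]), payload["item_identifier"]
```

Base64 enlarges the audio by about a third. A deployment sending longer recordings might prefer a binary or multipart body instead.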
15. A method comprising:
- receiving, from a handheld electronic device, voice data associated with a voice input from a user and an identifier of an item, wherein the identifier of the item was obtained by scanning the item;
- determining, based at least in part on the identifier of the item, information about the item;
- generating a semantic representation of the voice data using at least one of a speech recognition model or a natural language understanding model;
- identifying a reference to the item in the semantic representation;
- identifying a user intent in the semantic representation; and
- performing an action that is determined based at least in part on the information about the item, the reference to the item in the semantic representation, and the user intent in the semantic representation.
Dependent claims: 16, 17, 18, 19, 20.
Specification