Natural human-computer interaction for virtual personal assistant systems
First Claim
1. A computing device for speech recognition, the computing device comprising:
a processor;
an audio sensor;
an audio input module to:
capture audio input using the audio sensor; and
distort, by the processor, a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform; and
a speech recognition module to:
perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
select, by the processor, a result from the speech recognition results based on contextual information.
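The claimed sequence (distort the waveform in time, recognize the original and every variation, then pick a result using context) can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: `stretch` uses plain linear-interpolation resampling, which changes duration but also shifts pitch; `recognizer` is a hypothetical callable standing in for any speech recognition engine; the stretch factors and the vocabulary-overlap scoring in `select_by_context` are likewise invented for the example.

```python
from typing import Callable, List, Sequence, Set


def stretch(samples: Sequence[float], factor: float) -> List[float]:
    """Change the duration of a waveform by resampling with linear interpolation.

    factor > 1 lengthens the audio, factor < 1 shortens it. This naive method
    also shifts pitch; it is used here only to keep the sketch dependency-free.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < 2:
        return list(samples)
    n_out = max(2, round(len(samples) * factor))
    step = (len(samples) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(samples) - 2)  # left neighbour index
        frac = pos - lo                       # distance to the left neighbour
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def recognize_with_variations(
    samples: Sequence[float],
    recognizer: Callable[[Sequence[float]], str],  # hypothetical engine
    factors: Sequence[float] = (0.8, 1.25),        # assumed stretch factors
) -> List[str]:
    """Run recognition on the original audio and on each time-stretched copy."""
    results = [recognizer(samples)]
    for factor in factors:
        results.append(recognizer(stretch(samples, factor)))
    return results


def select_by_context(candidates: Sequence[str], context_vocab: Set[str]) -> str:
    """Pick the candidate sharing the most words with the current context.

    Ties go to the earliest candidate, so the undistorted input wins a tie.
    """
    def score(text: str) -> int:
        return sum(1 for word in text.lower().split() if word in context_vocab)

    return max(candidates, key=score)
```

In use, the context vocabulary might come from the application currently in focus, for example the words a media player is prepared to act on; the source does not specify how contextual information is represented.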
Abstract
Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information. The computing device may measure a user's engagement level by using an eye tracking sensor to determine whether the user is visually focused on an avatar rendered by the virtual personal assistant. The avatar may be rendered in a disengaged state, a ready state, or an engaged state based on the user engagement level. The avatar may be rendered as semitransparent in the disengaged state, and the transparency may be reduced in the ready state or the engaged state. Other embodiments are described and claimed.
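The avatar behaviour described in the abstract amounts to a small state mapping from engagement level to rendering opacity. The sketch below is illustrative only. The abstract says engagement is measured by whether the user is visually focused on the avatar but does not say how the ready and engaged states are told apart, so the use of gaze dwell time, the thresholds, and the opacity values are all assumptions made for this example.

```python
from enum import Enum


class AvatarState(Enum):
    DISENGAGED = "disengaged"
    READY = "ready"
    ENGAGED = "engaged"


# Hypothetical thresholds: seconds of continuous gaze on the avatar.
READY_AFTER_S = 0.5
ENGAGED_AFTER_S = 2.0

# Hypothetical opacities: semitransparent when disengaged, less transparent otherwise.
OPACITY = {
    AvatarState.DISENGAGED: 0.3,
    AvatarState.READY: 0.7,
    AvatarState.ENGAGED: 1.0,
}


def avatar_state(gaze_on_avatar_s: float) -> AvatarState:
    """Map gaze dwell time (assumed engagement measure) to an avatar state."""
    if gaze_on_avatar_s >= ENGAGED_AFTER_S:
        return AvatarState.ENGAGED
    if gaze_on_avatar_s >= READY_AFTER_S:
        return AvatarState.READY
    return AvatarState.DISENGAGED


def avatar_opacity(state: AvatarState) -> float:
    """Return the rendering opacity for a state (1.0 means fully opaque)."""
    return OPACITY[state]
```

A renderer could call `avatar_opacity(avatar_state(dwell))` on each eye-tracker update, where `dwell` is whatever dwell measurement the tracker provides.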
18 Claims
1. A computing device for speech recognition, the computing device comprising:
a processor;
an audio sensor;
an audio input module to:
capture audio input using the audio sensor; and
distort, by the processor, a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform; and
a speech recognition module to:
perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
select, by the processor, a result from the speech recognition results based on contextual information.
Dependent claims: 2, 3, 4, 5, 6
7. A method for speech recognition on a computing device, the method comprising:
capturing audio input using an audio sensor of the computing device;
distorting a waveform of the audio input to produce a plurality of distorted audio variations, wherein distorting the waveform comprises adjusting a temporal duration of the waveform;
performing speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
selecting a result from the speech recognition results based on contextual information.
Dependent claims: 8, 9, 10, 11, 12
13. One or more non-transitory machine readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
capture audio input using an audio sensor of the computing device;
distort a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform;
perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
select a result from the speech recognition results based on contextual information.
Dependent claims: 14, 15, 16, 17, 18
Specification