Natural human-computer interaction for virtual personal assistant systems

US 9,607,612 B2
Filed: 05/20/2013
Issued: 03/28/2017
Est. Priority Date: 05/20/2013
Status: Active Grant

First Claim

1. A computing device for speech recognition, the computing device comprising:

a processor;

an audio sensor;

an audio input module to:

capture audio input using the audio sensor; and

distort, by the processor, a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform; and

a speech recognition module to:

perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and

select, by the processor, a result from the speech recognition results based on contextual information.
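The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Audio is assumed to be a plain list of float samples, and the only distortion shown is a change of temporal duration by naive linear-interpolation resampling. `recognize` and `context_score` are hypothetical callbacks standing in for a real speech recognizer and a real source of contextual information.

```python
from typing import Callable, List, Sequence


def time_stretch(samples: Sequence[float], factor: float) -> List[float]:
    """Change the temporal duration of a waveform by linear-interpolation resampling.

    factor > 1 lengthens (slows down) the audio; factor < 1 shortens (speeds up) it.
    Naive resampling like this also shifts pitch when played back at the original rate.
    """
    if not samples:
        return []
    n_in = len(samples)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    out: List[float] = []
    for i in range(n_out):
        # Map each output index to a fractional position in the input.
        pos = i * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def recognize_with_variations(
    samples: Sequence[float],
    recognize: Callable[[List[float]], str],
    context_score: Callable[[str], float],
    factors: Sequence[float] = (0.8, 0.9, 1.1, 1.25),
) -> str:
    """Recognize the original input and each distorted variation, then pick by context."""
    # The original audio plus one time-stretched variation per factor.
    candidates = [list(samples)] + [time_stretch(samples, f) for f in factors]
    results = [recognize(c) for c in candidates]
    # max() keeps the earliest candidate on ties, so the undistorted input wins ties.
    return max(results, key=context_score)


if __name__ == "__main__":
    audio = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
    # Toy recognizer: pretend the phrase is heard correctly only in a slowed-down variation.
    toy_recognize = lambda s: "call mom" if len(s) > 11 else "call tom"
    # Toy context: "mom" is a known contact.
    in_contacts = lambda text: 1.0 if text == "call mom" else 0.0
    print(recognize_with_variations(audio, toy_recognize, in_contacts))  # prints "call mom"
```

Generating several variations and deferring the choice to a context score means no single distortion has to be right; the selection step only needs one candidate that the context recognizes.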

1 Assignment

0 Petitions

Accused Products

Abstract

Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information. The computing device may measure a user's engagement level by using an eye tracking sensor to determine whether the user is visually focused on an avatar rendered by the virtual personal assistant. The avatar may be rendered in a disengaged state, a ready state, or an engaged state based on the user engagement level. The avatar may be rendered as semitransparent in the disengaged state, and the transparency may be reduced in the ready state or the engaged state. Other embodiments are described and claimed.
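The engagement-driven avatar rendering described in the abstract can be sketched as a small state mapping. The abstract fixes only the three states and the rule that the avatar is semitransparent when disengaged and less transparent when ready or engaged. Measuring engagement as the fraction of eye-tracker samples with gaze on the avatar, the 0.3 and 0.7 thresholds, and the opacity values are all illustrative assumptions.

```python
from enum import Enum
from typing import Dict, Sequence, Tuple


class AvatarState(Enum):
    DISENGAGED = "disengaged"
    READY = "ready"
    ENGAGED = "engaged"


# Illustrative values only; the abstract does not give thresholds or opacities.
READY_THRESHOLD = 0.3
ENGAGED_THRESHOLD = 0.7
OPACITY: Dict[AvatarState, float] = {
    AvatarState.DISENGAGED: 0.4,  # semitransparent
    AvatarState.READY: 0.8,       # transparency reduced
    AvatarState.ENGAGED: 1.0,     # fully opaque
}


def engagement_level(gaze_on_avatar: Sequence[bool]) -> float:
    """Fraction of recent eye-tracker samples in which the user's gaze was on the avatar."""
    if not gaze_on_avatar:
        return 0.0
    return sum(gaze_on_avatar) / len(gaze_on_avatar)


def avatar_render_state(gaze_on_avatar: Sequence[bool]) -> Tuple[AvatarState, float]:
    """Map the engagement level to a rendering state and an opacity (1.0 = opaque)."""
    level = engagement_level(gaze_on_avatar)
    if level >= ENGAGED_THRESHOLD:
        state = AvatarState.ENGAGED
    elif level >= READY_THRESHOLD:
        state = AvatarState.READY
    else:
        state = AvatarState.DISENGAGED
    return state, OPACITY[state]
```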

Citations

18 Claims

1. A computing device for speech recognition, the computing device comprising:
   - a processor;
   - an audio sensor;
   - an audio input module to:
     - capture audio input using the audio sensor; and
     - distort, by the processor, a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform; and
   - a speech recognition module to:
     - perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
     - select, by the processor, a result from the speech recognition results based on contextual information.

2. The computing device of claim 1, wherein to adjust the temporal duration of the waveform comprises at least one of to: (i) remove an internal segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold, or (ii) expand a length of a segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold.

3. The computing device of claim 1, wherein to adjust the temporal duration of the waveform comprises to insert a pause at a phonetic split point of the audio input identified by performing speech recognition on the audio input.

4. The computing device of claim 1, wherein to distort the audio input further comprises at least one of to: (i) adjust a pitch of the audio input or (ii) introduce noise to the audio input, and wherein to adjust the temporal duration of the waveform comprises at least one of to: (i) speed up the audio input or (ii) slow down the audio input.

5. The computing device of claim 1, further comprising one or more applications having a speech recognition grammar; wherein the speech recognition module is further to determine semantically relevant results of the speech recognition results based on the speech recognition grammar of the one or more applications; and wherein to select the result from the speech recognition results comprises to select a result from the semantically relevant results.

6. The computing device of claim 5, wherein the one or more applications comprise a virtual personal assistant.
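Dependent claim 2 can be sketched as two waveform edits keyed to an amplitude threshold. The sketch assumes the "predefined relationship" is simply |sample| < threshold sustained for at least `min_len` samples, so the segment is effectively a pause. `quiet_runs`, `remove_internal_quiet`, and `expand_quiet` are hypothetical helper names, and a practical system would more likely compare per-frame energy than individual samples.

```python
from typing import List, Sequence, Tuple


def quiet_runs(samples: Sequence[float], threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where |sample| < threshold for at least min_len samples."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i  # a quiet run begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # A quiet run may extend to the end of the clip.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs


def remove_internal_quiet(samples: Sequence[float], threshold: float, min_len: int) -> List[float]:
    """Variation (i): drop quiet runs that touch neither end of the clip."""
    out = list(samples)
    # Delete back to front so earlier run indices remain valid.
    for start, end in reversed(quiet_runs(samples, threshold, min_len)):
        if start > 0 and end < len(samples):
            del out[start:end]
    return out


def expand_quiet(samples: Sequence[float], threshold: float, min_len: int, factor: int = 2) -> List[float]:
    """Variation (ii): lengthen each quiet run to `factor` times its original length."""
    samples = list(samples)
    out: List[float] = []
    prev = 0
    for start, end in quiet_runs(samples, threshold, min_len):
        out.extend(samples[prev:end])                  # audio up to and including the run
        out.extend(samples[start:end] * (factor - 1))  # extra copies of the quiet run
        prev = end
    out.extend(samples[prev:])
    return out
```

Restricting removal to internal runs keeps leading and trailing silence intact, which matches the claim's wording ("an internal segment"); expansion is applied to every qualifying run because claim 2(ii) carries no such restriction.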

7. A method for speech recognition on a computing device, the method comprising:
   - capturing audio input using an audio sensor of the computing device;
   - distorting a waveform of the audio input to produce a plurality of distorted audio variations, wherein distorting the waveform comprises adjusting a temporal duration of the waveform;
   - performing speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
   - selecting a result from the speech recognition results based on contextual information.

8. The method of claim 7, wherein adjusting the temporal duration of the waveform comprises at least one of: (i) removing an internal segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold, or (ii) expanding a length of a segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold.

9. The method of claim 7, wherein adjusting the temporal duration of the waveform comprises inserting a pause at a phonetic split point of the audio input identified by performing speech recognition on the audio input.

10. The method of claim 7, wherein distorting the audio input further comprises performing at least one of: (i) adjusting a pitch of the audio input or (ii) introducing noise to the audio input, and wherein adjusting the temporal duration of the waveform comprises at least one of: (i) speeding up the audio input or (ii) slowing down the audio input.

11. The method of claim 7, further comprising determining semantically relevant results of the speech recognition results based on a speech recognition grammar of one or more applications of the computing device; wherein selecting the result from the speech recognition results comprises selecting a result from the semantically relevant results.

12. The method of claim 11, wherein determining the semantically relevant results based on the speech recognition grammar of the one or more applications comprises determining the semantically relevant results based on a speech recognition grammar of a virtual personal assistant of the computing device.
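Claims 9 and 10 recite further distortions. A minimal sketch of two of them follows, under stated assumptions: the phonetic split points are taken as sample indices already reported by a first recognition pass (that recognizer is not shown), a pause is modeled as inserted zero samples, and "noise" is white Gaussian noise scaled to a chosen signal-to-noise ratio. Pitch adjustment is omitted because a faithful pitch shifter does not fit in a short example.

```python
import math
import random
from typing import List, Sequence


def insert_pauses(samples: Sequence[float], split_points: Sequence[int], pause_len: int) -> List[float]:
    """Insert `pause_len` samples of silence at each interior split point."""
    out: List[float] = []
    prev = 0
    for p in sorted(set(split_points)):
        if not 0 < p < len(samples):
            continue  # skip boundaries at or beyond the clip edges
        out.extend(samples[prev:p])
        out.extend([0.0] * pause_len)
        prev = p
    out.extend(samples[prev:])
    return out


def add_noise(samples: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Add white Gaussian noise so that signal power / noise power equals snr_db decibels."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    rng = random.Random(seed)  # seeded so a given variation is reproducible
    return [s + rng.gauss(0.0, noise_std) for s in samples]
```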

13. One or more non-transitory machine readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
   - capture audio input using an audio sensor of the computing device;
   - distort a waveform of the audio input to produce a plurality of distorted audio variations, wherein to distort the waveform comprises to adjust a temporal duration of the waveform;
   - perform speech recognition on the audio input and each of the distorted audio variations to produce a plurality of speech recognition results; and
   - select a result from the speech recognition results based on contextual information.

14. The non-transitory machine readable media of claim 13, wherein to adjust the temporal duration of the waveform comprises at least one of to: (i) remove an internal segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold, or (ii) expand a length of a segment of the audio input having an amplitude with a predefined relationship to an amplitude threshold.

15. The non-transitory machine readable media of claim 13, wherein to adjust the temporal duration of the waveform comprises to insert a pause at a phonetic split point of the audio input identified by performing speech recognition on the audio input.

16. The non-transitory machine readable media of claim 13, wherein to distort the audio input further comprises at least one of to: (i) adjust a pitch of the audio input or (ii) introduce noise to the audio input, and wherein to adjust the temporal duration of the waveform comprises at least one of to: (i) speed up the audio input or (ii) slow down the audio input.

17. The non-transitory machine readable media of claim 13, further comprising a plurality of instructions that in response to being executed cause the computing device to determine semantically relevant results of the speech recognition results based on a speech recognition grammar of one or more applications of the computing device; wherein to select the result from the speech recognition results comprises to select a result from the semantically relevant results.

18. The non-transitory machine readable media of claim 17, wherein to determine the semantically relevant results based on the speech recognition grammar of the one or more applications comprises to determine the semantically relevant results based on a speech recognition grammar of a virtual personal assistant of the computing device.
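Claims 17 and 18 narrow the selection step to results that some application's speech recognition grammar accepts. In this sketch the grammars are hypothetical regular expressions (production systems typically use a dedicated grammar format such as W3C SRGS), and the contextual tie-breaker, preferring a result handled by the currently active application, is an assumption rather than something the claims specify.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical grammars: each application lists regular expressions for the
# utterances it can act on.
GRAMMARS: Dict[str, List[str]] = {
    "assistant": [r"call (\w+)", r"what time is it"],
    "music": [r"play (.+)"],
}


def semantically_relevant(
    results: Sequence[str], grammars: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Keep (result, app) pairs for results that some application's grammar fully matches."""
    relevant: List[Tuple[str, str]] = []
    for text in results:
        for app, patterns in grammars.items():
            if any(re.fullmatch(p, text) for p in patterns):
                relevant.append((text, app))
                break  # first matching application wins
    return relevant


def select_result(
    results: Sequence[str], grammars: Dict[str, List[str]], active_app: str
) -> Optional[str]:
    """Select among semantically relevant results using one piece of context: the active app."""
    relevant = semantically_relevant(results, grammars)
    if not relevant:
        return None
    for text, app in relevant:
        if app == active_app:
            return text
    # Fall back to the earliest relevant result, assuming the undistorted input's result comes first.
    return relevant[0][0]
```

Filtering by grammar first discards candidates such as "cool mum" that no application could act on, so the context step only has to rank utterances that are actionable.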

Specification

Current Assignee
Intel Corporation
Original Assignee
Intel Corporation
Inventors
Deleeuw, William C.
Primary Examiner(s)
AZAD, ABUL K

Application Number

US14/129,435
Publication Number

US 20160063989A1
Time in Patent Office

1,408 Days
Field of Search

704/200-278
US Class Current

1/1
CPC Class Codes

G06F 3/013   Eye tracking input arrangem...

G06F 3/167   Audio in a user interface, ...

G06T 13/80   2D [Two Dimensional] animat...

G06V 10/143   Sensing or illuminating at ...

G06V 40/19   Sensors therefor

G10L 15/02   Feature extraction for spee...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context

G10L 21/003   Changing voice quality, e.g...
