Voice input correction using non-audio based input

US 10,741,182 B2
Filed: 02/18/2014
Issued: 08/11/2020
Est. Priority Date: 02/18/2014
Status: Active Grant

Abstract

An embodiment provides a method, including: accepting, at an audio receiver of an information handling device, voice input of a user; interpreting, using a processor, the voice input; identifying, using a processor, at least one ambiguity in interpreting the voice input; thereafter accessing stored non-audible input associated in time with the at least one ambiguity; and adjusting an interpretation of the voice input using non-audible input. Other aspects are described and claimed.
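
The flow in the abstract (interpret, find an ambiguity, look up non-audible input associated in time, adjust) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Word` and `VisualCue` types, the `LOW_CONFIDENCE` value of 0.5, and the idea that the stored non-audible input has already been reduced to a text guess (for example, from lip movement). The patent does not specify any of these.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the utterance
    end: float
    confidence: float  # recognizer score in [0, 1]


@dataclass
class VisualCue:
    """Hypothetical non-audible observation, e.g. a lip-reading guess."""
    time: float
    text: str


LOW_CONFIDENCE = 0.5  # assumed threshold; the source gives no value


def find_cue(cues: list[VisualCue], word: Word) -> Optional[VisualCue]:
    """Return the first stored cue whose timestamp falls inside the word."""
    for cue in cues:
        if word.start <= cue.time <= word.end:
            return cue
    return None


def adjust(words: list[Word], cues: list[VisualCue]) -> list[str]:
    """Keep confident words; replace ambiguous ones when a cue matches in time."""
    result = []
    for word in words:
        if word.confidence <= LOW_CONFIDENCE:
            cue = find_cue(cues, word)
            if cue is not None:
                result.append(cue.text)
                continue
        result.append(word.text)
    return result


initial = [
    Word("text", 0.0, 0.4, 0.93),
    Word("Bob", 0.4, 0.8, 0.31),  # ambiguous: might be "Bob" or "Rob"
]
stored = [VisualCue(0.6, "Rob")]
print(adjust(initial, stored))  # ['text', 'Rob']
```

Note that the non-audible input is consulted only for words that fall at or below the threshold, so the initial interpretation is produced from audio alone.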

18 Claims

1. A method, comprising:
- accepting, at an audio receiver of an information handling device, voice input of a user and capturing, using a sensor, non-audio based input correlated with the voice input;
  
  generating, using one or more speech recognition engines, an initial interpretation by interpreting the voice input without utilizing the non-audio based input for the initial interpretation;
  
  identifying, using the one or more speech recognition engines, an ambiguous voice input comprising at least one ambiguity in the initial interpretation, wherein the identifying comprises identifying that at least a portion of the initial interpretation is associated with a confidence score meeting a predetermined low confidence threshold, wherein the confidence score is based in part on a condition of the user;
  
  thereafter augmenting the one or more speech recognition engines and re-interpreting the ambiguous voice input by accessing, using the one or more speech recognition engines, based upon the confidence score meeting the predetermined low confidence threshold, stored non-audio based input matched in time with the ambiguous voice input, wherein the accessing is based upon a policy associated with a confidence level of interpretation, wherein the confidence level of interpretation is based on a device usage history, wherein the re-interpreting comprises mapping the stored non-audio based input to known features of the user while providing voice input correlated with the voice input; and
  
  adjusting the initial interpretation of the voice input using non-audio based input, wherein the adjusting comprises changing the initial interpretation using the non-audio based input.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein the adjusting comprises correcting the voice input.
  - 3. The method of claim 1, wherein the sensor is a camera.
  - 4. The method of claim 1, wherein said capturing proceeds according to the policy.
  - 5. The method of claim 1, wherein a factor comprises at least one factor selected from the group consisting of:
    - history of low speech recognition confidence, detection of multiple speech candidates, a detection of background noise exceeding a predetermined threshold, detection of a repeated word, and detection of an atypical voice characteristic.
  - 6. The method of claim 4, wherein said policy adjusts said capturing responsive to battery level falling below a predetermined threshold.
  - 7. The method of claim 1, wherein the accessing stored non-audio based input associated in time with the at least one ambiguity comprises accessing non-audible input derived from data selected from the group consisting of visible light image data, non-visible electromagnetic radiation image data, and non-audible sound data.
  - 8. The method of claim 1, wherein the identifying at least one ambiguity in interpreting the voice input comprises identifying a word including a predetermined sound characteristic associated with ambiguity.
  - 9. The method of claim 8, wherein the predetermined sound characteristic associated with ambiguity is a consonant sound.
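
Claims 4 to 6 describe capturing that proceeds according to a policy, which adjusts capturing when the battery level falls below a threshold. One possible reading is sketched below in Python. The `DeviceState` and `CapturePolicy` names, the threshold values, and the choice to stop capturing entirely on low battery are all assumptions; claim 6 says only that the policy "adjusts" capturing.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    battery_level: float         # fraction in [0, 1]
    background_noise_db: float
    recent_low_confidence: bool  # history of low recognition confidence


@dataclass
class CapturePolicy:
    # Illustrative thresholds only; the claims give no values.
    battery_threshold: float = 0.2
    noise_threshold_db: float = 60.0

    def should_capture(self, state: DeviceState) -> bool:
        """Decide whether to capture non-audio input alongside voice input."""
        if state.battery_level < self.battery_threshold:
            # Assumed adjustment: skip capture to save power.
            return False
        # Two of the factors listed in claim 5 act as triggers here.
        return (state.recent_low_confidence
                or state.background_noise_db > self.noise_threshold_db)


policy = CapturePolicy()
print(policy.should_capture(DeviceState(0.8, 70.0, False)))  # True
print(policy.should_capture(DeviceState(0.1, 70.0, True)))   # False
```

A less drastic adjustment, such as lowering the camera frame rate, would fit the claim language equally well.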

10. An information handling device, comprising:
- an audio receiver;
  
  a sensor that captures input;
  
  one or more processors; and
  
  a memory storing instructions that are executed by processor to:
  
  accept, at the audio receiver, voice input of a user and capture, using the sensor, non-audio based input correlated with the voice input;
  
  generate, using a speech recognition engine, an initial interpretation by interpreting the voice input without utilizing the non-audio based input for the initial interpretation;
  
  identify an ambiguous voice input comprising at least one ambiguity in the initial interpretation, wherein the identifying comprises identifying that at least a portion of the initial interpretation is associated with a confidence score meeting a predetermined low confidence threshold, wherein the confidence score is based in part on a condition of the user;
  
  thereafter augmenting the speech recognition engine and re-interpreting the ambiguous voice input by accessing, using the one or more processors, based upon the confidence score meeting the predetermined low confidence threshold, stored non-audio based input matched in time with the ambiguous voice input, wherein the accessing is based upon a policy associated with a confidence level of interpretation, wherein the confidence level of interpretation is based on a device usage history, wherein the re-interpreting comprises mapping the stored non-audio based input to known features of the user while providing voice input correlated with the voice input; and
  
  adjust the initial interpretation of the voice input using non-audio based input derived from the sensor, wherein to adjust comprises to change the initial interpretation using the non-audio based input.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The information handling device of claim 10, wherein to adjust comprises correcting the voice input.
  - 12. The information handling device of claim 10, wherein the sensor is a camera.
  - 13. The information handling device of claim 10, wherein to capture comprises capturing non-audio based input according to the policy.
  - 14. The information handling device of claim 10, a factor comprises at least one factor selected from the group consisting of:
    - history of low speech recognition confidence, detection of multiple speech candidates, a detection of background noise exceeding a predetermined threshold, detection of a repeated word, and detection of an atypical voice characteristic.
  - 15. The information handling device of claim 13, wherein said policy adjusts said capturing responsive to battery level falling below a predetermined threshold.
  - 16. The information handling device of claim 10, wherein to access stored non-audio based input associated in time with the at least one ambiguity comprises accessing non-audio based input derived from data selected from the group consisting of visible light image data, non-visible electromagnetic radiation image data, and non-audible sound image data.
  - 17. The information handling device of claim 10, wherein the identifying at least one ambiguity in interpreting the voice input comprises identifying a word including a predetermined sound characteristic associated with ambiguity.
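
The device claims rely on stored non-audio based input that can later be matched in time with an ambiguous portion of the voice input. A minimal Python sketch of such storage follows. The `SensorBuffer` class, its frame limit, and the use of strings as stand-ins for sensor frames are hypothetical; the claims do not describe a storage structure.

```python
from __future__ import annotations

import bisect


class SensorBuffer:
    """Keeps the most recent timestamped sensor frames (hypothetical helper)."""

    def __init__(self, max_frames: int = 300):
        self.max_frames = max_frames
        self._times: list[float] = []
        self._frames: list[str] = []

    def add(self, timestamp: float, frame: str) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._times.append(timestamp)
        self._frames.append(frame)
        if len(self._times) > self.max_frames:
            # Drop the oldest frame to bound memory use.
            self._times.pop(0)
            self._frames.pop(0)

    def between(self, start: float, end: float) -> list[str]:
        """Return frames with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._frames[lo:hi]


buffer = SensorBuffer(max_frames=3)
for t in (0.1, 0.5, 0.9, 1.3):
    buffer.add(t, f"frame@{t}")

# The frame at 0.1 s has been evicted; two frames overlap the queried span.
print(buffer.between(0.4, 1.0))  # ['frame@0.5', 'frame@0.9']
```

Because timestamps are kept sorted, the lookup for an ambiguous word's time span is a pair of binary searches rather than a scan of every stored frame.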

18. A product, comprising:
- a storage medium having device readable code stored therewith, the device readable code being executable by a processor and comprising:
  
  code that accepts voice input of a user and code that captures non-audio based input correlated with the voice input;
  
  code that generates, using a speech recognition engine, an initial interpretation by interpreting the voice without utilizing the non-audio based input for the initial interpretation;
  
  code that identifies an ambiguous voice input comprising at least one ambiguity in the initial interpretation, wherein the identifying comprises identifying that at least a portion of the initial interpretation is associated with a confidence score meeting a predetermined low confidence threshold, wherein the confidence score is based in part on a condition of the user;
  
  code that thereafter augmenting the speech recognition engine and re-interpreting the ambiguous voice input by accessing, based upon the confidence score meeting the predetermined low confidence threshold, stored non-audio based input matched in time with the ambiguous voice input, wherein the accessing is based upon a policy associated with a confidence level of interpretation, wherein the confidence level of interpretation is based on a device usage history, wherein the re-interpreting comprises mapping the stored non-audio based input to known features of the user while providing voice input correlated with the voice input; and
  
  code that adjusts the initial interpretation of the voice input using non-audio based input, wherein the code that adjusts comprises code that changes the initial interpretation using the non-audio based input.

Current Assignee: Lenovo PC International Limited (Lenovo Group Ltd.)
Original Assignee: Lenovo Singapore Pte Limited (Lenovo Group Ltd.)
Inventors: VanBlon, Russell Speight; Waltermann, Rod D.; Beaumont, Suzanne Marion
Primary Examiner(s): Shah, Bharatkumar S
Application Number: US14/182,875
Publication Number: US 20150235641A1
Time in Patent Office: 2,366 Days
Field of Search: 704235

CPC Class Codes:
G06F 40/00   Handling natural language d...
G10L 15/24   Speech recognition using no...
G10L 15/25   using position of the lips,...
G10L 15/26   Speech to text systems G10L...
