Modification of visual content to facilitate improved speech recognition

US 9,583,105 B2
Filed: 06/06/2014
Issued: 02/28/2017
Est. Priority Date: 06/06/2014
Status: Active Grant

Abstract

Technologies described herein relate to modifying visual content for presentment on a display to facilitate improving performance of an automatic speech recognition (ASR) system. The visual content is modified to move elements further away from one another, wherein the moved elements give rise to ambiguity from the perspective of the ASR system. The visual content is modified to take into consideration accuracy of gaze tracking. When a user views an element in the modified visual content, the ASR system is customized as a function of the element being viewed by the user.
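
The abstract's two layout ideas, moving ambiguous elements apart and accounting for gaze-tracking accuracy, can be illustrated with a minimal sketch. It assumes a one-dimensional layout and a gaze error expressed as a pixel radius; the function names and the "twice the error radius" rule are illustrative assumptions, not taken from the patent.

```python
def min_gap_for_gaze_error(gaze_error_px: float) -> float:
    """Smallest centre-to-centre gap at which two error circles of this radius
    no longer overlap (an assumed rule, not one stated in the patent)."""
    return 2.0 * gaze_error_px


def spread_elements(positions, min_gap):
    """Return sorted 1-D positions with every consecutive gap at least min_gap.

    Elements are only pushed further along the axis, never pulled closer.
    """
    spread = []
    for pos in sorted(positions):
        if spread and pos - spread[-1] < min_gap:
            pos = spread[-1] + min_gap
        spread.append(pos)
    return spread


# Hypothetical vertical centres (in pixels) of three on-screen elements.
original = [100, 120, 300]
gap = min_gap_for_gaze_error(30)  # assumed 30 px gaze error radius
print(spread_elements(original, gap))
```

Here the second element is pushed away from the first because their original 20 px gap is smaller than the assumed gaze error allows, while the third element is already far enough away and stays put.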

20 Claims

1. A method executed by a computing device, the method comprising:
- receiving visual content for presentment on a display;
  
  prior to causing the visual content to be presented on the display, modifying the visual content to generate new visual content based upon:
  
  the computing device supporting automatic speech recognition (ASR); and
  
  the computing device supporting visual attention monitoring; and
  
  responsive to modifying the visual content, causing the new visual content to be presented on the display;
  
  estimating that a viewer is viewing an element in the new visual content; and
  
  responsive to estimating that the viewer is viewing the element in the new visual content, assigning a visual indicator to the element in the new visual content.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1, the visual content has a first layout, and wherein modifying the visual content to generate the new visual content comprises transforming the first layout to a second layout.
  - 3. The method of claim 2, the first layout includes the element and a second element with a first distance there between, and wherein modifying the visual content to generate the new visual content comprises altering distance between the element and the second element such that in the second layout a second distance separates the element from the second element.
  - 4. The method of claim 3, wherein the element comprises a first word or word sequence, the second element comprises a second word or word sequence, the method further comprising:
    - computing a value that is indicative of acoustic similarity between the first word or word sequence and the second word or word sequence; and
      
      modifying the visual content to generate the modified visual content based upon the value that is indicative of the acoustic similarity between the first word or word sequence and the second word or word sequence.
  - 5. The method of claim 1, the visual content has a first zoom level, and wherein modifying the visual content to generate the new visual content comprises altering the first zoom level to a second zoom level.
  - 6. The method of claim 1, further comprising:
    - customizing an ASR system based upon the element being estimated as being viewed by the viewer.
  - 7. The method of claim 6, further comprising:
    - receiving a signal from a microphone, the signal representative of a spoken utterance; and
      
      responsive to customizing the ASR system, recognizing the spoken utterance.
  - 8. The method of claim 1, further comprising:
    - subsequent to assigning the visual indicator to the element in the new visual content, estimating that the viewer is viewing a second element in the new visual content; and
      
      responsive to estimating that the viewer is viewing the second element, assigning the visual indicator to the second element and removing the visual indicator from the element.
  - 9. The method of claim 8, wherein the visual indicator is a highlight.
  - 10. The method of claim 9, wherein the element is a form-fillable field.
  - 11. The method of claim 1, the visual content comprises a first form-fillable field and a second form-fillable field, and modifying the visual content to generate the new visual content comprises repositioning at least one of the first form-fillable field or the second form-fillable field such that the first form-fillable field is positioned further apart from the second form-fillable field.
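
The indicator behaviour recited in claims 8 through 10 (highlight the element estimated as viewed, then move the highlight when the estimate changes) might be sketched as follows. The `Element` and `GazeHighlighter` names, the bounding-box hit test, and the pixel coordinates are all hypothetical; the claims do not prescribe how the viewed element is estimated.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A rectangular on-screen element, e.g. a form-fillable field."""
    name: str
    x: float
    y: float
    width: float
    height: float
    highlighted: bool = False

    def contains(self, gx: float, gy: float) -> bool:
        """True if the gaze point (gx, gy) falls inside this element's box."""
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


class GazeHighlighter:
    """Keeps at most one element highlighted: the one estimated as viewed."""

    def __init__(self, elements):
        self.elements = list(elements)
        self.current = None

    def update(self, gx: float, gy: float):
        """Move the highlight to the element under the gaze point, if any."""
        viewed = next((e for e in self.elements if e.contains(gx, gy)), None)
        if viewed is not None and viewed is not self.current:
            if self.current is not None:
                self.current.highlighted = False  # remove the old indicator
            viewed.highlighted = True             # assign the indicator
            self.current = viewed
        return self.current


# Two hypothetical form fields stacked vertically.
first = Element("first name", x=0, y=0, width=200, height=40)
last = Element("last name", x=0, y=100, width=200, height=40)
tracker = GazeHighlighter([first, last])
tracker.update(10, 10)    # gaze lands on the first field
tracker.update(10, 110)   # gaze moves to the second field
```

In this sketch a gaze point that falls on no element leaves the existing highlight in place, which is a design choice of the example rather than something the claims require.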

12. A computing device comprising:
- at least one processor; and
  
  memory that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform acts comprising:
  
  receiving visual content that is to be presented on a display, the visual content has a first layout, wherein the first layout includes a first element and a second element that are at first positions relative to one another, and wherein the second layout includes the first element and the second element at second positions relative to one another;
  
  prior to the visual content being presented on the display, modifying the visual content such that the visual content, when modified, has a second layout that is different from the first layout, the visual content is modified based upon:
  
  visual attention being tracked relative to the display; and
  
  a value that is indicative of acoustic similarity between the first element and the second element; and
  
  rendering the visual content with the second layout for presentment on the display.
- Dependent Claims (13, 14, 15, 16, 17)
  - 13. The computing device of claim 12, the acts further comprising:
    - receiving images from a camera, the images capture a user viewing the display;
      
      identifying a gaze direction of the user based upon the images;
      
      estimating that the first element is being viewed by the user based upon the gaze direction; and
      
      causing graphical data to be presented on the display that indicates that the first element is estimated as being viewed by the user.
  - 14. The computing device of claim 13, the first element is a form-fillable field, and the graphical data is a highlighting of the form-fillable field.
  - 15. The computing device of claim 12, the acts further comprising:
    - receiving images from a camera, the images capture a user viewing the display;
      
      identifying a gaze direction of the user based upon the images;
      
      estimating that the first element is being viewed by the user based upon the gaze direction;
      
      receiving an audio signal, the audio signal includes a spoken utterance set forth by the user; and
      
      recognizing, by an automatic speech recognition (ASR) system, the spoken utterance in the audio signal based upon the first element estimated as being viewed by the user.
  - 16. The computing device of claim 15, the acts further comprising customizing the ASR system based upon the first element estimated as being viewed by the user.
  - 17. The computing device of claim 12, the visual content included in a web page that is to be displayed on the display.
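
Claim 12 ties the layout change to "a value that is indicative of acoustic similarity" between two elements. As a rough illustration only, the sketch below uses string similarity from Python's `difflib` as a stand-in for acoustic similarity (a real system would more plausibly compare pronunciations or phoneme sequences) and maps that value linearly to a target separation. The function names and pixel bounds are assumptions.

```python
from difflib import SequenceMatcher


def similarity_proxy(first: str, second: str) -> float:
    """Crude stand-in for acoustic similarity: character-level similarity in [0, 1].

    Assumption: spelling overlap is used here only because it needs no
    pronunciation dictionary; it is not an acoustic measure.
    """
    return SequenceMatcher(None, first.lower(), second.lower()).ratio()


def target_separation(similarity: float, min_px: float = 40.0,
                      max_px: float = 240.0) -> float:
    """Map similarity in [0, 1] to a separation: more confusable, further apart."""
    similarity = max(0.0, min(1.0, similarity))
    return min_px + similarity * (max_px - min_px)


# Hypothetical field labels on a form.
confusable = similarity_proxy("first name", "last name")
distinct = similarity_proxy("first name", "zip code")
print(target_separation(confusable), target_separation(distinct))
```

Under these assumptions the "first name" and "last name" fields receive a larger target separation than "first name" and "zip code", because their labels are more alike.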

18. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
- receiving a page for presentment on a display, the page comprises a first visual element and a second visual element at a first distance from one another;
  
  modifying the page to generate a modified page, wherein modifying the page comprises altering a position of at least one of the first visual element or the second visual element such that the first visual element and the second visual element are at a second distance from one another in the modified page that is different from the first distance, and further wherein modifying of the page is based upon similarity of pronunciation between at least one word corresponding to the first visual element and at least one word corresponding to the second visual element; and
  
  causing the modified page to be displayed on the display.
- Dependent Claims (19, 20)
  - 19. The computer-readable storage medium of claim 18, the acts further comprising:
    - estimating that the first visual element is being viewed by a viewer; and
      
      modifying an automatic speech recognition (ASR) system responsive to estimating that the first visual element is being viewed by the viewer.
  - 20. The computer-readable storage medium of claim 18, the acts further comprising:
    - estimating that the first visual element is being viewed by a viewer; and
      
      highlighting the first visual element responsive to estimating that the first visual element is being viewed by the viewer.
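
Claims 6 and 19 recite customizing or modifying the ASR system based on the element estimated as being viewed, without fixing a mechanism in the claim text. One plausible form, shown here purely as an assumption, is to rescore the recognizer's candidate transcriptions so that words expected for the viewed field get a small boost. The `rescore` function, the score scale, and the field vocabulary are hypothetical.

```python
def rescore(hypotheses, viewed_field, field_vocab, boost=0.1):
    """Pick the best hypothesis after boosting those expected for the viewed field.

    hypotheses:   list of (text, score) pairs from a recognizer (higher is better)
    viewed_field: name of the element estimated as being viewed
    field_vocab:  mapping from field name to a set of expected lowercase phrases
    """
    expected = field_vocab.get(viewed_field, set())
    rescored = [
        (text, score + boost if text.lower() in expected else score)
        for text, score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]


# Hypothetical recognizer output for one utterance and a hypothetical form.
hypotheses = [("austin", 0.48), ("boston", 0.52)]
vocab = {"departure city": {"austin"}, "arrival city": {"boston"}}

print(rescore(hypotheses, "departure city", vocab))  # viewer looks at departure field
print(rescore(hypotheses, "arrival city", vocab))    # viewer looks at arrival field
```

The same audio can therefore resolve differently depending on which field the viewer is estimated to be looking at, which is the effect the layout changes in claim 18 are meant to make more reliable.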

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Stolcke, Andreas; Zweig, Geoffrey; Slaney, Malcolm
Primary Examiner(s): Augustin, Marcellus
Application Number: US14/297,742
Publication Number: US 20150356971A1
Time in Patent Office: 998 Days
Field of Search: 704/231, 704/237-239
US Class Current: 1/1
CPC Class Codes:
G06F 40/106   Display of layout of docume...
G06F 40/174   Form filling; Merging
G10L 15/24   Speech recognition using no...
H04N 7/183   for receiving images from a...
