Modification of visual content to facilitate improved speech recognition
Abstract
Technologies described herein relate to modifying visual content for presentment on a display to facilitate improving performance of an automatic speech recognition (ASR) system. The visual content is modified to move elements further away from one another, wherein the moved elements give rise to ambiguity from the perspective of the ASR system. The visual content is modified to take into consideration accuracy of gaze tracking. When a user views an element in the modified visual content, the ASR system is customized as a function of the element being viewed by the user.
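The last sentence of the abstract, customizing the ASR system as a function of the viewed element, can be pictured with a small sketch. This is an illustration under assumptions, not the patented implementation: the `Element` class, the Gaussian weighting, the `sigma` parameter (standing in for gaze-tracking error), and the example labels are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    """Hypothetical on-screen element with a spoken label and a center position."""
    label: str
    x: float
    y: float


def gaze_weights(elements: List[Element], gaze_x: float, gaze_y: float,
                 sigma: float) -> Dict[str, float]:
    """Weight each element by a Gaussian of its distance from the gaze estimate.

    sigma models gaze-tracking error (an assumption): a less accurate tracker
    gets a larger sigma and therefore flatter weights.
    """
    raw = [
        math.exp(-((e.x - gaze_x) ** 2 + (e.y - gaze_y) ** 2) / (2.0 * sigma ** 2))
        for e in elements
    ]
    total = sum(raw)
    if total == 0.0:
        # Gaze is far from everything: fall back to uniform weights.
        return {e.label: 1.0 / len(elements) for e in elements}
    return {e.label: w / total for e, w in zip(elements, raw)}


def customize_scores(asr_scores: Dict[str, float],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Multiply recognizer scores by gaze weights and renormalize."""
    combined = {label: asr_scores.get(label, 0.0) * w for label, w in weights.items()}
    total = sum(combined.values())
    if total == 0.0:
        return combined
    return {label: s / total for label, s in combined.items()}


# Two acoustically similar labels that the recognizer alone cannot separate.
elements = [Element("Call Don", 100.0, 100.0), Element("Call Dawn", 100.0, 400.0)]
asr_scores = {"Call Don": 0.5, "Call Dawn": 0.5}

weights = gaze_weights(elements, gaze_x=105.0, gaze_y=390.0, sigma=50.0)
scores = customize_scores(asr_scores, weights)
best = max(scores, key=scores.get)
```

In this sketch the two labels only become separable because they sit several `sigma` apart on screen, which is the motivation the abstract gives for moving ambiguous elements away from one another.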
20 Claims
1. A method executed by a computing device, the method comprising:
- receiving visual content for presentment on a display;
- prior to causing the visual content to be presented on the display, modifying the visual content to generate new visual content based upon:
  - the computing device supporting automatic speech recognition (ASR); and
  - the computing device supporting visual attention monitoring; and
- responsive to modifying the visual content, causing the new visual content to be presented on the display;
- estimating that a viewer is viewing an element in the new visual content; and
- responsive to estimating that the viewer is viewing the element in the new visual content, assigning a visual indicator to the element in the new visual content.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
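The sequence of acts in claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Element` fields, the uniform `scale` used to spread elements, the bounding-box hit test, and the `highlighted` flag standing in for the visual indicator are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical element: top-left corner, size, and an indicator flag."""
    label: str
    x: float
    y: float
    width: float
    height: float
    highlighted: bool = False


def modify_content(elements: List[Element], supports_asr: bool,
                   supports_gaze: bool, scale: float = 1.5) -> List[Element]:
    """Generate new content only when the device supports both capabilities.

    Scaling positions about the origin is a stand-in for whatever layout
    change a real system would apply; it simply spreads elements apart.
    """
    if not (supports_asr and supports_gaze):
        return list(elements)
    return [replace(e, x=e.x * scale, y=e.y * scale) for e in elements]


def estimate_viewed(elements: List[Element], gaze_x: float,
                    gaze_y: float) -> Optional[Element]:
    """Return the first element whose bounding box contains the gaze point."""
    for e in elements:
        if e.x <= gaze_x <= e.x + e.width and e.y <= gaze_y <= e.y + e.height:
            return e
    return None


def assign_indicator(elements: List[Element],
                     viewed: Optional[Element]) -> List[Element]:
    """Mark only the viewed element with the (hypothetical) visual indicator."""
    return [replace(e, highlighted=(e is viewed)) for e in elements]


original = [Element("Name", 0.0, 0.0, 100.0, 20.0),
            Element("Age", 0.0, 30.0, 100.0, 20.0)]

new_content = modify_content(original, supports_asr=True, supports_gaze=True)
viewed = estimate_viewed(new_content, gaze_x=10.0, gaze_y=50.0)
shown = assign_indicator(new_content, viewed)
unchanged = modify_content(original, supports_asr=True, supports_gaze=False)
```

Here the gaze point falls inside the relocated "Age" box, so only that element is flagged; with gaze monitoring unavailable, the content passes through unmodified.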
12. A computing device comprising:
- at least one processor; and
- memory that stores instructions that, when executed by the at least one processor, cause the at least one processor to perform acts comprising:
  - receiving visual content that is to be presented on a display, the visual content has a first layout, wherein the first layout includes a first element and a second element that are at first positions relative to one another, and wherein the second layout includes the first element and the second element at second positions relative to one another;
  - prior to the visual content being presented on the display, modifying the visual content such that the visual content, when modified, has a second layout that is different from the first layout, the visual content is modified based upon:
    - visual attention being tracked relative to the display; and
    - a value that is indicative of acoustic similarity between the first element and the second element; and
  - rendering the visual content with the second layout for presentment on the display.

View Dependent Claims (13, 14, 15, 16, 17)
18. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
- receiving a page for presentment on a display, the page comprises a first visual element and a second visual element at a first distance from one another;
- modifying the page to generate a modified page, wherein modifying the page comprises altering a position of at least one of the first visual element or the second visual element such that the first visual element and the second visual element are at a second distance from one another in the modified page that is different from the first distance, and further wherein modifying of the page is based upon similarity of pronunciation between at least one word corresponding to the first visual element and at least one word corresponding to the second visual element; and
- causing the modified page to be displayed on the display.

View Dependent Claims (19, 20)
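One way to picture the page modification in claim 18 is sketched below. It is an illustration under stated assumptions rather than the claimed method: the tiny phoneme table `PRONUNCIATIONS`, the use of `difflib.SequenceMatcher` as the pronunciation-similarity measure, and the `threshold` and `min_distance` parameters are all hypothetical choices.

```python
import math
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VisualElement:
    """Hypothetical page element identified by the word a user would speak."""
    word: str
    x: float
    y: float


# Hand-written, ARPAbet-style phoneme sequences (assumed for the example).
PRONUNCIATIONS: Dict[str, List[str]] = {
    "Don": ["D", "AA", "N"],
    "Dawn": ["D", "AO", "N"],
    "Mike": ["M", "AY", "K"],
}


def pronunciation_similarity(a: List[str], b: List[str]) -> float:
    """Similarity in [0, 1] from matching phoneme subsequences."""
    return SequenceMatcher(None, a, b).ratio()


def modify_page(first: VisualElement, second: VisualElement,
                threshold: float = 0.6,
                min_distance: float = 200.0) -> Tuple[VisualElement, VisualElement]:
    """Move `second` away from `first` when the two words sound alike.

    Elements that are dissimilar, or already far enough apart, are returned
    unchanged. Otherwise `second` is pushed along the line joining the two
    elements until they are `min_distance` apart.
    """
    sim = pronunciation_similarity(PRONUNCIATIONS[first.word],
                                   PRONUNCIATIONS[second.word])
    dx, dy = second.x - first.x, second.y - first.y
    dist = math.hypot(dx, dy)
    if sim < threshold or dist >= min_distance:
        return first, second
    if dist == 0.0:
        # Coincident elements: pick an arbitrary direction (straight down).
        dx, dy, dist = 0.0, 1.0, 1.0
    k = min_distance / dist
    return first, replace(second, x=first.x + dx * k, y=first.y + dy * k)


don = VisualElement("Don", 0.0, 0.0)
dawn = VisualElement("Dawn", 0.0, 40.0)
mike = VisualElement("Mike", 0.0, 40.0)

_, moved_dawn = modify_page(don, dawn)   # similar pronunciation: moved apart
_, same_mike = modify_page(don, mike)    # dissimilar pronunciation: left in place
```

A production system would presumably draw pronunciations from a lexicon and choose `min_distance` from the measured accuracy of the gaze tracker; both are left as plain parameters here.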
Specification