Multimodal speech recognition for real-time video audio-based display indicia application
Abstract
Aspects relate to computer implemented methods, systems, and processes to automatically generate audio-based display indicia of media content including receiving, by a processor, a plurality of media content categories including at least one feature, receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories, determining a media content category of a current media content based on at least one feature of the current media content, selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content, and applying the selected speech recognition algorithm to the current media content.
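The flow summarized in the abstract (determine a category, select the recognizer associated with it, apply that recognizer) can be sketched as a simple dispatch table. This is only an illustrative sketch, not the patented implementation: the names `MediaContent`, `determine_category`, `generate_captions`, and the two toy recognizers are hypothetical, and the keyword test stands in for whatever feature-based classifier a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A recognizer is modeled as a function from raw audio bytes to caption text.
Recognizer = Callable[[bytes], str]


@dataclass
class MediaContent:
    description: str  # video description (metadata), hypothetical field
    audio: bytes      # audio track, hypothetical field


def cartoon_recognizer(audio: bytes) -> str:
    # Placeholder for a speech model tuned to animated content.
    return f"[cartoon-model] {len(audio)} bytes transcribed"


def news_recognizer(audio: bytes) -> str:
    # Placeholder for a speech model tuned to news broadcasts.
    return f"[news-model] {len(audio)} bytes transcribed"


# Each categorized recognizer is associated with a media content category.
RECOGNIZERS_BY_CATEGORY: Dict[str, Recognizer] = {
    "cartoon": cartoon_recognizer,
    "news": news_recognizer,
}


def determine_category(content: MediaContent) -> str:
    # Assumption: a trivial keyword check on the description replaces
    # the feature-based determination described in the abstract.
    return "cartoon" if "animated" in content.description.lower() else "news"


def generate_captions(content: MediaContent,
                      recognizers: Dict[str, Recognizer]) -> str:
    category = determine_category(content)   # determine the category
    recognizer = recognizers[category]       # select one recognizer
    return recognizer(content.audio)         # apply it to the content
```

Calling `generate_captions(MediaContent("An animated short", audio), RECOGNIZERS_BY_CATEGORY)` routes the audio to the cartoon recognizer; any other description falls through to the news recognizer in this toy setup.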
42 Citations
20 Claims
1. A computer implemented method to automatically generate audio-based display indicia of media content, the method comprising:
defining, by a processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
receiving, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
determining a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawings and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
applying the selected speech recognition algorithm to the current media content.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system to automatically generate audio-based display indicia of media content comprising:
a memory having computer readable instructions; and
a processor configured to execute the computer readable instructions, the computer readable instructions comprising:
defining, by the processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
receiving, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
determining, by the processor, a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from the current media content, the set of current features comprising metadata features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawing and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
selecting, by the processor, one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
applying, by the processor, the selected speech recognition algorithm to the current media content.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product to automatically generate audio-based display indicia of media content, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to:
define, by the processor, a plurality of media content categories for media content by at least applying a non-supervised clustering algorithm based at least in part on a set of features extracted from the media content, the set of features comprising metadata features that are extracted from one or more video descriptions of the media content, image features extracted from one or more images of the media content, drawing and reality features extracted from one or more images of the media content, and recognized characteristics of characters of the media content;
receive, by the processor, a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories;
determine, by the processor, a media content category of a current media content from the plurality of media content categories based at least in part on a set of current features extracted from the current media content, the set of current features comprising metadata features extracted from one or more video descriptions of the current media content, image features extracted from the current media content, drawing and reality features extracted from the current media content, and recognized characteristics of characters of the current media content;
select, by the processor, one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determination of the media content category of the current media content; and
apply, by the processor, the selected speech recognition algorithm to the current media content.
Dependent claims: 16, 17, 18, 19, 20
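The defining and determining steps recited in the independent claims can be illustrated with a small unsupervised clustering sketch. The claims do not name a particular clustering algorithm, so k-means is used here purely as an assumption, and the four-element feature vectors (metadata, image, drawing-versus-reality, and character scores) are made-up numbers for illustration. The helper names `nearest`, `mean`, and `kmeans` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    # Index of the centroid closest to the point (Euclidean distance).
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def mean(points: List[Vector]) -> List[float]:
    # Component-wise average of a non-empty list of vectors.
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    # Deterministic initialization: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


# Illustrative feature vectors for a media library (assumed values):
# [metadata score, image score, drawing-vs-reality score, character score]
library = [
    [0.9, 0.8, 1.0, 0.9],  # animated content
    [0.1, 0.2, 0.0, 0.1],  # live-action content
    [0.8, 0.9, 1.0, 0.8],
    [0.2, 0.1, 0.0, 0.2],
]

# "Defining" step: each centroid represents one media content category.
categories = kmeans(library, k=2)

# "Determining" step: assign the current media content to the nearest category.
current_features = [0.7, 0.9, 1.0, 0.6]
current_category = nearest(current_features, categories)
```

The resulting category index could then serve as the key for selecting a categorized speech recognition algorithm, as the selecting step describes.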
Specification