Computerized system and method for automatically detecting and rendering highlights from streaming videos
First Claim
1. A method comprising steps of:
receiving, at a computing device, a video stream comprising a plurality of frames of content, said video stream comprising a broadcast of a currently occurring live event over a network;
analyzing, via the computing device using first image recognition software, as the stream is being received, a set of frames from the plurality of frames of the video stream, said analysis comprising performing, via the first image recognition software, transformations of the frame content within the frame set resulting in an identification of attributes of the frame content;
training, by the computing device, a deep learning algorithm, the training comprising:
determining and storing labels for frames of a first set of training videos based on attributes of the frames, a label comprising an indication as to a scene type depicted by content of the frames of the training videos and the label associated with said predetermined set of machine learned attributes,
automatically applying the labels via the first image recognition software to a second set of training videos,
analyzing said automatically applied labels and adjusting the automatically applied labels identified as being inaccurate, and
updating indications associated with the labels based on the analysis;
classifying, via the computing device, a scene type depicted in the frame set by determining whether the scene type is a game scene or whether the scene type is a non-game scene based on said frame content transformations, the classifying comprising applying the deep learning algorithm to frames in the frame set, the deep learning algorithm comprising multiple layers, each layer producing a feature collection for a given frame, the deep learning algorithm comprising overlapping, for each frame, the feature collections associated with each layer, pooling the overlapping feature collections, and assigning a label to the frame set based on the pooled feature collections;
determining, via the computing device, that said scene type is a game scene based on the assigned label, said game scene comprising content associated with game play occurring in said live event;
discarding, by the computing device, the frame set upon determining that the scene type is a non-game scene;
upon determining that said scene type is a game scene, determining, via the computing device, that said content within said game scene is a highlight and designating said game scene as a highlight based on said determination, said determination comprising computing a highlight score for said game scene by analyzing frames of the game scene via the second image recognition software and determining that an output from said second image recognition software satisfies a threshold, said output based on a comparison, performed by the second image recognition software, of the attributes of the frame content in the game scene against a predetermined set of machine learned highlight attributes;
generating, via the computing device, an output file corresponding to the video stream, said output file comprising time-stamped information associated with the scene label and the highlight label; and
automatically creating, via the computing device, a highlight video segment from the video stream based on said output file, said highlight video segment created from and comprising frames of the video stream identified in the output file as the game scene and highlight.
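The claimed training step (determine labels for a first set of training videos, automatically apply them to a second set, correct labels identified as inaccurate, and update the label indications) can be sketched as a simple label-bootstrapping loop. Everything below is an illustrative assumption: the attribute sets, the overlap-based matcher, and the `corrections` map are stand-ins, since the claim names the operations but not how labels are matched or corrected.

```python
# Hypothetical sketch of the claimed training loop; names are illustrative.

def auto_label(frame_attrs, learned_attrs):
    """Assign the scene label whose learned attribute set best overlaps
    the frame's attributes (an assumed matching rule)."""
    best_label, best_overlap = None, -1
    for label, attrs in learned_attrs.items():
        overlap = len(frame_attrs & attrs)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

def train_labels(first_set, second_set, corrections):
    # Step 1: determine and store labels for the first set of training videos.
    learned = {}
    for attrs, label in first_set:
        learned.setdefault(label, set()).update(attrs)
    # Step 2: automatically apply the labels to the second set.
    applied = [(attrs, auto_label(attrs, learned)) for attrs, _ in second_set]
    # Step 3: adjust labels identified as inaccurate (via the corrections map)
    # and update the indications associated with each label.
    adjusted = []
    for i, (attrs, label) in enumerate(applied):
        label = corrections.get(i, label)
        learned.setdefault(label, set()).update(attrs)
        adjusted.append((attrs, label))
    return learned, adjusted
```

For example, a second-set frame showing a field would inherit the "game" label learned from the first set, and its attributes would then refine that label's stored attribute set.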
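The classification step (each network layer yields a feature collection per frame; the per-layer collections are overlapped, pooled, and a single label is assigned to the frame set) can be illustrated in plain Python. The toy `layer_features` function stands in for a real deep network's layer outputs, and max-pooling plus a mean-activation threshold are assumptions; the claim specifies the operations but not their exact form.

```python
# Illustrative sketch of overlapping and pooling per-layer feature
# collections to label a frame set; all numeric details are toy stand-ins.

def layer_features(frame, num_layers=3):
    """Stand-in for a deep network: one feature vector per layer per frame."""
    return [[(pixel * (layer + 1)) % 7 for pixel in frame]
            for layer in range(num_layers)]

def overlap_and_pool(feature_collections):
    """Overlap the per-layer vectors position-wise, then max-pool each
    position across layers."""
    return [max(col) for col in zip(*feature_collections)]

def classify_frame_set(frames, threshold=4.0):
    """Assign one label to the whole frame set from the pooled features."""
    pooled_per_frame = [overlap_and_pool(layer_features(f)) for f in frames]
    mean_activation = (sum(sum(p) for p in pooled_per_frame)
                       / sum(len(p) for p in pooled_per_frame))
    return "game" if mean_activation >= threshold else "non-game"
```

The design point carried over from the claim is that pooling happens across layers per frame, and the label attaches to the frame set as a whole rather than to individual frames.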
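The highlight determination reduces to: compare the game scene's frame attributes against a predetermined set of machine-learned highlight attributes, compute a highlight score, and designate the scene a highlight only if the score satisfies a threshold. A minimal sketch, assuming Jaccard set similarity as the (unspecified) comparison and an averaged per-frame score:

```python
# Hedged sketch of the claimed highlight scoring; Jaccard similarity and
# the 0.5 threshold are illustrative assumptions, not from the patent.

def highlight_score(frame_attr_sets, highlight_attrs):
    """Average per-frame similarity to the learned highlight attributes."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return sum(jaccard(attrs, highlight_attrs)
               for attrs in frame_attr_sets) / len(frame_attr_sets)

def is_highlight(frame_attr_sets, highlight_attrs, threshold=0.5):
    """Designate the game scene a highlight if the score meets the threshold."""
    return highlight_score(frame_attr_sets, highlight_attrs) >= threshold
```

A scene whose frames show crowd, goal, and celebration attributes scores high against a learned highlight set containing those attributes, while an uneventful midfield scene falls below the threshold.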
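The final two steps, generating a time-stamped output file and creating the highlight segment from exactly the frames that file identifies as game-scene highlights, can be sketched as a JSON round trip. The schema (`timestamp`, `scene_label`, `highlight` keys) is an assumption; the claim fixes no file format.

```python
import json

def serialize_output(annotations):
    """Produce the claimed output file's contents: time-stamped scene and
    highlight labels for the stream (schema is an assumption)."""
    return json.dumps({"annotations": annotations})

def build_highlight_segment(output_json, frames_by_timestamp):
    """Assemble the highlight segment from exactly the frames the output
    file marks as both game scene and highlight."""
    entries = json.loads(output_json)["annotations"]
    return [frames_by_timestamp[e["timestamp"]]
            for e in entries
            if e["scene_label"] == "game" and e["highlight"]]
```

The segment is thus driven entirely by the output file, matching the claim's requirement that the highlight video be created from the frames "identified in the output file as the game scene and highlight."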
Abstract
Disclosed are systems and methods for improving interactions with and between computers in content generating, searching, hosting and/or providing systems supported by or configured with personal computing devices, servers and/or platforms. The systems interact to identify and retrieve data within or across platforms, which can be used to improve the quality of data used in processing interactions between or among processors in such systems. The disclosed systems and methods provide for automatically detecting and rendering highlights from streaming videos in real-time. As a streaming video is being broadcast over the Internet, the disclosed systems and methods determine each type of scene from the streaming video, and automatically score highlight scenes. The scored highlight scenes are then communicated to users as compiled video segments, which can be delivered over any type of channel or platform accessible to a user's device and network that enables content rendering and user interaction.
18 Claims
1. A method comprising steps of: (recited in full under "First Claim" above) - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A non-transitory computer-readable storage medium tangibly encoded with computer-executable instructions, that when executed by a processor associated with a computing device, performs a method comprising:
receiving a video stream comprising a plurality of frames of content, said video stream comprising a broadcast of a currently occurring live event over a network;
analyzing via first image recognition software, as the stream is being received, a set of frames from the plurality of frames of the video stream, said analysis comprising performing, via the first image recognition software, transformations of the frame content within the frame set resulting in an identification of attributes of the frame content;
training a deep learning algorithm, the training comprising:
determining and storing labels for frames of a first set of training videos based on attributes of the frames, a label comprising an indication as to a scene type depicted by content of the frames of the training videos and the label associated with said predetermined set of machine learned attributes,
automatically applying the labels via the first image recognition software to a second set of training videos,
analyzing said automatically applied labels and adjusting the automatically applied labels identified as being inaccurate, and
updating indications associated with the labels based on the analysis;
classifying a scene type depicted in the frame set by determining whether the scene type is a game scene or whether the scene type is a non-game scene based on said frame content transformations, the classifying comprising applying the deep learning algorithm to frames in the frame set, the deep learning algorithm comprising multiple layers, each layer producing a feature collection for a given frame, the deep learning algorithm comprising overlapping, for each frame, the feature collections associated with each layer, pooling the overlapping feature collections, and assigning a label to the frame set based on the pooled feature collections;
determining that said scene type is a game scene based on the assigned label, said game scene comprising content associated with game play occurring in said live event;
discarding the frame set upon determining that the scene type is a non-game scene;
upon determining that said scene type is a game scene, determining that said content within said game scene is a highlight and designating said game scene as a highlight based on said determination, said determination comprising computing a highlight score for said game scene by analyzing frames of the game scene via the second image recognition software and determining that an output from said second image recognition software satisfies a threshold, said output based on a comparison, performed by the second image recognition software, of the attributes of the frame content in the game scene against a predetermined set of machine learned highlight attributes;
generating, via the computing device, an output file corresponding to the video stream, said output file comprising time-stamped information associated with the scene label and the highlight label; and
automatically creating a highlight video segment from the video stream based on said output file, said highlight video segment created from and comprising frames of the video stream identified in the output file as the game scene and highlight.
- View Dependent Claims (14, 15, 16, 17)
18. A computing device comprising:
a processor;
a non-transitory computer-readable storage medium for tangibly storing thereon program logic for execution by the processor, the program logic comprising:
logic executed by the processor for receiving, at a computing device, a video stream comprising a plurality of frames of content, said video stream comprising a broadcast of a currently occurring live event over a network;
logic executed by the processor for analyzing, via the computing device using first image recognition software, as the stream is being received, a set of frames from the plurality of frames of the video stream, said analysis comprising performing, via the first image recognition software, transformations of the frame content within the frame set resulting in an identification of attributes of the frame content;
logic executed by the processor for training, by the computing device, a deep learning algorithm, the training comprising:
determining and storing labels for frames of a first set of training videos based on attributes of the frames, a label comprising an indication as to a scene type depicted by content of the frames of the training videos and the label associated with said predetermined set of machine learned attributes,
automatically applying the labels via the first image recognition software to a second set of training videos,
analyzing said automatically applied labels and adjusting the automatically applied labels identified as being inaccurate, and
updating indications associated with the labels based on the analysis;
logic executed by the processor for classifying, via the computing device, a scene type depicted in the frame set by determining whether the scene type is a game scene or whether the scene type is a non-game scene based on said frame content transformations, the classifying comprising applying the deep learning algorithm to frames in the frame set, the deep learning algorithm comprising multiple layers, each layer producing a feature collection for a given frame, the deep learning algorithm comprising overlapping, for each frame, the feature collections associated with each layer, pooling the overlapping feature collections, and assigning a label to the frame set based on the pooled feature collections;
logic executed by the processor for determining, via the computing device, that said scene type is a game scene based on the assigned label, said game scene comprising content associated with game play occurring in said live event;
logic executed by the processor for discarding the frame set upon determining that the scene type is a non-game scene;
logic executed by the processor for determining, via the computing device, upon determining that said scene type is a game scene, that said content within said game scene is a highlight and designating said game scene as a highlight based on said determination, said determination comprising computing a highlight score for said game scene by analyzing frames of the game scene via the second image recognition software and determining that an output from said second image recognition software satisfies a threshold, said output based on a comparison, performed by the second image recognition software, of the attributes of the frame content in the game scene against a predetermined set of machine learned highlight attributes;
logic executed by the processor for generating, via the computing device, an output file corresponding to the video stream, said output file comprising time-stamped information associated with the scene label and the highlight label; and
logic executed by the processor for automatically creating, via the computing device, a highlight video segment from the video stream based on said output file, said highlight video segment created from and comprising frames of the video stream identified in the output file as the game scene and highlight.
Specification