Voice-based video tagging
First Claim
1. A method for identifying an event of interest in a video, the method performed by a camera including one or more processors, the method comprising:
accessing, by the camera, a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;
matching, by the camera, the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and
in response to matching the captured speech pattern to the given stored speech pattern, storing, by the camera, event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.
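The claimed flow can be sketched in a few lines: a captured speech pattern is matched against stored patterns, each of which identifies the event of interest as occurring before, during, or after the utterance, and a matching pattern causes event-of-interest information to be stored. This is a hypothetical illustration only; the example phrases, the fixed time offsets, and exact-string matching are all assumptions (a real camera would use acoustic pattern matching, which the patent does not specify here).

```python
from dataclasses import dataclass

@dataclass
class StoredPattern:
    phrase: str      # stored speech pattern (example phrases are assumptions)
    relation: str    # "before", "during", or "after" the capture moment
    offset_s: float  # assumed time offset applied to the capture moment

STORED_PATTERNS = [
    StoredPattern("that was sick", "before", -5.0),
    StoredPattern("highlight", "during", 0.0),
    StoredPattern("watch this", "after", 5.0),
]

def tag_event(captured_phrase: str, moment_s: float):
    """Match the captured speech pattern against the stored patterns and,
    on a match, return event-of-interest information for the video."""
    normalized = captured_phrase.lower().strip()
    for pattern in STORED_PATTERNS:
        if pattern.phrase == normalized:
            return {
                "event_moment_s": moment_s + pattern.offset_s,
                "relation": pattern.relation,
                "command": pattern.phrase,
            }
    return None  # no stored pattern matched; nothing is tagged

# Saying "that was sick" at t = 42 s tags an event 5 s earlier, at t = 37 s.
info = tag_event("That was sick", 42.0)
```

Note how the temporal relation lives in the stored pattern itself, which is what lets a single matching step determine whether the event moment falls before, during, or after the utterance.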
Abstract
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within the video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
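The template-based summary generation mentioned in the abstract can be sketched as filling each template slot from its set of candidate clips. The slot names, clip identifiers, and the highest-score selection heuristic below are assumptions for illustration; the patent does not prescribe a scoring rule in this passage.

```python
def fill_template(slots, candidates_per_slot):
    """For each template slot, pick the highest-scoring candidate clip
    (scoring heuristic is an assumption, not from the patent)."""
    summary = []
    for slot, candidates in zip(slots, candidates_per_slot):
        best = max(candidates, key=lambda clip: clip["score"])
        summary.append({"slot": slot, "clip": best["id"]})
    return summary

# Hypothetical template with three slots and per-slot candidate clips.
slots = ["intro", "action", "outro"]
candidates_per_slot = [
    [{"id": "clip_a", "score": 0.4}, {"id": "clip_b", "score": 0.9}],
    [{"id": "clip_c", "score": 0.7}, {"id": "clip_d", "score": 0.2}],
    [{"id": "clip_e", "score": 0.5}],
]
summary = fill_template(slots, candidates_per_slot)
```

The key structural idea is the separation between the template (which fixes the number and order of slots) and the candidate sets (which vary per video), so the same template can summarize many videos.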
20 Claims
1. A method for identifying an event of interest in a video, the method performed by a camera including one or more processors, the method comprising:
accessing, by the camera, a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

matching, by the camera, the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to matching the captured speech pattern to the given stored speech pattern, storing, by the camera, event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 2, 3, 4, 5, 6, 7
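The claim's training-mode limitation, storing a speech pattern "based on a number of times" it is captured, can be illustrated with a simple repetition counter. The threshold value and the lowercase normalization are assumptions; the patent text only says storage is based on the capture count, not what count or comparison is used.

```python
from collections import Counter

class TrainingMode:
    """Minimal sketch of training-mode pattern storage: a pattern is
    stored once it has been captured a threshold number of times
    (threshold is an assumed parameter, not from the patent)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = Counter()   # captures observed per pattern
        self.stored = set()       # patterns promoted to stored status

    def capture(self, phrase: str) -> bool:
        """Record one captured utterance; returns True once the
        pattern has been captured at least `threshold` times."""
        normalized = phrase.lower().strip()
        self.counts[normalized] += 1
        if self.counts[normalized] >= self.threshold:
            self.stored.add(normalized)
        return normalized in self.stored

tm = TrainingMode(threshold=2)
tm.capture("Highlight")   # first capture: not yet stored
tm.capture("highlight")   # second capture: pattern is now stored
```

Counting repetitions before storing is a plausible reading of the limitation: it keeps one-off utterances out of the command vocabulary while letting deliberately repeated phrases become tagging commands.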
8. A system for identifying an event of interest in a video, the system comprising:
one or more processors configured by instructions to:

access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium storing instructions for identifying an event of interest in a video, the instructions, when executed, causing one or more processors to:
access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 16, 17, 18, 19, 20
Specification