Voice-based video tagging
First Claim
1. A method for identifying an event of interest in a video, the method performed by a camera including one or more processors, the method comprising:
accessing, by the camera, a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;
matching, by the camera, the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and
in response to matching the captured speech pattern to the given stored speech pattern, storing, by the camera, event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.
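The claimed flow can be sketched in a few lines: a captured speech pattern is matched against stored patterns, each of which identifies the event of interest as occurring before, during, or after the utterance, and a matching pattern causes event-of-interest information to be stored. This is a hypothetical illustration only; the example phrases, the fixed time offsets, and exact-string matching are all assumptions (a real camera would use acoustic pattern matching, which the patent does not specify here).

```python
from dataclasses import dataclass

@dataclass
class StoredPattern:
    phrase: str      # stored speech pattern (example phrases are assumptions)
    relation: str    # "before", "during", or "after" the capture moment
    offset_s: float  # assumed time offset applied to the capture moment

STORED_PATTERNS = [
    StoredPattern("that was sick", "before", -5.0),
    StoredPattern("highlight", "during", 0.0),
    StoredPattern("watch this", "after", 5.0),
]

def tag_event(captured_phrase: str, moment_s: float):
    """Match the captured speech pattern against the stored patterns and,
    on a match, return event-of-interest information for the video."""
    normalized = captured_phrase.lower().strip()
    for pattern in STORED_PATTERNS:
        if pattern.phrase == normalized:
            return {
                "event_moment_s": moment_s + pattern.offset_s,
                "relation": pattern.relation,
                "command": pattern.phrase,
            }
    return None  # no stored pattern matched; nothing is tagged

# Saying "that was sick" at t = 42 s tags an event 5 s earlier, at t = 37 s.
info = tag_event("That was sick", 42.0)
```

Note how the temporal relation lives in the stored pattern itself, which is what lets a single matching step determine whether the event moment falls before, during, or after the utterance.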
Abstract
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within the video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
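The template-based summary generation mentioned in the abstract can be sketched as filling each template slot from its set of candidate clips. The slot names, clip identifiers, and the highest-score selection heuristic below are assumptions for illustration; the patent does not prescribe a scoring rule in this passage.

```python
def fill_template(slots, candidates_per_slot):
    """For each template slot, pick the highest-scoring candidate clip
    (scoring heuristic is an assumption, not from the patent)."""
    summary = []
    for slot, candidates in zip(slots, candidates_per_slot):
        best = max(candidates, key=lambda clip: clip["score"])
        summary.append({"slot": slot, "clip": best["id"]})
    return summary

# Hypothetical template with three slots and per-slot candidate clips.
slots = ["intro", "action", "outro"]
candidates_per_slot = [
    [{"id": "clip_a", "score": 0.4}, {"id": "clip_b", "score": 0.9}],
    [{"id": "clip_c", "score": 0.7}, {"id": "clip_d", "score": 0.2}],
    [{"id": "clip_e", "score": 0.5}],
]
summary = fill_template(slots, candidates_per_slot)
```

The key structural idea is the separation between the template (which fixes the number and order of slots) and the candidate sets (which vary per video), so the same template can summarize many videos.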
20 Claims
1. A method for identifying an event of interest in a video, the method performed by a camera including one or more processors, the method comprising:
accessing, by the camera, a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

matching, by the camera, the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to matching the captured speech pattern to the given stored speech pattern, storing, by the camera, event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 2, 3, 4, 5, 6, 7
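The claim's training-mode limitation, storing a speech pattern "based on a number of times" it is captured, can be illustrated with a simple repetition counter. The threshold value and the lowercase normalization are assumptions; the patent text only says storage is based on the capture count, not what count or comparison is used.

```python
from collections import Counter

class TrainingMode:
    """Minimal sketch of training-mode pattern storage: a pattern is
    stored once it has been captured a threshold number of times
    (threshold is an assumed parameter, not from the patent)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = Counter()   # captures observed per pattern
        self.stored = set()       # patterns promoted to stored status

    def capture(self, phrase: str) -> bool:
        """Record one captured utterance; returns True once the
        pattern has been captured at least `threshold` times."""
        normalized = phrase.lower().strip()
        self.counts[normalized] += 1
        if self.counts[normalized] >= self.threshold:
            self.stored.add(normalized)
        return normalized in self.stored

tm = TrainingMode(threshold=2)
tm.capture("Highlight")   # first capture: not yet stored
tm.capture("highlight")   # second capture: pattern is now stored
```

Counting repetitions before storing is a plausible reading of the limitation: it keeps one-off utterances out of the command vocabulary while letting deliberately repeated phrases become tagging commands.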
8. A system for identifying an event of interest in a video, the system comprising:
one or more processors configured by instructions to:

access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium storing instructions for identifying an event of interest in a video, the instructions, when executed, causing one or more processors to:
access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video;

match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and

in response to a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

Dependent claims: 16, 17, 18, 19, 20
Specification