Voice-based video tagging
First Claim
1. A method for identifying events of interest in a captured video, the method comprising:
storing multiple stored speech patterns for multiple input types, the multiple stored speech patterns corresponding to a command for identifying the events of interest within the captured video, wherein the multiple stored speech patterns include a first stored speech pattern for a first input type, wherein storing the first stored speech pattern comprises:
receiving, from a user, an input configuring a camera into a training mode to learn the first stored speech pattern;
capturing the first stored speech pattern from the user; and
storing the first stored speech pattern, wherein the first stored speech pattern is stored in response to capturing the first stored speech pattern from the user a threshold number of times;
accessing a captured speech pattern, the captured speech pattern captured from the user during capture of the captured video;
determining that the captured speech pattern corresponds to the first stored speech pattern; and
in response to determining that the captured speech pattern corresponds to the first stored speech pattern, storing event of interest information in metadata associated with the captured video, the event of interest information identifying (i) the first input type for a first event of interest, and (ii) an event moment during the capture of the captured video at which the captured speech pattern was captured from the user.
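A minimal sketch of the claimed flow follows, assuming a Python-style implementation. All names (SpeechTagger, train, tag_if_match, CAPTURE_THRESHOLD, the matching test) are illustrative assumptions and not the patent's actual implementation; a real system would use a speech-recognition model rather than the placeholder comparison shown.

    # Hypothetical sketch of the claimed method; names and structures are
    # assumptions for illustration only, not the patented implementation.

    CAPTURE_THRESHOLD = 3  # assumed number of training captures per pattern

    class SpeechTagger:
        def __init__(self):
            self.stored_patterns = {}  # input type -> list of trained patterns

        def train(self, input_type, capture_speech):
            """Training mode: store a pattern only after it has been
            captured a threshold number of times from the user."""
            captures = [capture_speech() for _ in range(CAPTURE_THRESHOLD)]
            self.stored_patterns[input_type] = captures

        def tag_if_match(self, captured_pattern, video_metadata, event_moment):
            """During video capture: compare a captured speech pattern against
            the stored patterns and, on a match, write event-of-interest
            information into the video's metadata."""
            for input_type, patterns in self.stored_patterns.items():
                if any(self._match(captured_pattern, p) for p in patterns):
                    video_metadata.setdefault("events_of_interest", []).append({
                        "input_type": input_type,      # (i) the first input type
                        "event_moment": event_moment,  # (ii) moment of capture
                    })
                    return True
            return False

        def _match(self, a, b):
            # Placeholder similarity test; a real system would compare
            # acoustic features or recognized text, not raw equality.
            return a == b

The threshold-based training mirrors the claim's requirement that the first stored speech pattern is stored only after being captured from the user a threshold number of times.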
Abstract
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
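The following sketch illustrates, under stated assumptions, how metadata events of interest could drive scene selection and template-based summary generation as described above. The Slot structure, the fixed time window, and the score-based clip selection are assumptions for illustration, not the described system's actual design.

    # Illustrative sketch only; slot structure and helper names are assumed.
    from dataclasses import dataclass, field

    @dataclass
    class Slot:
        activity: str                          # e.g. an activity tag such as "jump"
        candidates: list = field(default_factory=list)

    def identify_best_scenes(metadata, window=5.0):
        """Turn metadata events of interest into (start, end) scene ranges."""
        return [(e["event_moment"] - window, e["event_moment"] + window)
                for e in metadata.get("events_of_interest", [])]

    def fill_template(slots, clips_by_activity):
        """Pick one candidate clip per template slot to build the summary."""
        summary = []
        for slot in slots:
            candidates = clips_by_activity.get(slot.activity, [])
            if candidates:
                # Assumed heuristic: take the highest-scoring candidate clip.
                summary.append(max(candidates, key=lambda c: c.get("score", 0)))
        return summary

Each template slot draws from its own set of candidate clips, matching the abstract's description of a summary template with slots filled from sets of candidate video clips.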
15 Claims
1. A method for identifying events of interest in a captured video, the method comprising:
storing multiple stored speech patterns for multiple input types, the multiple stored speech patterns corresponding to a command for identifying the events of interest within the captured video, wherein the multiple stored speech patterns include a first stored speech pattern for a first input type, wherein storing the first stored speech pattern comprises:
receiving, from a user, an input configuring a camera into a training mode to learn the first stored speech pattern;
capturing the first stored speech pattern from the user; and
storing the first stored speech pattern, wherein the first stored speech pattern is stored in response to capturing the first stored speech pattern from the user a threshold number of times;
accessing a captured speech pattern, the captured speech pattern captured from the user during capture of the captured video;
determining that the captured speech pattern corresponds to the first stored speech pattern; and
in response to determining that the captured speech pattern corresponds to the first stored speech pattern, storing event of interest information in metadata associated with the captured video, the event of interest information identifying (i) the first input type for a first event of interest, and (ii) an event moment during the capture of the captured video at which the captured speech pattern was captured from the user.
Dependent claims: 2, 3, 4, 5
6. A system for identifying events of interest in a captured video, the system comprising:
a processor configured by instructions to:
store multiple stored speech patterns for multiple input types, the multiple stored speech patterns corresponding to a command for identifying the events of interest within the captured video, wherein the multiple stored speech patterns include a first stored speech pattern for a first input type, wherein storing the first stored speech pattern comprises:
receiving, from a user, an input configuring a camera into a training mode to learn the first stored speech pattern;
capturing the first stored speech pattern from the user; and
storing the first stored speech pattern, wherein the first stored speech pattern is stored in response to capturing the first stored speech pattern from the user a threshold number of times;
access a captured speech pattern, the captured speech pattern captured from the user during capture of the captured video;
determine that the captured speech pattern corresponds to the first stored speech pattern; and
in response to determining that the captured speech pattern corresponds to the first stored speech pattern, store event of interest information in metadata associated with the captured video, the event of interest information identifying (i) the first input type for a first event of interest, and (ii) an event moment during the capture of the captured video at which the captured speech pattern was captured from the user.
Dependent claims: 7, 8, 9, 10
11. A non-transitory computer-readable storage medium storing instructions for identifying events of interest in a captured video, the instructions, when executed, causing a processor to:
store multiple stored speech patterns for multiple input types, the multiple stored speech patterns corresponding to a command for identifying the events of interest within the captured video, wherein the multiple stored speech patterns include a first stored speech pattern for a first input type, wherein storing the first stored speech pattern comprises:
receiving, from a user, an input configuring a camera into a training mode to learn the first stored speech pattern;
capturing the first stored speech pattern from the user; and
storing the first stored speech pattern, wherein the first stored speech pattern is stored in response to capturing the first stored speech pattern from the user a threshold number of times;
access a captured speech pattern, the captured speech pattern captured from the user during capture of the captured video;
determine that the captured speech pattern corresponds to the first stored speech pattern; and
in response to determining that the captured speech pattern corresponds to the first stored speech pattern, store event of interest information in metadata associated with the captured video, the event of interest information identifying (i) the first input type for a first event of interest, and (ii) an event moment during the capture of the captured video at which the captured speech pattern was captured from the user.
Dependent claims: 12, 13, 14, 15
Specification