Real-time single-view action recognition based on key pose analysis for sports videos

US 9,600,717 B1
Filed: 02/25/2016
Issued: 03/21/2017
Est. Priority Date: 02/25/2016
Status: Active Grant


3 Assignments


0 Petitions

Abstract

A system is provided for real-time single-view action recognition in sports videos based on key pose analysis. A training module of the system uses a large corpus of training videos to train feature models for a sports action distinctively associated with each sports type. The trained feature models include a player detector for detecting the locations of a player in the video frames of a training video, a set of key pose identifiers for identifying the distinctive poses of a sports action associated with a type of sport, and a meta classifier for determining the likelihood that the sports action has happened in a sports video based on the key pose analysis. When an input sports video is received for real-time action recognition, the set of trained feature models associated with the sports type of the input video is selected and applied to the input video.
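The recognition flow described in the abstract (select the models trained for the input video's sports type, detect the player, score each key pose, then let the meta classifier decide) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `FeatureModels`, `recognize_action`, and the callable signatures are hypothetical, and the actual detector, identifiers, and classifier are left as plug-in functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Frame = Any                      # a decoded video frame (placeholder type)
Box = Tuple[int, int, int, int]  # (x, y, width, height) of the player region


@dataclass
class FeatureModels:
    """Hypothetical container for the trained models of one sports type."""
    player_detector: Callable[[Frame], Optional[Box]]
    key_pose_identifiers: Dict[str, Callable[[Frame, Box], float]]
    meta_classifier: Callable[[Dict[str, List[float]]], float]


def recognize_action(frames: Sequence[Frame], sports_type: str,
                     registry: Dict[str, FeatureModels]) -> float:
    """Return the likelihood that the sport's key action occurs in `frames`."""
    # Select the feature models trained for the input video's sports type.
    models = registry[sports_type]
    # Keep only the frames in which the player detector finds a player region.
    detected = []
    for frame in frames:
        box = models.player_detector(frame)
        if box is not None:
            detected.append((frame, box))
    # Build one score sequence per key pose over the detected frames.
    scores = {
        pose: [identify(frame, box) for frame, box in detected]
        for pose, identify in models.key_pose_identifiers.items()
    }
    # The meta classifier turns the per-pose score sequences into a likelihood.
    return models.meta_classifier(scores)
```

Keeping each sport's models behind plain callables separates the per-sport selection step from whatever detector or classifier is actually trained; a baseball entry, for instance, would map the begin, impact, and end poses to three identifiers.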

80 Citations

20 Claims

1. A computer-implemented method for action recognition in a sports video, the method comprising:
- receiving a plurality of training videos, each of the training videos associated with a sports type, and each of the training videos including a plurality of video frames;
  
  training, for each sports type of a plurality of different sports types, one or more feature models using the plurality of the training videos, the training comprising:
  
  training a player detector for detecting location of a player in each video frame of a training video;
  
  training a set of key pose identifiers for a sports action distinctively associated with each sports type of the sports videos, the sports action associated with each sports type of the sports videos being represented by a set of distinctive poses; and
  
  training a meta classifier for determining a likelihood that the sports action has happened in a training video based on identification results from the trained set of key pose identifiers;
  
  selecting one or more trained feature models that are associated with a sports type of an input video; and
  
  applying the selected trained feature models to a plurality of video frames of the input video to recognize a sports action captured by the input video.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, wherein training a player detector comprises:
    - determining, for each of the video frames of a training video, a player region within the video frame, within which a player is detected; and
      
      selecting a series of video frames within which the player regions are determined.
  - 3. The method of claim 2, wherein the player region is a rectangular bounding box within the video frame, within which the player is detected.
  - 4. The method of claim 1, wherein training a set of key pose identifiers for a sports action comprises:
    - for each distinctive pose of the sports action:
      
      identifying a set of positive training samples and a set of negative training samples from the plurality of training videos, a positive training sample comprising a video frame showing a player performing the distinctive pose of the sports action, and a negative training sample comprising a video frame showing a player not performing the distinctive pose of the sports action;
      
      extracting visual features from video frames corresponding to the set of positive training samples and to the set of negative training samples;
      
      comparing the extracted visual features with visual features associated with the distinctive pose of the sports action;
      
      generating a score for each video frame based on the comparison, the score indicating how well a pose detected in each video frame matches the distinctive pose of the sports action.
  - 5. The method of claim 1, wherein training a meta classifier comprises:
    - for each distinctive pose of the sports action:
      
      applying a time window of a pre-defined width to a score sequence associated with the corresponding distinctive pose, the score sequence associated with the corresponding distinctive pose comprising a plurality of scores generated for a set of video frames of a training video selected by the set of key pose identifiers; and
      
      determining whether the distinctive pose of the sports action has happened in the training video based on the application of the time window.
  - 6. The method of claim 5, further comprising:
    - determining whether the sports action has happened in the training video based on the determination for each distinctive pose of the sports action.
  - 7. The method of claim 1, wherein a sports type of the plurality of different sports types is baseball, and wherein a sports action associated with a baseball sports video is a baseball swing.
  - 8. The method of claim 7, wherein the baseball swing is represented by a set of three distinctive poses, comprising:
    - a begin pose, the begin pose representing a player lifting a baseball bat before striking a baseball;
      
      an impact pose, the impact pose representing the player striking the baseball with the baseball bat; and
      
      an end pose, the end pose representing the player finishing striking the baseball with a body rotation.
  - 9. The method of claim 1, wherein applying the selected trained feature models to a plurality of video frames of the input video comprises:
    - applying a trained player detector to the plurality of video frames of the input video;
      
      applying a set of trained key pose identifiers associated with the sports type of the input video to the plurality of video frames of the input video; and
      
      applying a trained meta classifier to a score sequence generated by the set of trained key pose identifiers.
  - 10. The method of claim 9, further comprising:
    - generating a report for presentation to a user based on the application of the trained feature models, the report describing the recognition result of the sports action in the input video.
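Claims 5, 6, and 8 describe the meta classifier sliding a time window of pre-defined width over each key pose's score sequence, deciding whether each pose occurred, and then deciding whether the whole action occurred. The sketch below assumes a hypothetical decision rule that the claims do not specify: a pose counts as found if the mean score in any window reaches a threshold, and the action counts as found only if every pose is found. `pose_happened`, `action_happened`, `width`, and `threshold` are illustrative names, not taken from the patent.

```python
from typing import Dict, Sequence

# Claim 8: the three distinctive poses of a baseball swing.
BASEBALL_SWING_POSES = ("begin", "impact", "end")


def pose_happened(scores: Sequence[float], width: int, threshold: float) -> bool:
    """Slide a window of `width` frames over one pose's score sequence and
    report whether any window's mean score reaches `threshold` (assumed rule)."""
    if width <= 0:
        raise ValueError("width must be a positive number of frames")
    if len(scores) < width:
        return False  # too few detected frames to fill a single window
    best = max(sum(scores[i:i + width]) for i in range(len(scores) - width + 1))
    return best / width >= threshold


def action_happened(pose_scores: Dict[str, Sequence[float]], poses: Sequence[str],
                    width: int, threshold: float) -> bool:
    """Report the action (e.g. a swing) only if every listed key pose is found."""
    return all(pose_happened(pose_scores[p], width, threshold) for p in poses)
```

A stricter variant could also require the poses to peak in order (begin before impact before end); that ordering constraint is likewise an assumption rather than something the claims state.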

11. A non-transitory computer readable storage medium storing computer program instructions, the computer program instructions when executed by a computer processor cause the processor to perform the steps of:
- receiving a plurality of training videos, each of the training videos associated with a sports type, and each of the training videos including a plurality of video frames;
  
  training, for each sports type of a plurality of different sports types, one or more feature models using the plurality of the training videos, the training comprising:
  
  training a player detector for detecting location of a player in each video frame of a training video;
  
  training a set of key pose identifiers for a sports action distinctively associated with each sports type of the sports videos, the sports action associated with each sports type of the sports videos being represented by a set of distinctive poses; and
  
  training a meta classifier for determining a likelihood that the sports action has happened in a training video based on identification results from the trained set of key pose identifiers;
  
  selecting one or more trained feature models that are associated with a sports type of an input video; and
  
  applying the selected trained feature models to a plurality of video frames of the input video to recognize a sports action captured by the input video.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. The non-transitory computer readable storage medium of claim 11, wherein training a player detector comprises:
    - determining, for each of the video frames of a training video, a player region within the video frame, within which a player is detected; and
      
      selecting a series of video frames within which the player regions are determined.
  - 13. The non-transitory computer readable storage medium of claim 12, wherein the player region is a rectangular bounding box within the video frame, within which the player is detected.
  - 14. The non-transitory computer readable storage medium of claim 11, wherein training a set of key pose identifiers for a sports action comprises:
    - for each distinctive pose of the sports action:
      
      identifying a set of positive training samples and a set of negative training samples from the plurality of training videos, a positive training sample comprising a video frame showing a player performing the distinctive pose of the sports action, and a negative training sample comprising a video frame showing a player not performing the distinctive pose of the sports action;
      
      extracting visual features from video frames corresponding to the set of positive training samples and to the set of negative training samples;
      
      comparing the extracted visual features with visual features associated with the distinctive pose of the sports action;
      
      generating a score for each video frame based on the comparison, the score indicating how well a pose detected in each video frame matches the distinctive pose of the sports action.
  - 15. The non-transitory computer readable storage medium of claim 11, wherein training a meta classifier comprises:
    - for each distinctive pose of the sports action:
      
      applying a time window of a pre-defined width to a score sequence associated with the corresponding distinctive pose, the score sequence associated with the corresponding distinctive pose comprising a plurality of scores generated for a set of video frames of a training video selected by the set of key pose identifiers; and
      
      determining whether the distinctive pose of the sports action has happened in the training video based on the application of the time window.
  - 16. The non-transitory computer readable storage medium of claim 15, further comprising:
    - determining whether the sports action has happened in the training video based on the determination for each distinctive pose of the sports action.
  - 17. The non-transitory computer readable storage medium of claim 11, wherein a sports type of the plurality of different sports types is baseball, and wherein a sports action associated with a baseball sports video is a baseball swing.
  - 18. The non-transitory computer readable storage medium of claim 17, wherein the baseball swing is represented by a set of three distinctive poses, comprising:
    - a begin pose, the begin pose representing a player lifting a baseball bat before striking a baseball;
      
      an impact pose, the impact pose representing the player striking the baseball with the baseball bat; and
      
      an end pose, the end pose representing the player finishing striking the baseball with a body rotation.
  - 19. The non-transitory computer readable storage medium of claim 11, wherein applying the selected trained feature models to a plurality of video frames of the input video comprises:
    - applying a trained player detector to the plurality of video frames of the input video;
      
      applying a set of trained key pose identifiers associated with the sports type of the input video to the plurality of video frames of the input video; and
      
      applying a trained meta classifier to a score sequence generated by the set of trained key pose identifiers.
  - 20. The non-transitory computer readable storage medium of claim 19, further comprising:
    - generating a report for presentation to a user based on the application of the trained feature models, the report describing the recognition result of the sports action in the input video.
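Claim 14 (like claim 4) trains each key pose identifier from positive and negative frames and scores a frame by comparing its visual features with the features associated with the pose. The sketch below substitutes a simple nearest-centroid scorer for whatever classifier the patent actually uses; it assumes the visual features are already extracted as fixed-length vectors, and `KeyPoseIdentifier` and `score_sequence` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # precomputed visual features of one player region


def _centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equal-length feature vectors."""
    if not vectors:
        raise ValueError("at least one training sample is required")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KeyPoseIdentifier:
    """Nearest-centroid stand-in for the identifier of one distinctive pose."""

    def __init__(self, positives: Sequence[Vector], negatives: Sequence[Vector]):
        # The features "associated with the pose" are modeled as the positive centroid.
        self.pose_center = _centroid(positives)
        self.other_center = _centroid(negatives)

    def score(self, features: Vector) -> float:
        """Positive when the frame is closer to the pose than to the negatives."""
        return _distance(features, self.other_center) - _distance(features, self.pose_center)


def score_sequence(identifier: KeyPoseIdentifier, frames: Sequence[Vector]) -> List[float]:
    """One score per frame: the kind of sequence a meta classifier would consume."""
    return [identifier.score(f) for f in frames]
```

A margin-style score like this keeps the identifier's output continuous, so the downstream time-window step can decide how strongly and for how long a pose must appear.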

Current Assignee
Beijing Shunyuan Kaihua Technology Limited
Original Assignee
Zepp Labs, Inc. (Zepp Health Corporation)
Inventors
Liu, Zeyu, Dai, Xiaowei, Liu, Jiangyu, Han, Zheng
Primary Examiner(s)
Seth, Manav

Application Number

US15/053,773
Time in Patent Office

390 Days
US Class Current

1/1
CPC Class Codes

G06F 18/21   Design or setup of recognit...

G06N 20/00   Machine learning

G06N 20/10   using kernel methods, e.g. ...

G06N 3/08   Learning methods

G06V 20/40   in video content extracting...

G06V 20/42   of sport video content

G06V 20/47   Detecting features for summ...

G06V 40/103   Static body considered as a...

G06V 40/20   Movements or behaviour, e.g...

G06V 40/23   Recognition of whole body m...
