Computerized Prominent Character Recognition in Videos

US 20160034748A1
Filed: 07/29/2014
Published: 02/04/2016
Est. Priority Date: 07/29/2014
Status: Active Grant


Abstract

Techniques for identifying prominent subjects in video content based on feature point extraction are described herein. Video files may be processed to detect faces in video frames and to extract feature points from the video frames. Some video frames may include both detected faces and extracted feature points, while other video frames may include no detected faces. Based on the extracted feature points, faces may be inferred on video frames where no face was detected. Additionally, video frames may be arranged into groups, and two or more groups may be merged when they include video frames having overlapping feature points. Each resulting group may identify a subject. A frequency representing the number of video frames in which the subject appears may be determined and used to calculate a prominence score for each identified subject in the video file.
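The inference step described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical tracker has already assigned a stable integer ID to each feature point across frames; the `Frame` class, the `infer_faces` function, and the face labels are illustrative names, not taken from the patent. A face is inferred on a frame with no detection whenever that frame shares a feature point with a face detected in any other frame, earlier or later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Frame:
    index: int
    # IDs of tracked feature points visible in this frame (assumed tracker output).
    feature_points: Set[int]
    # Face label -> IDs of the feature points lying on that detected face.
    detected_faces: Dict[str, Set[int]] = field(default_factory=dict)


def infer_faces(frames: List[Frame]) -> Dict[int, Set[str]]:
    """Return, per frame index, the faces inferred from shared feature points."""
    # Map each feature point to the faces it was associated with in detection frames.
    point_to_faces: Dict[int, Set[str]] = {}
    for frame in frames:
        for face_id, points in frame.detected_faces.items():
            for p in points:
                point_to_faces.setdefault(p, set()).add(face_id)

    inferred: Dict[int, Set[str]] = {}
    for frame in frames:
        if frame.detected_faces:
            continue  # inference only applies to frames with no detected face
        faces: Set[str] = set()
        for p in frame.feature_points:
            faces |= point_to_faces.get(p, set())
        if faces:
            inferred[frame.index] = faces
    return inferred


frames = [
    Frame(0, {1, 2, 3}, {"face_a": {1, 2}}),  # face detected here
    Frame(1, {2, 3}),                         # no detection, shares point 2
    Frame(2, {7, 8}),                         # no detection, no shared points
]
inferred = infer_faces(frames)
```

In this example only frame 1 receives an inferred face, because point 2 lies on the face detected in frame 0. A real system would likely also check how close matched points are to the face region; that is omitted here.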

20 Claims

1. A method comprising:
- extracting, by one or more computing devices, feature points from video frames of a video file;
- detecting, by at least one of the one or more computing devices, at least one face in at least a first video frame of the video frames;
- inferring, by at least one of the one or more computing devices, the at least one face in a second video frame of the video frames, the inferring based at least in part on the feature points;
- arranging, by at least one of the one or more computing devices, the video frames into groups; and
- combining, by at least one of the one or more computing devices, two or more groups to create refined groups, the combining based at least in part on the two or more groups each including one or more video frames having at least one overlapping feature point associated with a detected face or an inferred face.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the inferring comprises:
    - determining that a first feature point associated with the at least one face in the first video frame matches a second feature point in the second video frame, wherein no faces are detected in the second video frame; and
    - inferring the at least one face on the second video frame based at least in part on the first feature point matching the second feature point.
  - 3. The method of claim 1, wherein the arranging the video frames into the groups is based at least in part on similarity data associated with the detected faces or inferred faces on the video frames.
  - 4. The method of claim 1, further comprising, before combining the two or more groups, comparing feature points of the video frames in the two or more groups.
  - 5. The method of claim 1, wherein each refined group in the refined groups is associated with a subject.
  - 6. The method of claim 1, further comprising determining a frequency associated with the subject, the determining comprising counting a number of video frames including the subject and dividing the number of video frames including the subject by a total number of video frames in a video file.
  - 7. The method of claim 6, wherein the at least one face is associated with a set of face detail values, the face detail values including at least a size value and a position value associated with the at least one face.
  - 8. The method of claim 7, further comprising calculating a prominence score associated with the subject based at least in part on at least one of the size value, the position value, or the frequency associated with the subject.
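The combining step of claim 1 can be sketched as a connected-components problem: groups that share at least one face-associated feature point end up in the same refined group. The sketch below uses a union-find structure, which is one possible implementation choice and not something the claims specify. The function name `merge_groups` and the `face_points` mapping (frame index to the IDs of feature points associated with a detected or inferred face) are hypothetical.

```python
from typing import Dict, List, Set


def merge_groups(groups: List[Set[int]],
                 face_points: Dict[int, Set[int]]) -> List[Set[int]]:
    """Merge groups of frame indices that share a face-associated feature point."""
    parent = list(range(len(groups)))

    def find(i: int) -> int:
        # Find the representative of group i, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Collect every face-associated feature point seen in each group.
    group_points = [
        set().union(*(face_points.get(f, set()) for f in g)) for g in groups
    ]

    owner: Dict[int, int] = {}  # feature point -> first group that contained it
    for gi, points in enumerate(group_points):
        for p in points:
            if p in owner:
                parent[find(gi)] = find(owner[p])  # overlap found: union the groups
            else:
                owner[p] = gi

    merged: Dict[int, Set[int]] = {}
    for gi, g in enumerate(groups):
        merged.setdefault(find(gi), set()).update(g)
    # Order refined groups by their earliest frame for stable output.
    return sorted(merged.values(), key=lambda s: min(s, default=-1))


groups = [{0, 1}, {2, 3}, {4}]
face_points = {0: {10}, 1: {11}, 2: {11, 12}, 3: {12}, 4: {99}}
refined = merge_groups(groups, face_points)
```

Here the first two groups are combined because frames 1 and 2 both carry feature point 11, while the third group stays separate. Merging is transitive in this sketch: if group A overlaps B and B overlaps C, all three are combined.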

9. A system comprising:
- memory;
- one or more processors operably coupled to the memory; and
- one or more modules stored in the memory and executable by the one or more processors, the one or more modules including:
  - a face detection module configured to detect one or more faces associated with one or more subjects in video frames in video files;
  - a feature detection module configured to extract feature points from the video frames and infer the one or more faces on the video frames;
  - a grouping module configured to arrange individual video frames into groups based at least in part on face landmarks associated with the one or more faces, wherein individual groups represent an individual subject of the one or more subjects; and
  - a scoring module configured to determine a prominence score associated with each individual subject.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The system of claim 9, further comprising a post processing module configured to perform post processing operations including at least one of filtering the video files based at least in part on the prominence scores or ranking individual video files based at least in part on the prominence scores.
  - 11. The system of claim 9, wherein the feature detection module is further configured to:
    - track the feature points over the video frames;
    - determine at least one feature point extracted from a first video frame of the video frames is associated with a detected face of the one or more faces;
    - identify a second video frame of the video frames, wherein no faces are detected on the second video frame and at least one feature point is extracted from the second video frame;
    - determine that the at least one feature point extracted from the first video frame and the at least one feature point extracted from the second video frame overlap; and
    - infer the detected face on the second video frame based on the overlap of the at least one feature point extracted from the first video frame and the at least one feature point extracted from the second video frame.
  - 12. The system of claim 11, wherein the first video frame precedes the second video frame by one or more video frames.
  - 13. The system of claim 11, wherein the first video frame succeeds the second video frame by one or more video frames.
  - 14. The system of claim 9, wherein the grouping module is further configured to:
    - compare feature points on each of the individual video frames in the individual groups; and
    - combine two or more individual groups to create a new group based at least in part on the two or more individual groups including individual video frames having at least one overlapping feature point associated with an identified face.
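A scoring module of the kind recited in claim 9 might be sketched as follows. The frequency follows the definition in claim 6 (frames including the subject divided by total frames). The claims do not state how size, position, and frequency are combined, so the weighted sum, the weights, the normalization of `size` and `position` to the range 0 to 1, and the names `FaceDetail`, `frequency`, and `prominence_score` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceDetail:
    size: float      # assumed: face area as a fraction of frame area, in [0, 1]
    position: float  # assumed: 1.0 at the frame center, falling to 0.0 at the edge


def frequency(frames_with_subject: int, total_frames: int) -> float:
    """Frames including the subject divided by total frames in the video file."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return frames_with_subject / total_frames


def prominence_score(details: List[FaceDetail], total_frames: int,
                     w_size: float = 0.3, w_position: float = 0.2,
                     w_frequency: float = 0.5) -> float:
    """Weighted sum of mean size, mean position, and frequency (assumed formula).

    `details` holds one entry per frame in which the subject appears.
    """
    if not details:
        return 0.0
    mean_size = sum(d.size for d in details) / len(details)
    mean_position = sum(d.position for d in details) / len(details)
    freq = frequency(len(details), total_frames)
    return w_size * mean_size + w_position * mean_position + w_frequency * freq


# A subject seen in 2 of 4 frames, once large and centered, once smaller and off-center.
score = prominence_score([FaceDetail(0.4, 1.0), FaceDetail(0.2, 0.5)], total_frames=4)
```

With the default weights, frequency dominates the score; a different weighting would favor subjects who appear briefly but fill the frame.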

15. One or more computer-readable storage media encoded with instructions that, when executed by a processor, configure a computer to perform acts comprising:
- processing individual video files of a plurality of video files, the processing comprising:
- detecting faces in some video frames of the individual video files; and
- extracting feature points from the video frames;
- inferring faces in individual video frames of the video frames, wherein no face was detected in the individual video frames, the inferring based at least in part on the feature points;
- arranging the individual video frames into a plurality of groups;
- combining two or more individual groups of the plurality of groups to create a set of refined groups, the combining based at least in part on the two or more individual groups including video frames having at least one overlapping feature point;
- identifying subjects associated with each of the refined groups; and
- determining a frequency associated with the subject, the frequency representing a number of video frames in which an individual subject of the subjects appears in a particular video file of the video files.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The one or more computer-readable storage media of claim 15, wherein the acts further comprise calculating a prominence score associated with the individual subject based at least in part on the frequency, a size value, and a position value.
  - 17. The one or more computer-readable storage media of claim 16, wherein the acts further comprise receiving user input relating to user interaction with the plurality of video files.
  - 18. The one or more computer-readable storage media of claim 17, wherein the user interaction comprises filtering the plurality of video files to identify individual video files including a user specified subject, the filtering based at least in part on identifying the user specified subject in at least one of the combined groups.
  - 19. The one or more computer-readable storage media of claim 17, wherein the user interaction comprises ranking the individual video files, the ranking based at least in part on the prominence score.
  - 20. The one or more computer-readable storage media of claim 17, wherein the user interaction comprises identifying prominent video segments of the individual video files based at least in part on the prominence score.
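The filtering and ranking interactions of claims 18 and 19 can be illustrated with a small sketch. It assumes that prominence scores have already been computed and stored per video file and per subject; the `Scores` layout, the file names, the subject labels, and the functions `filter_videos` and `rank_videos` are hypothetical.

```python
from typing import Dict, List

# Video file name -> {subject label -> prominence score}; values are illustrative.
Scores = Dict[str, Dict[str, float]]


def filter_videos(scores: Scores, subject: str) -> List[str]:
    """Return the video files in which the user-specified subject was identified."""
    return sorted(name for name, subjects in scores.items() if subject in subjects)


def rank_videos(scores: Scores, subject: str) -> List[str]:
    """Rank files containing the subject by prominence score, highest first."""
    hits = [(subjects[subject], name)
            for name, subjects in scores.items() if subject in subjects]
    # Sort by descending score; break ties alphabetically for stable output.
    return [name for _, name in sorted(hits, key=lambda h: (-h[0], h[1]))]


scores: Scores = {
    "a.mp4": {"alice": 0.2, "bob": 0.7},
    "b.mp4": {"alice": 0.9},
    "c.mp4": {"bob": 0.1},
}
alice_files = filter_videos(scores, "alice")
alice_ranked = rank_videos(scores, "alice")
```

In this example `alice_files` lists the two files in which "alice" was identified, and `alice_ranked` places `b.mp4` first because the subject is more prominent there.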

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Wang, Tzong-Jhy; Suri, Nitin; Ivory, Andrew S.; Sproule, William D.

Granted Patent: US 9,934,423 B2
US Class Current: 1/1
CPC Class Codes:
G06V 10/462   Salient features, e.g. scal...
G06V 20/41   Higher-level, semantic clus...
G06V 40/16   Human faces, e.g. facial pa...
G06V 40/171   Local features and componen...
G06V 40/172   Classification, e.g. identi...
