Data recognition in content

US 8,849,041 B2
Filed: 06/04/2012
Issued: 09/30/2014
Est. Priority Date: 06/04/2012
Status: Active Grant

Abstract

The disclosure relates to recognizing data such as items or entities in content. In some aspects, content may be received and feature information, such as face recognition data and voice recognition data, may be generated. Scene segmentation may also be performed on the content, grouping the various shots of the video content into one or more shot collections, such as scenes. For example, a decision lattice representative of possible scene segmentations may be determined and the most probable path through the decision lattice may be selected as the scene segmentation. Upon generating the feature information and performing the scene segmentation, one or more items or entities that are present in the scene may be identified.
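The lattice-based segmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each shot already has a probability of being a scene boundary, models the lattice as two nodes per shot (boundary and non-boundary), and picks the most probable path with a standard Viterbi pass. The function name `best_segmentation` and the `p_consecutive` transition probability, which discourages back-to-back boundaries, are hypothetical.

```python
import math


def best_segmentation(p_boundary, p_consecutive=0.1):
    """Return one boundary flag per shot using Viterbi over a 2-node-per-shot lattice.

    p_boundary[i]  -- assumed probability that shot i is a scene boundary.
    p_consecutive  -- hypothetical probability of a boundary directly following
                      a boundary (all other transitions are unpenalized).
    """
    if not p_boundary:
        return []
    states = (True, False)  # True = scene boundary node, False = non-boundary node

    def emit(i, state):
        p = p_boundary[i] if state else 1.0 - p_boundary[i]
        return math.log(max(p, 1e-12))  # clamp to avoid log(0)

    def trans(prev, cur):
        return math.log(p_consecutive) if (prev and cur) else 0.0

    score = {s: emit(0, s) for s in states}
    back = []  # back[i - 1][cur] = best predecessor state for shot i
    for i in range(1, len(p_boundary)):
        new_score, pointers = {}, {}
        for cur in states:
            prev = max(states, key=lambda s: score[s] + trans(s, cur))
            new_score[cur] = score[prev] + trans(prev, cur) + emit(i, cur)
            pointers[cur] = prev
        score = new_score
        back.append(pointers)

    # Trace the highest-scoring path back from the last shot to the first.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With the illustrative input `[0.9, 0.6, 0.1, 0.8]`, the second shot is labeled non-boundary even though its own probability favors a boundary, because the path as a whole scores higher without two consecutive boundaries. Viterbi returns only the top-ranked path; ranking several candidate paths would need an n-best variant.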

20 Claims

1. A method, comprising:
- identifying a set of entities in video content;
  
  for a scene in the video content, identifying a first confidence value vector that is representative of features of the scene and that is a result of a video recognition process;
  
  for the scene, identifying a second confidence value vector that is representative of features of the scene and that is a result of an audio recognition process; and
  
  based on the first confidence value vector and the second confidence value vector, determining, by a computing device, at least one identifier that defines whether an entity in the set of entities is present in the scene.
- - 2. The method of claim 1, further comprising:
    - identifying a plurality of shots in the video content;
      
      creating a lattice of nodes that comprises at least one of a scene boundary node or a non-scene boundary node for each shot in the plurality of shots, wherein the lattice of nodes defines a plurality of paths beginning at a first shot of the plurality of shots and ending at a last shot of the plurality of shots;
      
      ranking the plurality of paths; and
      
      selecting, based on the ranking, which one of the plurality of paths is to define where boundaries of the scene are located in the video content.
  - 3. The method of claim 2, wherein creating the lattice of nodes comprises:
    - calculating a probability that a current shot is a scene boundary;
      
      calculating a probability that the current shot is a non-scene boundary; and
      
      inserting the at least one of the scene boundary node or the non-scene boundary node for the current shot into the lattice based on the probability that the current shot is a scene boundary and the probability that the current shot is a non-scene boundary.
  - 4. The method of claim 1, wherein the video recognition process comprises a face recognition process, wherein the audio recognition process comprises a voice recognition process, wherein at least one confidence value of the first confidence value vector defines a probability that a face of an entity in the set of entities is present, and wherein at least one confidence value of the second confidence value vector defines a probability that a category of phone is being uttered by an entity in the set of entities.
  - 5. The method of claim 1, further comprising:
    - calculating acoustic features from audio of the video content;
      
      detecting an occurrence of a phone based on the acoustic features;
      
      determining a plurality of confidence values for each entity in the set of entities, wherein at least one value in the plurality of confidence values defines a probability that the phone belongs to one of a plurality of phone categories; and
      
      determining the second confidence value vector from the plurality of confidence values for each entity in the set of entities, wherein a first value of the second confidence value vector is selected from a first plurality of confidence values for a first entity in the set of entities.
  - 6. The method of claim 5, wherein determining the plurality of confidence values for each entity in the set of entities comprises calculating the first plurality of confidence values for the first entity using a set of mixture models, wherein each model in the set of mixture models calculates a probability that an input phone belongs to a phone category pronounced by the first entity, and wherein each model in the set of mixture models corresponds to one of the plurality of phone categories.
  - 7. The method of claim 1, further comprising:
    - for the scene, determining a plurality of salience measurements, wherein each of the plurality of salience measurements corresponds to a different entity in the set of entities, and a first measurement in the plurality of salience measurements numerically indicates importance of a first entity to the scene;
      
      determining that the first measurement satisfies a salience threshold; and
      
      inserting an identifier of the first entity into a listing of entities that are present and salient to the scene.
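As a rough illustration of claim 1, the sketch below combines a video-derived confidence vector and an audio-derived confidence vector into a per-entity presence flag. The claims do not say how the two vectors are combined; the weighted average, the `w_face` weight, the `threshold` value, and the function name `present_entities` are all assumptions made for this example.

```python
def present_entities(entities, face_conf, voice_conf, w_face=0.5, threshold=0.5):
    """Map each entity to True/False for presence in a scene.

    entities   -- entity identifiers, one per vector position.
    face_conf  -- first confidence vector (video recognition), values in [0, 1].
    voice_conf -- second confidence vector (audio recognition), values in [0, 1].
    w_face     -- hypothetical weight given to the video evidence.
    threshold  -- hypothetical cutoff on the fused confidence.
    """
    if not (len(entities) == len(face_conf) == len(voice_conf)):
        raise ValueError("entities and both confidence vectors must align")
    flags = {}
    for name, face, voice in zip(entities, face_conf, voice_conf):
        fused = w_face * face + (1.0 - w_face) * voice  # simple linear fusion
        flags[name] = fused >= threshold
    return flags


if __name__ == "__main__":
    print(present_entities(["A", "B", "C"], [0.9, 0.2, 0.6], [0.7, 0.1, 0.3]))
```

A linear fusion is only one option; a trained classifier over the concatenated vectors would fit the claim language equally well.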

8. An apparatus, comprising:
- one or more processors;
  
  memory storing executable instructions configured to, with the one or more processors, cause the apparatus to:
  
  identify a set of entities in video content;
  
  for a scene in the video content, identify a first confidence value vector that is representative of features of the scene and that is a result of a video recognition process;
  
  for the scene, identify a second confidence value vector that is representative of features of the scene and that is a result of an audio recognition process; and
  
  based on the first confidence value vector and the second confidence value vector, determine at least one identifier that defines whether an entity in the set of entities is present in the scene.
- - 9. The apparatus of claim 8, wherein the executable instructions are configured to, with the one or more processors, cause the apparatus to:
    - identify a plurality of shots in the video content;
      
      create a lattice of nodes that comprises at least one of a scene boundary node or a non-scene boundary node for each shot in the plurality of shots, wherein the lattice of nodes defines a plurality of paths beginning at a first shot of the plurality of shots and ending at a last shot of the plurality of shots;
      
      rank the plurality of paths; and
      
      select, based on the rank, which one of the plurality of paths is to define where boundaries of the scene are located in the video content.
  - 10. The apparatus of claim 9, wherein creating the lattice of nodes comprises:
    - calculating a probability that a current shot is a scene boundary;
      
      calculating a probability that the current shot is a non-scene boundary; and
      
      inserting the at least one of the scene boundary node or the non-scene boundary node into the lattice based on the probability that the current shot is a scene boundary and the probability that the current shot is a non-scene boundary.
  - 11. The apparatus of claim 8, wherein the video recognition process comprises a face recognition process, wherein the audio recognition process comprises a voice recognition process, wherein at least one confidence value of the first confidence value vector defines a probability that a face of an entity in the set of entities is present, and wherein at least one confidence value of the second confidence value vector defines a probability that a category of phone is being uttered by an entity in the set of entities.
  - 12. The apparatus of claim 8, wherein the executable instructions are configured to, with the one or more processors, cause the apparatus to:
    - calculate acoustic features from audio of the video content;
      
      detect an occurrence of a phone based on the acoustic features;
      
      determine a plurality of confidence values for each entity in the set of entities, wherein at least one value in the plurality of confidence values defines a probability that the phone belongs to one of a plurality of phone categories; and
      
      determine the second confidence value vector from the plurality of confidence values for each entity in the set of entities, wherein a first value of the second confidence value vector is selected from a first plurality of confidence values for a first entity in the set of entities.
  - 13. The apparatus of claim 12, wherein determining the plurality of confidence values for each entity in the set of entities comprises calculating the first plurality of confidence values for the first entity using a set of mixture models, wherein each model in the set of mixture models calculates a probability that an input phone belongs to a phone category pronounced by the first entity, and wherein each model in the set of mixture models corresponds to one of the plurality of phone categories.
  - 14. The apparatus of claim 8, wherein the executable instructions are configured to, with the one or more processors, cause the apparatus to:
    - for the scene, determine a plurality of salience measurements, wherein each of the plurality of salience measurements corresponds to a different entity in the set of entities, and a first measurement in the plurality of salience measurements numerically indicates importance of a first entity to the scene;
      
      determine that the first measurement satisfies a salience threshold; and
      
      insert an identifier of the first entity into a listing of entities that are present and salient to the scene.
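The mixture-model scoring described in claims 12 and 13 could look something like the following sketch. It simplifies heavily: the acoustic feature is a single scalar, each phone-category model is a one-dimensional Gaussian mixture, and confidences are normalized jointly over all (entity, category) pairs under a uniform prior. Those choices, along with the names `gmm_density` and `voice_confidence_vector`, are assumptions for illustration rather than details taken from the patent.

```python
import math


def gmm_density(x, components):
    """Density of a 1-D Gaussian mixture; components is a list of (weight, mean, std)."""
    return sum(
        w * math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        for w, mean, std in components
    )


def voice_confidence_vector(x, entity_models):
    """Return one confidence value per entity for an observed scalar feature x.

    entity_models -- {entity: {phone_category: [(weight, mean, std), ...]}},
                     i.e. one mixture model per phone category per entity.
    Each entity's value is selected as the largest of its per-category confidences.
    """
    dens = {
        (entity, cat): gmm_density(x, comps)
        for entity, cats in entity_models.items()
        for cat, comps in cats.items()
    }
    total = sum(dens.values())
    vector = {}
    for entity, cats in entity_models.items():
        confidences = [dens[(entity, cat)] / total if total > 0 else 0.0 for cat in cats]
        vector[entity] = max(confidences, default=0.0)
    return vector


if __name__ == "__main__":
    models = {
        "alice": {"vowel": [(0.6, 0.0, 1.0), (0.4, 0.5, 1.0)], "fricative": [(1.0, 3.0, 1.0)]},
        "bob": {"vowel": [(1.0, 5.0, 1.0)], "fricative": [(1.0, 6.0, 1.0)]},
    }
    print(voice_confidence_vector(0.2, models))
```

In practice the features would be multi-dimensional (for example, cepstral coefficients) and the models would be trained per speaker; the structure of one model per phone category per entity is the part this sketch tries to show.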

15. A method comprising:
- performing feature recognition on video content using at least a video recognition technique and an audio recognition technique, which results in feature information for the video content;
  
  determining, based on a selection of a path from a plurality of possible paths through a node lattice that comprises at least one of a scene boundary node or a non-scene boundary node for each shot in the video content, defining boundaries of a scene in the video content;
  
  identifying, from the feature information, a set of confidence value vectors for the scene that comprises a first confidence value vector for the video recognition technique and a second confidence value vector for the audio recognition technique; and
  
  identifying one or more items present in the scene based on the set of confidence value vectors.
- - 16. The method of claim 15, further comprising:
    - creating the node lattice;
      
      ranking the plurality of possible paths through the node lattice; and
      
      selecting the path based on the ranking.
  - 17. The method of claim 15, wherein the video recognition technique comprises a face recognition technique, wherein the audio recognition technique comprises a voice recognition technique, wherein at least one confidence value of the first confidence value vector defines a probability that a face is present, and wherein at least one confidence value of the second confidence value vector defines a probability that a category of phone is being uttered.
  - 18. The method of claim 17, further comprising:
    - performing the voice recognition technique by at least
      
      calculating acoustic features from audio of the video content,
      
      detecting an occurrence of a phone based on the acoustic features,
      
      determining a plurality of confidence values, wherein at least one value in the plurality of confidence values defines a probability that the phone belongs to one of a plurality of phone categories, and
      
      determining the second confidence value vector from the plurality of confidence values.
  - 19. The method of claim 18, wherein determining the plurality of confidence values comprises calculating the plurality of confidence values using a set of mixture models, wherein each model in the set of mixture models calculates a probability that an input phone belongs to a phone category, and wherein each model in the set of mixture models corresponds to one of the plurality of phone categories.
  - 20. The method of claim 15, further comprising:
    - determining a plurality of salience measurements, wherein a first measurement in the plurality of salience measurements numerically indicates importance of a particular item of the one or more items to the scene;
      
      determining that the first measurement satisfies a salience threshold; and
      
      storing an identifier of the particular item into a listing of item identifiers that are present and salient to the scene.
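The salience step in claim 20 can be sketched as below. The claims only require a numeric measurement of importance and a threshold; this example assumes, purely for illustration, that salience is the fraction of a scene's shots in which an item was detected. The function name `salient_items` and the default threshold are hypothetical.

```python
def salient_items(shot_detections, threshold=0.5):
    """Return a sorted listing of item identifiers that are salient to a scene.

    shot_detections -- one set of detected item identifiers per shot in the scene.
    threshold       -- hypothetical salience threshold on the fraction of shots.
    """
    if not shot_detections:
        return []
    counts = {}
    for detected in shot_detections:
        for item in detected:
            counts[item] = counts.get(item, 0) + 1
    n_shots = len(shot_detections)
    # Salience measurement (assumed): share of shots in which the item appears.
    return sorted(item for item, c in counts.items() if c / n_shots >= threshold)


if __name__ == "__main__":
    scene = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
    print(salient_items(scene))
```

Other salience measures, such as screen time or speaking time, would slot into the same threshold-and-list structure.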

Current Assignee
Comcast Cable Communications LLC (Comcast Corporation)
Original Assignee
Comcast Cable Communications LLC (Comcast Corporation)
Inventors
Neumann, Jan; Tzoukermann, Evelyne; Bagga, Amit; Jojic, Oliver; Shevade, Bageshree; Houghton, David; Farrell, Corey
Primary Examiner(s)
Le, Vu
Assistant Examiner(s)
Woldemariam, Aklilu K

Application Number
US13/487,543
Publication Number
US 20130322765A1
Time in Patent Office
848 Days
Field of Search
382/103, 382/176, 382/192, 382/276, 382/303
US Class Current
382/197
CPC Class Codes

G06F 16/70   of video data

G06F 18/00   Pattern recognition

G06F 40/263   Language identification

G06V 20/41   Higher-level, semantic clus...

G06V 20/46   Extracting features or char...

G06V 20/49   Segmenting video sequences,...

G06V 40/16   Human faces, e.g. facial pa...

G06V 40/161   Detection; Localisation; No...

G10L 15/22   Procedures used during a sp...

G10L 25/57   for processing of video sig...
