Intelligent TV system and method

US 8,811,673 B1
Filed: 04/18/2013
Issued: 08/19/2014
Est. Priority Date: 04/18/2013
Status: Active Grant


Abstract

A method is provided for an intelligent user-interaction system based on object detection. The method includes receiving an input video sequence corresponding to a video program, and dividing the input video sequence into a plurality of video shots, each containing one or more video frames. The method also includes detecting possible object occurrences in each of the plurality of video shots, and analyzing possible paths of an object in a video shot using a multimodal-cue approach. Further, the method includes aggregating the path-based selected object occurrences across the plurality of video shots to detect objects, and generating a complete list of the object occurrences across the plurality of video shots.
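The shot-division step named in the abstract can be illustrated with a minimal sketch. The text here does not say how shot boundaries are found, so the histogram-difference rule below is an assumption chosen for illustration, and the names `Frame`, `histogram`, `split_into_shots`, and the `threshold` value are all hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical frame representation: a flat sequence of 0-255 grayscale pixels.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def split_into_shots(frames: List[Frame], threshold: float = 0.5) -> List[List[int]]:
    """Group frame indices into shots.

    Assumption: a new shot starts when the L1 distance between the
    histograms of consecutive frames exceeds `threshold`.
    """
    shots: List[List[int]] = []
    prev: Optional[List[float]] = None
    for index, frame in enumerate(frames):
        hist = histogram(frame)
        if prev is None or sum(abs(a - b) for a, b in zip(hist, prev)) > threshold:
            shots.append([])  # open a new shot at a detected boundary
        shots[-1].append(index)
        prev = hist
    return shots


# Two dark frames followed by two bright frames form two shots.
demo_frames = [[10] * 16, [12] * 16, [200] * 16, [205] * 16]
demo_shots = split_into_shots(demo_frames)
```

Each shot is returned as a list of frame indices, which matches the claim language that a shot contains one or more video frames.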

15 Claims

1. A method for an intelligent user-interaction system based on object detection, comprising:
- receiving an input video sequence corresponding to a video program;
  
  dividing the input video sequence into a plurality of video shots, each containing one or more video frames;
  
  detecting possible object occurrences in each of the plurality of video shots;
  
  analyzing possible paths of an object in a video shot using a multimodal-cue approach;
  
  aggregating the path-based selected object occurrences across the plurality of video shots to detect objects; and
  
  generating a complete list of the object occurrences across the plurality of video shots;
  
  wherein analyzing the possible paths using a multimodal-cue approach further includes:
  
  combining an appearance cue, a spatio-temporal cue, and a topological cue to aid object detection in the plurality of video shots;
  
  dictating a usage of an object's visual features to detect possible object locations in a video frame using the appearance cue;
  
  injecting information across a sequence of frames via relational constraints between a target object class and a related object class using the spatio-temporal cue and the topological cue;
  
  fusing the multimodal cue information to create links between object occurrences across the video frames in a current video shot; and
  
  applying dynamic programming to find optimal object paths.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method according to claim 1:
    - provided that O^l and O^m are an object occurrence in the l-th video frame F_l and an object occurrence in the m-th video frame F_m of a video sequence, a detected object occurrence O^l in a path has a conditional probability P(O^l|C), where C is a class of target objects; and
      
      consecutive object occurrences O^l and O^m in the path have an appearance correlation, which is defined by: [equation missing in source]
  - 3. The method according to claim 1:
    - provided that O^l and O^m are a target-class object occurrence in the l-th video frame F_l and an object occurrence in the m-th video frame F_m of a video sequence, a within-path deviation in the trajectories of a target-class object and a detected related-class object is defined by: [equation missing in source]
  - 4. The method according to claim 3, wherein:
    - Γ(.) is extended to include the relationship between the sizes of the bounding boxes of the target-class and related-class objects.
  - 5. The method according to claim 1:
    - provided that O^l and R^l are a target-class object occurrence and a related-class object occurrence in the l-th video frame F_l of a video sequence, a function Ψ(.) that depends on a topological relationship between a specific related-class object R^l and a detected object O^l is defined by: [equation missing in source]
  - 6. The method according to claim 1, further including:
    - generating a plurality of summary video frames for the video program to be shown on a display;
      
      detecting a hold command from a user to stop the video program; and
      
      presenting the plurality of summary video frames to the user on the display after stopping the video program.
  - 7. The method according to claim 6, further including:
    - obtaining a user selection on a selected summary frame from the plurality of the summary video frames;
      
      presenting a plurality of objects of interest detected based on the object occurrences to the user on the display;
      
      determining a user-selected object of interest from the plurality of objects of interest;
      
      searching the selected object; and
      
      presenting the user with contents based on the searching results.
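
The final step of claim 1, applying dynamic programming to find optimal object paths, can be sketched as a Viterbi-style search over per-frame candidates. This is a minimal illustration rather than the patented formulation: the claim equations are not present in this text, so `unary` (a per-candidate score such as a detection log-probability) and `link` (a fused-cue score between candidates in consecutive frames) are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple


def best_path(
    unary: Sequence[Sequence[float]],
    link: Callable[[int, int, int], float],
) -> Tuple[float, List[int]]:
    """Find the highest-scoring chain of candidates, one per frame.

    unary[t][i]   : hypothetical score of candidate i in frame t
    link(t, i, j) : hypothetical score for linking candidate i in
                    frame t-1 to candidate j in frame t
    Returns (best total score, chosen candidate index for each frame).
    """
    if not unary or any(len(row) == 0 for row in unary):
        raise ValueError("every frame needs at least one candidate")

    score = list(unary[0])          # best score of any path ending at each candidate
    back: List[List[int]] = []      # back-pointers for frames 1..T-1
    for t in range(1, len(unary)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j, u in enumerate(unary[t]):
            # Best predecessor for candidate j in frame t.
            best_i = max(range(len(score)), key=lambda i: score[i] + link(t, i, j))
            new_score.append(score[best_i] + link(t, best_i, j) + u)
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final candidate.
    end = max(range(len(score)), key=score.__getitem__)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[end], path


# Three frames, two candidates each; switching candidates costs 1.0.
demo_unary = [[1.0, 0.5], [0.2, 0.9], [0.4, 0.3]]
demo_total, demo_path = best_path(demo_unary, lambda t, i, j: 0.0 if i == j else -1.0)
```

With these scores the search stays on the second candidate in every frame, because the switching penalty outweighs the higher first-frame score of the first candidate. The cost is linear in the number of frames and quadratic in the number of candidates per frame.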

8. An intelligent user-interaction system, comprising:
- a video decoder configured to decode an incoming bit stream;
  
  a data storage configured to store a certain time of incoming bit-stream as an input video sequence corresponding to a video program to be shown to a user on a display;
  
  a preprocessing unit configured to divide the input video sequence into a plurality of video shots, each containing one or more video frames;
  
  a detection unit configured to detect possible object occurrences in each video shot;
  
  a path analysis unit configured to analyze possible paths of an object in a video shot using a multimodal-cue approach; and
  
  an aggregation unit configured to aggregate the path-based selected object occurrences across the plurality of video shots to detect objects;
  
  wherein the path analysis unit is further configured to:
  
  combine an appearance cue, a spatio-temporal cue, and a topological cue to aid object detection in the plurality of video shots;
  
  dictate a usage of an object's visual features to detect possible object locations in a video frame based on the appearance cue;
  
  inject information across a sequence of frames via relational constraints between a target object class and a related object class using the spatio-temporal cue and the topological cue;
  
  fuse the multimodal cue information to create links between object occurrences across the video frames in a current video shot; and
  
  apply dynamic programming to find optimal object paths.
- Dependent claims (9, 10, 11, 12, 13, 14, 15)
  - 9. The system according to claim 8:
    - provided that O^l and O^m are an object occurrence in the l-th video frame F_l and an object occurrence in the m-th video frame F_m of a video sequence, a detected object occurrence O^l in a path has a conditional probability P(O^l|C), where C is a class of target objects; and
      
      consecutive object occurrences O^l and O^m in the path have an appearance correlation, which is defined by: [equation missing in source]
  - 10. The system according to claim 8:
    - provided that O^l and O^m are a target-class object occurrence in the l-th video frame F_l and an object occurrence in the m-th video frame F_m of a video sequence, a within-path deviation in the trajectories of a target-class object and a detected related-class object is defined by: [equation missing in source]
  - 11. The system according to claim 10, wherein:
    - Γ(.) is extended to include the relationship between sizes of the bounding boxes of the target-class and related-class objects.
  - 12. The system according to claim 8:
    - provided that O^l and R^l are a target-class object occurrence and a related-class object occurrence in the l-th video frame F_l of a video sequence, a function Ψ(.) that depends on the topological relationship between a specific related-class object R^l and a detected object O^l is defined by: [equation missing in source]
  - 13. The system according to claim 8, wherein:
    - the preprocessing unit may summarize the past few minutes, or any number of minutes, of video data stored in the data storage module into a number of video shots for the user to select from when the user tries to rewind the TV program.
  - 14. The system according to claim 8, further configured to:
    - generate a plurality of summary video frames for the video program to be shown on a display;
      
      detect a hold command from a user to stop the video program; and
      
      present the plurality of summary video frames to the user on the display after stopping the video program.
  - 15. The system according to claim 14, further configured to:
    - obtain a user selection on a selected summary frame from the plurality of the summary video frames;
      
      present a plurality of objects of interest detected based on the object occurrences to the user on the display;
      
      determine a user-selected object of interest from the plurality of objects of interest;
      
      search the selected object; and
      
      present the user with contents based on the searching results.
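
The role of the aggregation unit in claim 8, collecting path-selected object occurrences across shots into one list, can be sketched as follows. The claims do not state a selection rule, so keeping a path when its mean score reaches `min_mean_score` is an assumption made for illustration, and `Occurrence`, `Box`, and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical bounding box: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Occurrence:
    shot: int      # index of the video shot
    frame: int     # index of the frame within the whole sequence
    box: Box       # where the object was detected
    score: float   # hypothetical detection confidence


def aggregate(
    paths_per_shot: Dict[int, List[Occurrence]],
    min_mean_score: float = 0.5,
) -> List[Occurrence]:
    """Merge the best path of each shot into one ordered occurrence list.

    Assumption: a path is kept only if the mean score of its
    occurrences is at least `min_mean_score`.
    """
    kept: List[Occurrence] = []
    for shot in sorted(paths_per_shot):
        path = paths_per_shot[shot]
        if path and sum(o.score for o in path) / len(path) >= min_mean_score:
            kept.extend(path)
    return sorted(kept, key=lambda o: (o.shot, o.frame))


# Shot 0 holds a confident path; shot 1 holds a weak one that is dropped.
demo_paths = {
    1: [Occurrence(1, 40, (5, 5, 20, 20), 0.2), Occurrence(1, 41, (6, 5, 20, 20), 0.1)],
    0: [Occurrence(0, 3, (10, 10, 30, 30), 0.9), Occurrence(0, 2, (9, 10, 30, 30), 0.7)],
}
demo_occurrences = aggregate(demo_paths)
```

The resulting list is ordered by shot and then by frame, which is one plausible form for the list of object occurrences that the later claims present to the user as objects of interest.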

Current Assignee: TCL Research America, Inc. (TCL Technology Group Corp.)
Original Assignee: TCL Research America, Inc. (TCL Technology Group Corp.)
Inventors: Wang, Haohong; Fleites, Fausto C.
Primary Examiner(s): MONTOYA, OSCHTA I
Application Number: US13/865,329
Time in Patent Office: 488 Days
Field of Search: None
US Class Current: 382/103
CPC Class Codes:
G06F 16/786   using motion, e.g. object m...
G06V 20/52   Surveillance or monitoring ...
H04N 21/44008   involving operations for an...
