Method and apparatus for extracting indexing information from digital video data
First Claim
1. A computer-implemented speech and video analysis system for creating an index to indicate locations of a first event occurring within audio-video data, said audio-video data containing audio data synchronized with video data to represent a plurality of events, said first event having at least one audio-feature and at least one video-feature indicative of said first event, comprising the steps of:
(a) providing a model speech database for storing speech models representative of said audio-feature;
(b) providing a model video database for storing video models representative of said video-feature;
(c) performing wordspotting to determine candidates by comparing said audio data with said stored speech models, said candidates indicating positions of said audio-feature within said audio data;
(d) establishing predetermined ranges around each of said candidates;
(e) segmenting into shots those portions of said video data which are located within said ranges;
(f) analyzing said segmented video data to determine video-locations based on a comparison between said segmented video data and said stored video models, said video-locations indicating positions of said video-feature within said segmented video data; and
(g) generating an index to indicate locations of said first event based on said video-locations.
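As a rough illustration of how steps (c) through (g) fit together, the sketch below places a fixed window around each wordspotting candidate and then runs video analysis only inside those windows. It is a minimal sketch, not the patented implementation: the names `Candidate`, `establish_ranges`, and `build_index`, the window sizes, and the two callables standing in for shot segmentation and video-model matching are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Range = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


@dataclass(frozen=True)
class Candidate:
    """Output of step (c): where a modelled word was spotted in the audio."""
    time_sec: float
    word: str  # hypothetical label, e.g. "touchdown"


def establish_ranges(candidates: Sequence[Candidate], before_sec: float,
                     after_sec: float, duration_sec: float) -> List[Range]:
    """Step (d): a predetermined window around each candidate.

    Clamping to the recording bounds is an assumption of this sketch.
    """
    ranges = []
    for c in candidates:
        start = max(0.0, c.time_sec - before_sec)
        end = min(duration_sec, c.time_sec + after_sec)
        ranges.append((start, end))
    return ranges


def build_index(candidates: Sequence[Candidate],
                segment_into_shots: Callable[[Range], List[Range]],
                find_video_locations: Callable[[List[Range]], List[float]],
                before_sec: float, after_sec: float,
                duration_sec: float) -> List[float]:
    """Steps (e)-(g): analyze video only inside the candidate ranges."""
    locations: List[float] = []
    for rng in establish_ranges(candidates, before_sec, after_sec, duration_sec):
        shots = segment_into_shots(rng)                # step (e), supplied by caller
        locations.extend(find_video_locations(shots))  # step (f), supplied by caller
    return sorted(set(locations))                      # step (g): the index


# Usage with trivial stand-ins for the two video stages.
spotted = [Candidate(5.0, "touchdown"), Candidate(95.0, "fumble")]
index = build_index(spotted,
                    segment_into_shots=lambda rng: [rng],
                    find_video_locations=lambda shots: [s[0] for s in shots],
                    before_sec=20.0, after_sec=10.0, duration_sec=100.0)
```

Passing the video stages in as callables keeps the sketch independent of any particular segmentation or matching technique; the point it shows is that the audio candidates narrow down which portions of the video are analyzed at all.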
Abstract
A method and apparatus to automatically index the locations of specified events on a video tape. The events, for example, include touchdowns, fumbles and other football-related events. An index to the locations where these events occur is created by using both speech detection and video analysis algorithms. A speech detection algorithm locates specific words in the audio portion of the video tape. Locations where the specific words are found are passed to the video analysis algorithm. A range around each of the locations is established. Each range is segmented into shots using a histogram technique. The video analysis algorithm analyzes each segmented range for certain video features using line extraction techniques to identify the event. The final product of the video analysis is a set of pointers (or indexes) to the locations of the events in the video tape.
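The abstract does not spell out the histogram technique used for shot segmentation. One common form compares the intensity histograms of consecutive frames and declares a shot boundary where they differ sharply. The sketch below assumes that form; the function names, the 16-bin grayscale histogram, the L1 distance, and the threshold value are all illustrative assumptions rather than details taken from the patent.

```python
from typing import List, Sequence, Tuple


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit gray levels (frame must be non-empty)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_into_shots(frames: Sequence[Sequence[int]],
                       threshold: float = 0.5,
                       bins: int = 16) -> List[Tuple[int, int]]:
    """Split frames into shots as (start_frame, end_frame_exclusive) pairs.

    A cut is declared wherever consecutive histograms differ by more than
    `threshold` (an assumed value that would need tuning on real footage).
    """
    if not frames:
        return []
    shots = []
    start = 0
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        if histogram_distance(prev, cur) > threshold:
            shots.append((start, i))
            start = i
        prev = cur
    shots.append((start, len(frames)))
    return shots


# Usage: two dark frames followed by three bright frames, each frame a flat
# list of gray levels. The abrupt change yields one cut.
dark = [10] * 64
bright = [240] * 64
shots = segment_into_shots([dark, dark, bright, bright, bright])
```

Because only the frames inside each candidate range are segmented, a comparison like this runs over a small fraction of the tape rather than the whole recording.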
50 Claims
26. An apparatus for creating an index to indicate locations of a first event occurring within audio-video data, said audio-video data containing audio data synchronized with video data to represent a plurality of events, said first event having at least one audio-feature and at least one video-feature indicative of said first event, comprising:
a model speech database for storing speech models representative of said audio-feature;
a model video database for storing video models representative of said video-feature;
a wordspotter coupled to said model speech database for determining candidates based on a comparison between said audio data and said stored speech models, said candidates indicating positions of said audio-feature within said audio data;
range establishing means coupled to said wordspotter for establishing predetermined ranges around each of said candidates;
a segmenting device coupled to said range establishing means for segmenting into shots those portions of said video data which are located within said ranges;
a video analyzer coupled to said segmenting device and to said model video database for determining video-locations based on a comparison between said video data and said stored video models, said video-locations indicating positions of said video-feature within said video data; and
an indexer coupled to said video analyzer for creating an index indicating the locations of said first event within said audio-video data based on said determined video-locations.
Specification