Multimedia computer system with story segmentation capability and operating program therefor including finite automaton video parser
Abstract
A story segment retrieval device for a multimedia computer system storing a multimedia signal including a video signal, an associated audio signal and text information as a plurality of individually retrievable story segments, each having associated therewith a finite automaton (FA) model and keywords, at least one of which is associated with each respective node of the FA model. Advantageously, the story segment retrieval device includes a device for selecting a class of FA models corresponding to a desired story segment to thereby generate a selected FA model class, a device for selecting a subclass of the selected FA model class corresponding to the desired story segment to thereby generate a selected FA model subclass, a device for generating a plurality of keywords corresponding to the desired story segment, and a device for sorting a set of the story segments corresponding to the selected FA model subclass using selected keyframes, keywords and query video clips to retrieve ones of the set of the story segments including the desired story segment. Multimedia signal parsing, video story segmentation, and video story categorization methods and corresponding systems, as well as storage media storing computer-readable instructions for performing these methods, are also described.
19 Claims
1. A multimedia signal parsing method for operating a multimedia computer system receiving a multimedia signal including a video shot sequence, an associated audio signal and corresponding text information to permit story segmentation of the multimedia signal into discrete stories, each of which has associated therewith a final finite automaton (FA) model and keywords, at least one of which is associated with a respective node of the FA model, the method comprising steps for:
(a) analyzing the video portion of the received multimedia signal to identify keyframes therein to thereby generate identified keyframes;
(b) comparing said identified keyframes within the video shot sequence with predetermined FA characteristics to identify a pattern of appearance within the video shot sequence;
(c) constructing a finite automaton (FA) model describing the appearance of the video shot sequence to thereby generate a constructed FA model;
(d) coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes;
(e) extracting said keywords from said text information and storing said keywords at locations associated with each node of said constructed FA model;
(f) analyzing and segmenting the audio signal of the multimedia signal into identified speaker segments, music segments, laughter segments, and silent segments;
(g) attaching said identified speaker segments, music segments, laughter segments, and silent segments to said constructed FA model;
(h) when said constructed FA model matches a previously defined FA model, storing the identity of said constructed FA model as said final FA model along with said keywords; and
(i) when said constructed FA model does not match a previously defined FA model, generating a new FA model corresponding to said constructed FA model, storing said new FA model, and storing the identity of said new FA model as said final FA model along with said keywords.
2. The multimedia signal parsing method as recited in claim 1, wherein said step (d) comprises:
(d) coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes by:
(d)(i) retrieving said text information from the multimedia signal; and
(d)(ii) performing discourse analysis of the retrieved text information so as to generate indicia used in coupling said neighboring video shots.
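As a concrete, purely illustrative reading of substeps (d)(i) and (d)(ii) above, shallow discourse cues in the retrieved text can serve as the coupling indicia. The cue lists, function names, and coupling rule below are assumptions for illustration, not the patent's disclosed method:

```python
# Speculative sketch of substeps (d)(i)-(d)(ii); cue lists are assumptions.

CONNECTIVES = {"meanwhile", "however", "also", "then"}
ANAPHORA = {"he", "she", "they", "it", "this", "that"}

def discourse_indicia(next_text):
    # (d)(ii): shallow discourse analysis of the retrieved text
    words = [w.strip(",.").lower() for w in next_text.split()]
    return {
        "opens_with_connective": bool(words) and words[0] in CONNECTIVES,
        "anaphoric_reference": any(w in ANAPHORA for w in words[:5]),
    }

def couple_shots(prev_text, next_text):
    # (d): couple neighboring shots when any discourse indicium fires,
    # i.e. the next shot's text apparently continues the previous story
    return any(discourse_indicia(next_text).values())

print(couple_shots("The mayor announced a new budget.",
                   "He also promised tax cuts."))      # → True
print(couple_shots("The mayor announced a new budget.",
                   "Stocks rallied on Wall Street."))  # → False
```

A real system would of course use far richer discourse analysis (coreference resolution, lexical chains); the point is only that text-derived indicia, not visual similarity alone, drive the coupling decision.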
3. The multimedia signal parsing method as recited in claim 1, wherein said method further comprises:
(j) when it is determined that said video shot sequence does not fit the constructed FA model, realigning said video shot sequence, wherein said step (j) is performed prior to performing said step (f).
4. The multimedia signal parsing method as recited in claim 1, further comprising steps for:
(k) determining whether it is necessary to restructure the constructed FA model to accommodate said identified speaker segments, music segments, and silent segments; and
(l) when restructuring is necessary, restructuring the constructed FA model;
wherein said steps (k) and (l) are performed prior to performing said steps (h) and (i).
5. The multimedia signal parsing method as recited in claim 1, further comprising steps for:
(m) determining whether said keywords generated in step (e) match user-selected keywords; and
(n) when a match is not detected, terminating the multimedia signal parsing method.
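The overall flow of claim 1, steps (a) through (i), can be sketched end to end. Everything below — the middle-frame keyframe heuristic, the node-list FA representation, and exact-sequence matching — is an illustrative assumption rather than the patented implementation:

```python
# Illustrative sketch only: names, data shapes, and heuristics are assumptions.

def identify_keyframes(shots):
    # (a) pick one representative frame per shot (here: the middle frame)
    return [shot[len(shot) // 2] for shot in shots]

def construct_fa_model(keyframes, shot_labels):
    # (b)/(c) describe the pattern of appearance as an ordered node sequence
    return {"nodes": list(shot_labels), "keyframes": keyframes, "keywords": {}}

def attach_keywords(model, text_by_node):
    # (e) store keywords at locations associated with each node
    for node in model["nodes"]:
        model["keywords"][node] = text_by_node.get(node, [])
    return model

def finalize_model(model, known_models):
    # (h)/(i): reuse a previously defined FA model when the node structure
    # matches; otherwise register the constructed model under a new identity.
    for name, known in known_models.items():
        if known["nodes"] == model["nodes"]:
            return name                       # (h) matched: reuse identity
    new_name = f"FA-{len(known_models) + 1}"  # (i) no match: new model
    known_models[new_name] = model
    return new_name

shots = [["f1", "f2", "f3"], ["g1", "g2"]]  # two video shots
model = construct_fa_model(identify_keyframes(shots), ["anchor", "report"])
model = attach_keywords(model, {"anchor": ["news"], "report": ["storm"]})
library = {}
print(finalize_model(model, library))  # registers a new model (step (i))
print(finalize_model(model, library))  # now matches the stored model (step (h))
```

Steps (d), (f), and (g) — shot coupling and audio attachment — are elided here; they are sketched separately alongside the dependent claims.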
6. A combination receiving a multimedia signal including a video shot sequence, an audio signal and text information for parsing the multimedia signal into one of a plurality of story program categories, each of the program categories having an associated finite automaton (FA) model and keywords, at least one of which keywords is associated with a respective node of the FA model, comprising:
first means for analyzing the video portion of the received multimedia signal to identify keyframes therein to thereby generate identified keyframes;
second means for comparing said identified keyframes within the video shot sequence with predetermined FA characteristics to identify a pattern of appearance within the video shot sequence;
third means for constructing a finite automaton (FA) model describing the appearance of the video shot sequence to thereby generate a constructed FA model;
fourth means for coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes;
fifth means for extracting said keywords from said text information and storing said keywords at locations associated with each node of said constructed FA model;
sixth means for analyzing and segmenting the audio signal in the multimedia signal into identified speaker segments, music segments, and silent segments;
seventh means for attaching said identified speaker segments, music segments, and silent segments to said constructed FA model;
eighth means for storing the identity of said constructed FA model as said final FA model along with said keywords when said constructed FA model matches a previously defined FA model; and
ninth means for generating a new FA model corresponding to said constructed FA model, for storing said new FA model, and for storing the identity of said new FA model as said final FA model along with said keywords when said constructed FA model does not match a previously defined FA model.
7. The combination as recited in claim 6, wherein said fourth means comprises:
fourth means for coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes by employing tenth means for retrieving said text information from the multimedia signal; and
eleventh means for performing discourse analysis of the retrieved text information so as to generate indicia used in coupling said neighboring video shots.
8. The combination as recited in claim 6, further comprising:
twelfth means for, when it is determined that said video shot sequence does not fit the constructed FA model, realigning said video shot sequence, wherein said twelfth means is operatively coupled between said fifth means and said sixth means.
9. The combination as recited in claim 6, further comprising:
fourteenth means for determining whether it is necessary to restructure the constructed FA model to accommodate said identified speaker segments, music segments, and silent segments; and
fifteenth means for, when restructuring is necessary, restructuring the constructed FA model;
wherein said fourteenth and fifteenth means are serially coupled to one another and operatively coupled between said eighth and ninth means.
10. The combination as recited in claim 6, further comprising:
sixteenth means for determining whether said keywords generated by said fifth means match user-selected keywords; and
seventeenth means for, when a match is not detected, terminating operation of the combination.
11. The combination as recited in claim 6, further comprising:
eighteenth means for extracting a plurality of keywords from an input first sentence;
nineteenth means for categorizing said first sentence into one of a plurality of video story categories;
twentieth means for determining whether a current video shot belongs to a previous video story category, a current video story category or a new video story category of said plurality of video story categories responsive to similarity between said first sentence and an immediately preceding sentence; and
twenty-first means for operating said eighteenth through twentieth means seriatim until all video clips and respective sentences are assigned to one of said categories, wherein said eighteenth through twentieth means are serially coupled to both said eighth means and said ninth means, and wherein said eighteenth through twenty-first means are operative when said identified FA model corresponds to a predetermined one of the program categories.
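The sixth- and seventh-means functions of claim 6 — segmenting the audio signal into speaker, music, and silent segments and attaching them to the FA model — might look roughly like the toy sketch below. The energy and zero-crossing thresholds are assumptions for illustration, not the disclosed classifier:

```python
# Toy sketch; energy/zero-crossing thresholds are illustrative assumptions.

def classify_window(samples, silence_thresh=0.05):
    # sixth means: label one audio window as speaker / music / silent
    energy = sum(s * s for s in samples) / len(samples)
    if energy < silence_thresh:
        return "silent"
    # toy proxy: frequent zero-crossings -> speech-like, otherwise music-like
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return "speaker" if crossings > len(samples) // 4 else "music"

def attach_audio(model_nodes, windows):
    # seventh means: attach one segment label per FA node, in order,
    # padding with "silent" when the audio is shorter than the node list
    labels = [classify_window(w) for w in windows]
    labels += ["silent"] * (len(model_nodes) - len(labels))
    return dict(zip(model_nodes, labels))

windows = [[0.0, 0.01, -0.01], [0.9, -0.9, 0.9], [0.5, 0.5, 0.5]]
print(attach_audio(["intro", "anchor", "music-bed"], windows))
```

The attached labels are what the eighth/ninth means would then compare against previously defined FA models (and, per claims 9 and 14-15 means, what may trigger restructuring of the constructed model).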
12. A video story parsing method employed in the operation of a multimedia computer system receiving a multimedia signal including a video shot sequence, an associated audio signal and corresponding text information, to permit a multimedia signal, parsed into a predetermined category having an associated finite automaton (FA) model and keywords, at least one of the keywords being associated with a respective node of the FA model, to be parsed into a number of discrete video stories, the method comprising steps for:
(a) extracting a plurality of keywords from an input first sentence;
(b) categorizing said first sentence into one of a plurality of categories;
(c) determining whether a current video shot belongs to a previous category, a current category or a new category of said plurality of categories responsive to similarity between said first sentence and an immediately preceding sentence; and
(d) repeating steps (a) through (c) until all video clips and respective sentences are assigned to one of said categories.
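Steps (a) through (d) above describe a sentence-driven segmentation loop, which can be sketched as follows. The tokenizer, stopword list, and overlap threshold are assumptions for illustration, not the claimed similarity measure:

```python
# Sketch of the claim-12 loop; tokenizer, stopwords, and threshold are
# illustrative assumptions.

STOPWORDS = {"the", "a", "of", "in", "is"}

def extract_keywords(sentence):
    # (a) crude keyword extraction: lowercase tokens minus stopwords
    return {w for w in sentence.lower().split() if w not in STOPWORDS}

def segment_stories(sentences, threshold=0.2):
    stories, prev_kw = [], set()
    for sent in sentences:
        kw = extract_keywords(sent)                    # step (a)
        overlap = len(kw & prev_kw) / max(len(kw | prev_kw), 1)
        if stories and overlap >= threshold:           # (b)/(c): similarity to
            stories[-1].append(sent)                   # the preceding sentence
        else:
            stories.append([sent])                     # new story category
        prev_kw = kw                                   # (d): repeat until done
    return stories

sents = ["Storm hits the coast",
         "The storm damage is severe",
         "Stocks rally in early trading"]
print(len(segment_stories(sents)))  # → 2 discrete stories
```

Claim 13 refines step (b) with a specific similarity measure Mki; the toy Jaccard-style overlap here merely stands in for it.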
13. The video story parsing method as recited in claim 12, wherein said step (b) comprises:
(b) categorizing said first sentence into one of a plurality of categories by determining a measure Mki of the similarity between the keywords extracted during step (a) and a keyword set for an ith story category Ci according to the expression set:
Mki = [first expression, not reproduced in the source text] if Memi ≠ 0,
Mki = [second expression, not reproduced in the source text] if Memi = 0,
where MK denotes a number of matched words out of a total number Nkeywords of keywords in the respective keyword set for a characteristic sentence in said category Ci, where Memi is indicative of a measure of similarity with respect to the previous sentence sequence within category Ci, and wherein 0 ≤ Mki < 1.
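The claim recites the similarity measure Mki only by its branch conditions and the constraint 0 ≤ Mki < 1; the expressions themselves appeared as figures and are not reproduced in the text. The blend below is therefore a hypothetical reconstruction consistent with the stated quantities, not the patented formula:

```python
# Hypothetical reconstruction only: the patent's actual expressions are not
# reproduced in the claim text. This blend merely respects the stated branches
# (Memi != 0 vs Memi = 0) and the constraint 0 <= Mki < 1.

def similarity(mk, n_keywords, mem_i, alpha=0.5):
    """Assumed Mki: blend the keyword-match ratio MK/Nkeywords with the
    category-memory term Memi when the latter is nonzero."""
    ratio = mk / n_keywords if n_keywords else 0.0
    if mem_i != 0:                 # branch "if Memi != 0"
        m = alpha * ratio + (1 - alpha) * mem_i
    else:                          # branch "if Memi = 0"
        m = ratio
    return min(m, 0.999)           # enforce 0 <= Mki < 1

print(similarity(mk=2, n_keywords=4, mem_i=0.0))  # → 0.5
print(similarity(mk=2, n_keywords=4, mem_i=0.8))
```

The blending weight `alpha` and the clamp at 0.999 are pure placeholders; only the branch structure and the range constraint come from the claim.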
14. A method for operating a multimedia computer system receiving a multimedia signal including a video shot sequence, an associated audio signal and corresponding text information to thereby generate a video story database including a plurality of discrete stories searchable by one of a finite automaton (FA) model having associated keywords, at least one of which keywords is associated with a respective node of the FA model, and user-selected similarity criteria, the method comprising steps for:
(a) analyzing the video portion of the received multimedia signal to identify keyframes therein to thereby generate identified keyframes;
(b) comparing said identified keyframes within the video shot sequence with predetermined FA characteristics to identify a pattern of appearance within the video shot sequence;
(c) constructing a finite automaton (FA) model describing the appearance of the video shot sequence to thereby generate a constructed FA model;
(d) coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes;
(e) extracting said keywords from said text information and storing said keywords at locations associated with each node of said constructed FA model;
(f) analyzing and segmenting the audio signal of the multimedia signal into identified speaker segments, music segments, laughter segments, and silent segments;
(g) attaching said identified speaker segments, music segments, laughter segments, and silent segments to said constructed FA model;
(h) when said constructed FA model matches a previously defined FA model, storing the identity of said constructed FA model as said final FA model along with said keywords;
(i) when said constructed FA model does not match a previously defined FA model, generating a new FA model corresponding to said constructed FA model, storing said new FA model, and storing the identity of said new FA model as said final FA model along with said keywords;
(j) when said final FA model corresponds to a predetermined program category, performing video story segmentation according to the substeps of:
(j)(i) extracting a plurality of keywords from an input first sentence;
(j)(ii) categorizing said first sentence into one of a plurality of video story categories;
(j)(iii) determining whether a current video shot belongs to a previous video story category, a current video story category or a new video story category of said plurality of video story categories responsive to similarity between said first sentence and an immediately preceding sentence; and
(j)(iv) repeating steps (j)(i) through (j)(iii) until all video clips and respective sentences are assigned to one of said video story categories.
16. The method as recited in claim 14, wherein said step (d) comprises:
(d) coupling neighboring video shots or similar shots with said identified keyframes when said neighboring video shots are apparently related to a story represented by said identified keyframes by:
(d)(i) retrieving said text information from the multimedia signal; and
(d)(ii) performing discourse analysis of the retrieved text information so as to generate indicia used in coupling said neighboring video shots.
17. The method as recited in claim 14, wherein said method further comprises:
(k) when it is determined that said video shot sequence does not fit the constructed FA model, realigning said video shot sequence, wherein said step (k) is performed prior to performing said step (f).
18. The multimedia signal parsing method as recited in claim 14, further comprising steps for:
(l) determining whether it is necessary to restructure the constructed FA model to accommodate said identified speaker segments, music segments, and silent segments; and
(m) when restructuring is necessary, restructuring the constructed FA model;
wherein said steps (l) and (m) are performed prior to performing said steps (h) and (i).
19. The method as recited in claim 14, further comprising steps for:
(n) determining whether said keywords generated in step (e) match user-selected keywords; and
(o) when a match is not detected, terminating the multimedia signal parsing method.
15. The method as recited in claim 14, wherein said substep (j)(ii) further comprises:
(j)(ii) categorizing said first sentence into one of a plurality of sentence categories by determining a measure Mki of the similarity between the keywords extracted during step (j)(i) and a keyword set for an ith video story category Ci according to the expression set:
Mki = [first expression, not reproduced in the source text] if Memi ≠ 0,
Mki = [second expression, not reproduced in the source text] if Memi = 0,
where MK denotes a number of matched words out of a total number Nkeywords of keywords in the respective keyword set for a characteristic sentence in said category Ci, where Memi is indicative of a measure of similarity with respect to the previous sentence sequence within category Ci, and wherein 0 ≤ Mki < 1.