Method and system for segmenting and identifying events in images using spoken annotations

US 7,120,586 B2
Filed: 07/21/2004
Issued: 10/10/2006
Est. Priority Date: 06/01/2001
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

A method for automatically organizing digitized photographic images into events based on spoken annotations comprises the steps of: providing natural-language text based on spoken annotations corresponding to at least some of the photographic images; extracting predetermined information from the natural-language text that characterizes the annotations of the images; segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and identifying each event by assembling the categories of information into event descriptions. The invention further comprises the step of summarizing each event by selecting and arranging the event descriptions in a suitable manner, such as in a photographic album.
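Claims 5 and 13 describe the extraction step in more detail: the text is segmented into words and sentences, and dictionaries or gazetteers are then applied to flag words that may signal events. The following Python sketch illustrates that idea under stated assumptions. The toy gazetteer, its entries, the stop-word list, and the helper names `segment` and `tag_words` are hypothetical and are not taken from the patent's specification.

```python
import re

# Hypothetical toy gazetteer: category -> lowercase entries. The categories
# loosely follow claim 13; the entries themselves are invented for illustration.
GAZETTEER = {
    "person": {"mary", "john"},
    "place": {"paris", "yosemite", "disneyland"},
    "event": {"birthday", "wedding", "graduation", "vacation"},
    "currency": {"dollars", "euros"},
}
# Function (stop) words are skipped before any gazetteer lookup.
STOPWORDS = {"the", "a", "at", "on", "in", "of", "and", "this", "is", "we", "her"}


def segment(text: str) -> list[list[str]]:
    """Split text into sentences, then each sentence into lowercase words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [re.findall(r"[a-z']+", s.lower()) for s in sentences]


def tag_words(text: str) -> list[tuple[str, str]]:
    """Return (word, category) pairs for every gazetteer hit, in text order."""
    tagged = []
    for words in segment(text):
        for word in words:
            if word in STOPWORDS:
                continue
            for category, entries in GAZETTEER.items():
                if word in entries:
                    tagged.append((word, category))
    return tagged


print(tag_words("This is Mary at her birthday in Paris. We spent 20 euros."))
# [('mary', 'person'), ('birthday', 'event'), ('paris', 'place'), ('euros', 'currency')]
```

A real gazetteer would be far larger and would also need multi-word entries (for example tourist spots with several words), which this word-by-word lookup does not handle.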

23 Claims

1. A method for automatically organizing digitized photographic images into events based on spoken annotations, where the events are useful in organizing photographic albums, said method comprising the steps of:
- providing natural-language text based on spoken annotations corresponding to a plurality of frames of photographic images;
  
  extracting predetermined information from the natural-language text that characterizes the annotations of the images;
  
  segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and
  
  identifying each event by assembling the categories of information into event descriptions, wherein the step of segmenting the images into events comprises the steps of:
  
  assigning a strength value for the certain categories of information which are indicative of a boundary between events;
  
  computing the evidence in favor of and against an event break with regard to a current frame by summing the strength values from the certain categories of information present for the current frame relative to a preceding frame already allocated to a current event; and
  
  allocating the frame to a new event when the summarized strength values in favor of an event break exceed a predetermined threshold, otherwise allocating the frame to the current event.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 2. The method as claimed in claim 1 further comprising the step of summarizing each event by selecting and arranging the event descriptions in a suitable manner.
  - 3. The method as claimed in claim 2 wherein the event descriptions are selected and arranged in a suitable manner as a photographic album.
  - 4. The method as claimed in claim 1 wherein the step of providing natural-language text comprises the steps of:
    - acquiring spoken annotations; and
      
      converting the spoken annotations to natural-language text.
  - 5. The method as claimed in claim 1 wherein the step of extracting predetermined information from the natural-language text comprises the steps of:
    - segmenting the natural-language text into words and sentences; and
      
      applying a plurality of dictionaries or gazetteers to the words and sentences to classify important words signifying possible events.
  - 6. The method as claimed in claim 1 wherein the step of extracting predetermined information from the natural-language text comprises the steps of:
    - segmenting the natural-language text into words and sentences;
      
      identifying elements of numerical expression that may help to define events; and
      
      identifying expressions signifying at least one of date, time, money and percentage that may further define events.
  - 7. The method as claimed in claim 1 wherein the step of extracting predetermined information from the natural-language text comprises the steps of:
    - segmenting the natural-language text into words and sentences; and
      
      identifying references to people, location, events and objects of interest in relation to possible events.
  - 8. The method as claimed in claim 1 wherein the step of extracting predetermined information from the natural-language text comprises the steps of:
    - segmenting the natural-language text into words and sentences; and
      
      identifying noun and verb phrases that may relate to possible events.
  - 9. The method as claimed in claim 1 wherein the step of extracting predetermined information from the natural-language text comprises the step of extracting the natural-language text according to an XML specification.
  - 10. The method as claimed in claim 1 wherein the steps of computing the evidence and allocating the frame are taken with regard to an adjacent frame of the current frame.
  - 11. The method as claimed in claim 1 wherein the steps of computing the evidence and allocating the frame are taken with regard to a non-adjacent frame of the current frame, and wherein the allocation of the intervening frames is made on the basis of the current frame.
  - 12. The method as claimed in claim 11 wherein the steps of computing the evidence and allocating the frame are taken with regard to a frame that is separated by one frame from the current frame.
  - 13. The method of claim 1 wherein the step of extracting predetermined information from the natural-language text comprises:
    - segmenting the natural-language text into words and sentences; and
      
      applying a gazetteer to the words and sentences to classify important words signifying possible events, said gazetteer comprising a collection of indices including commonly-used proper names, place names including typical tourist spots and celebration places, currency names, function or stop words, irregular verb forms, regular verbs, college and university names and typical events.
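The segmentation steps of claim 1 (assign a strength value to each category of boundary-indicating information, sum the evidence for the current frame relative to the preceding frame, and open a new event when that evidence exceeds a predetermined threshold) can be sketched in Python as follows. The cue names, strength values, threshold, and `Frame` fields are hypothetical. Netting the evidence in favor against the evidence against is also an assumption, since the claim does not say how the two are combined. Only the adjacent-frame comparison of claim 10 is shown; the non-adjacent variants of claims 11 and 12 are not.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical strength values per boundary cue. Positive values count as
# evidence in favor of an event break, negative values as evidence against it.
STRENGTHS = {
    "new_location": 3.0,
    "new_date": 4.0,
    "new_event_word": 2.0,
    "shared_person": -2.0,
}
THRESHOLD = 2.5  # hypothetical "predetermined threshold"


@dataclass
class Frame:
    """Information extracted from one frame's annotation (illustrative fields)."""
    location: Optional[str] = None
    date: Optional[str] = None
    event_word: Optional[str] = None
    people: set[str] = field(default_factory=set)


def cues(current: Frame, previous: Frame) -> list[str]:
    """Boundary cues present in `current` relative to `previous`."""
    found = []
    if current.location and current.location != previous.location:
        found.append("new_location")
    if current.date and current.date != previous.date:
        found.append("new_date")
    if current.event_word and current.event_word != previous.event_word:
        found.append("new_event_word")
    if current.people & previous.people:
        found.append("shared_person")
    return found


def segment_events(frames: list[Frame]) -> list[list[int]]:
    """Group frame indices into events, comparing each frame with the adjacent
    preceding frame, which already belongs to the current event."""
    if not frames:
        return []
    events = [[0]]
    for i in range(1, len(frames)):
        strengths = [STRENGTHS[c] for c in cues(frames[i], frames[i - 1])]
        evidence_for = sum(s for s in strengths if s > 0)
        evidence_against = -sum(s for s in strengths if s < 0)
        # Assumption: the break decision uses net evidence (for minus against).
        if evidence_for - evidence_against > THRESHOLD:
            events.append([i])    # allocate the frame to a new event
        else:
            events[-1].append(i)  # keep the frame in the current event
    return events


frames = [
    Frame(location="paris", date="june 3", people={"mary"}),
    Frame(people={"mary"}),
    Frame(location="rome", date="june 9"),
    Frame(location="rome", event_word="wedding", people={"john"}),
]
print(segment_events(frames))  # [[0, 1], [2, 3]]
```

In this example, frame 2 brings a new location and a new date (7.0 in favor), so it starts a new event. Frame 3 adds only a new event word (2.0), which stays below the threshold of 2.5, so it joins frame 2's event.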

14. A computer program product for automatically organizing digitized photographic images into events based on spoken annotations, where the events are useful in organizing photographic albums, said computer program product comprising a computer readable storage medium having a computer program stored thereon for performing the steps of:
- providing natural-language text based on spoken annotations corresponding to at least some of the photographic images;
  
  extracting predetermined information from the natural-language text that characterizes the annotations of the images;
  
  segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and
  
  identifying each event by assembling the categories of information into event descriptions;
  
  wherein the step of segmenting the images into events comprises the steps of:
  
  assigning a strength value for the certain categories of information which are indicative of a boundary between events;
  
  computing the evidence in favor of and against an event break with regard to a current frame by summing the strength values from the certain categories of information present for the current frame relative to a preceding frame already allocated to a current event; and
  
  allocating the frame to a new event when the summarized strength values in favor of an event break exceed a predetermined threshold, otherwise allocating the frame to the current event.
- Dependent Claims (15)
  - 15. The computer program product as claimed in claim 14 further comprising the step of summarizing each event by selecting and arranging the event descriptions in a suitable manner.
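Claims 2, 3, and 15 add a summarization step that selects and arranges the event descriptions, for example as a photographic album. One simple way to do this, shown here as a hypothetical Python sketch, is to pick the most frequent value of each extracted field within an event and emit one album page title per event. The field names, the `Annotation` type, and the page-title format are assumptions, not details from the patent.

```python
from collections import Counter

# Each annotation is a hypothetical dict of fields extracted earlier,
# e.g. {"event": "birthday", "place": "home"}.
Annotation = dict[str, str]


def describe_event(annotations: list[Annotation]) -> str:
    """Assemble a description from the most frequent value of each field."""
    parts = []
    for key in ("event", "place", "date"):
        values = [a[key] for a in annotations if key in a]
        if values:
            parts.append(Counter(values).most_common(1)[0][0])
    return ", ".join(parts) if parts else "untitled event"


def album_pages(events: list[list[Annotation]]) -> list[str]:
    """Arrange one page title per event, keeping the segmentation order."""
    return [f"Page {n}: {describe_event(ev)}" for n, ev in enumerate(events, start=1)]


events = [
    [{"event": "birthday", "place": "home"}, {"event": "birthday", "date": "june 3"}],
    [{"place": "rome"}, {"place": "rome", "event": "wedding"}, {"place": "venice"}],
]
for page in album_pages(events):
    print(page)
# Page 1: birthday, home, june 3
# Page 2: wedding, rome
```

Majority voting per field keeps a single off-topic annotation (here "venice") from dominating the event title.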

16. A system for automatically organizing digitized photographic images into events based on spoken annotations, where the events are useful in organizing photographic albums, said system comprising:
- an input for receiving natural-language text based on spoken annotations corresponding to a plurality of frames of photographic images;
  
  an event extraction stage for extracting predetermined information from the natural-language text that characterizes the annotations of the images;
  
  an event segmentation stage for segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and
  
  an event identification stage for identifying each event by assembling the categories of information into event descriptions, wherein the event segmentation stage comprises a processor that:
  
  assigns a strength value for the certain categories of information which are indicative of a boundary between events;
  
  computes the evidence in favor of and against an event break with regard to a current frame by summing the strength values from the certain categories of information present for the current frame relative to a preceding frame already allocated to a current event; and
  
  allocates the frame to a new event when the summarized strength values in favor of an event break exceed a predetermined threshold, and otherwise allocates the frame to the current event.
- Dependent Claims (17, 18, 19, 20, 21)
  - 17. The system as claimed in claim 16 further comprising an event summarization stage for summarizing each event by selecting and arranging the event descriptions in a suitable manner.
  - 18. The system as claimed in claim 16 wherein the event extraction stage comprises a natural language processor.
  - 19. The system as claimed in claim 18 wherein the natural language processor comprises a plurality of finite state machines.
  - 20. The system as claimed in claim 16 wherein the processor computes the evidence and allocates the frame with regard to an adjacent frame of the current frame.
  - 21. The system as claimed in claim 20 wherein the processor computes the evidence and allocates the frame with regard to a non-adjacent frame of the current frame, and wherein the allocation of the intervening frames is made on the basis of the current frame.
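Claim 6 recites identifying numerical expressions and expressions of date, time, money, and percentage, and claim 19 specifies a natural language processor comprising a plurality of finite state machines. The Python sketch below is a hand-written deterministic finite automaton that labels dollar amounts, percentages, and plain numbers. The states, the transition table, the labels, and the helper names are hypothetical; dates and times are not covered.

```python
from typing import Optional

# Hypothetical transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "$"): "cur",
    ("start", "digit"): "int",
    ("cur", "digit"): "cur_int",
    ("cur_int", "digit"): "cur_int",
    ("cur_int", "."): "cur_dot",
    ("cur_dot", "digit"): "cur_frac",
    ("cur_frac", "digit"): "cur_frac",
    ("int", "digit"): "int",
    ("int", "."): "dot",
    ("int", "%"): "pct",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
    ("frac", "%"): "pct",
}
# Accepting states and the label each one assigns.
ACCEPTING = {"cur_int": "MONEY", "cur_frac": "MONEY",
             "int": "NUMBER", "frac": "NUMBER", "pct": "PERCENT"}


def symbol(ch: str) -> str:
    """Map a character to one of the automaton's input symbols."""
    if ch in "0123456789":
        return "digit"
    return ch if ch in "$.%" else "other"


def classify(token: str) -> Optional[str]:
    """Run the automaton over `token`; return its label, or None if rejected."""
    state = "start"
    for ch in token:
        state = TRANSITIONS.get((state, symbol(ch)))
        if state is None:
            return None  # no transition: not a numeric expression
    return ACCEPTING.get(state)


def numeric_expressions(text: str) -> list[tuple[str, str]]:
    """Label whitespace-separated tokens that the automaton accepts."""
    found = []
    for raw in text.split():
        token = raw.rstrip(".,;:!?")  # drop trailing sentence punctuation
        label = classify(token)
        if label:
            found.append((token, label))
    return found


print(numeric_expressions("We paid $12.50 for tickets, about 45% off, in 1999."))
# [('$12.50', 'MONEY'), ('45%', 'PERCENT'), ('1999', 'NUMBER')]
```

A table-driven automaton keeps each recognizer explicit and easy to extend with new states; regular expressions are a more compact alternative for the same token patterns.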

22. A method for automatically organizing digitized photographic images into events, said method comprising the steps of:
- providing spoken annotations corresponding to a plurality of frames of photographic images;
  
  segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and
  
  identifying each event by assembling the categories of information into event descriptions, wherein the step of segmenting the images into events comprises the steps of:
  
  assigning a strength value for the certain categories of information which are indicative of a boundary between events;
  
  computing the evidence in favor of and against an event break with regard to a current frame by summing the strength values from the certain categories of information present for the current frame relative to a preceding frame already allocated to a current event; and
  
  allocating the frame to a new event when the summarized strength values in favor of an event break exceed a predetermined threshold, otherwise allocating the frame to the current event.

23. A system for automatically organizing digitized photographic images into events based on spoken annotations, where the events are useful in organizing photographic albums, said system comprising:
- an input for receiving spoken annotations corresponding to a plurality of frames of photographic images;
  
  an event segmentation stage for segmenting the images into events by examining each annotation for the presence of certain categories of information which are indicative of a boundary between events; and
  
  an event identification stage for identifying each event by assembling the categories of information into event descriptions, wherein the event segmentation stage comprises a processor that:
  
  assigns a strength value for the certain categories of information which are indicative of a boundary between events;
  
  computes the evidence in favor of and against an event break with regard to a current frame by summing the strength values from the certain categories of information present for the current frame relative to a preceding frame already allocated to a current event; and
  
  allocates the frame to a new event when the summarized strength values in favor of an event break exceed a predetermined threshold, and otherwise allocates the frame to the current event.

Current Assignee: Apple Inc.
Original Assignee: Eastman Kodak Company
Inventors: Stent, Amanda J.; Loui, Alexander C.
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/895,507
Publication Number: US 20040260558A1
Time in Patent Office: 811 Days
Field of Search: 704/275, 396/300
US Class Current: 704/275
CPC Class Codes: G06F 16/58 Retrieval characterised by ...