System and method for direct multi-modal annotation of objects

US 7,493,559 B1
Filed: 01/09/2002
Issued: 02/17/2009
Est. Priority Date: 01/09/2002
Status: Expired due to Fees

Abstract

The system includes an image display system, a direct annotation creation module, an annotation display module, a vocabulary comparison module and a dynamic updating module. These modules are coupled together by a bus and provide for the direct multi-modal annotation of media objects. The direct annotation creation module creates annotation objects. The annotation display module works in cooperation with the image display system to display the annotations themselves or graphic representations of the annotations positioned relative to the images of the objects. The system automatically creates the annotation, associates it with the selected images, and displays either a graphic representation of the annotation or a text translation of the audio input.

26 Claims

1. An apparatus for direct annotation of objects, the apparatus comprising:
- a display device for displaying one or more images;
  
  an audio input device for receiving an audio signal;
  
  a storage device for storing a plurality of different visual notations each comprising a text or a graphic image and for storing a plurality of corresponding audio signals;
  
  a direct annotation creation module coupled to receive the audio signal from the audio input device and to receive a reference to a location within an image on the display device, the direct annotation creation module, in response to receiving the audio signal and the reference to the location within the image, automatically creating an annotation object, independent from the image, that associates the input audio signal, the location and one of the plurality of different visual notations; and
  
  an audio vocabulary comparison module coupled to the audio input device, the storage device and the direct annotation creation module, the audio vocabulary comparison module receiving audio input and finding a corresponding one of the plurality of different visual notations that matches content of the audio input.
- Dependent claims (2, 3, 4, 5):
  - 2. The apparatus of claim 1 further comprising an annotation display module coupled to the direct annotation creation module, the annotation display module generating symbols or text representing the annotation objects.
  - 3. The apparatus of claim 1 further comprising an annotation audio output module coupled to the direct annotation creation module, the annotation audio output module generating audio output in response to user selection of an annotation symbol representing an annotation object.
  - 4. The apparatus of claim 1 further comprising:
    - an audio vocabulary storage for storing a plurality of audio signals and corresponding text strings;
      
      a dynamic vocabulary updating module coupled to the audio vocabulary storage and the audio input device, the dynamic vocabulary updating module for displaying an interface to create a new entry in the audio vocabulary storage, the dynamic vocabulary updating module receiving an audio input and a text string and creating the new entry in the audio vocabulary storage that includes a new visual annotation.
  - 5. The apparatus of claim 1 further comprising a media object cache for storing media and annotation objects.
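
The annotation object of claim 1 can be pictured as a small record that refers to an image instead of being embedded in it. The Python sketch below is illustrative only: the names `AnnotationObject`, `image_ref` and `create_annotation` are assumptions made for this example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnnotationObject:
    """Hypothetical record, stored separately from the image it annotates."""
    image_ref: str              # reference to the image, not the image data
    location: Tuple[int, int]   # (x, y) position selected within the image
    audio: bytes                # the received audio signal
    visual_notation: str        # text or graphic identifier shown on the image


def create_annotation(image_ref, location, audio, visual_notation):
    """Build the annotation once the audio signal and the location are known."""
    if not audio:
        raise ValueError("an audio signal is required")
    return AnnotationObject(image_ref, tuple(location), bytes(audio), visual_notation)


note = create_annotation("photo-001", (120, 45), b"\x01\x02", "birthday")
print(note.visual_notation, note.location)
```

Because the record holds only a reference to the image, the image file itself is never modified, which is one way to read "independent from the image".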

6. A computer program product having a computer-readable storage medium storing computer-executable code for direct annotation of objects, the code comprising:
- a media object storage for storing media, annotation objects, a plurality of different visual notations each comprising a text or a graphic image and a plurality of corresponding audio signals;
  
  a direct annotation creation module coupled to receive an audio signal, a selected visual notation from the plurality of different visual notations and a reference to a location within an image, the direct annotation creation module, in response to receiving the audio signal or the reference to the location within the image, automatically creating an annotation object, independent of the image, that associates the audio signal, the selected visual notation and the location, and the direct annotation creation module storing the audio annotation in the media object storage;
  
  an audio vocabulary comparison module coupled to the media object storage and the direct annotation creation module, the audio vocabulary comparison module receiving audio input and finding a corresponding one of the plurality of different visual notations that matches content of the audio input; and
  
  an annotation output module coupled to the direct annotation creation module, the annotation output module generating audio or visual output in response to user selection of an annotation symbol representing the annotation object.

7. A computer implemented method for direct annotation of objects, the method comprising the steps of:
- displaying an image;
  
  receiving audio input;
  
  detecting selection of a location within the image;
  
  comparing the audio input to a vocabulary;
  
  finding a corresponding one of a plurality of different visual notations that matches content of the audio input; and
  
  creating an annotation object, independent of the selected image, that provides an association between the image, the audio input, the selected location, the found corresponding one of a plurality of different visual notations comprising text or a graphic image, the annotation object including at least a text annotation field, an image reference field, and an annotation location field, the creating step occurring automatically in response to the receiving or detecting.
- Dependent claims (8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 8. The method of claim 7, wherein the step of displaying is performed before or simultaneously with the step of receiving.
  - 9. The method of claim 7, wherein the step of receiving is performed before or simultaneously with the step of displaying.
  - 10. The method of claim 7, further comprising the step of displaying the one of the plurality of different visual notations to indicate that the image has an annotation.
  - 11. The method of claim 7, wherein the step of creating an annotation object includes storing the annotation object in an object storage.
  - 12. The method of claim 11, further comprising the step of recording the audio input received.
  - 13. The method of claim 12, wherein the step of creating the annotation object includes creating an annotation object including a reference to the selected location, the recorded audio input and one of the plurality of different visual annotations, and storing the annotation object in an object storage.
  - 14. The method of claim 11, wherein the step of creating an annotation object includes storing the text as part of the annotation object.
  - 15. The method of claim 11, further comprising the steps of:
    - determining if the audio input has a matching entry in the vocabulary; and
      
      storing the entry as part of the annotation object if the audio input has a matching entry in the vocabulary.
  - 16. The method of claim 15, further comprising the steps of:
    - determining if the audio input has a close match in the vocabulary;
      
      displaying the close matches;
      
      receiving input selecting a close match; and
      
      storing the selected close match as part of the annotation object if the audio input has a close match in the vocabulary.
  - 17. The method of claim 16, further comprising the step of displaying a message that the image has not been annotated if there is neither a matching entry in the vocabulary nor a close match in the vocabulary.
  - 18. The method of claim 16, further comprising the following steps if there is neither a matching entry in the vocabulary nor a close match in the vocabulary:
    - receiving text input corresponding to the audio input;
      
      updating the vocabulary with a new entry including the audio input and the text input; and
      
      wherein the received text is stored as part of the annotation object.
  - 19. The method of claim 11, further comprising the steps of:
    - receiving text input corresponding to the audio input;
      
      updating the vocabulary with a new entry including the audio input and the text input.
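
The fallback order described in claims 15, 16 and 18 (matching entry, then close match chosen by the user, then new vocabulary entry) can be sketched as follows. This is a rough illustration under stated assumptions: the audio input is assumed to be already reduced to a numeric feature vector, cosine similarity stands in for whatever comparison the system actually uses, and the `MATCH` and `CLOSE` scores and the function names are hypothetical.

```python
import math

MATCH = 0.95   # assumed score for a "matching entry" (not from the patent)
CLOSE = 0.80   # assumed score for a "close match" (not from the patent)


def similarity(a, b):
    """Cosine similarity between two feature vectors (a stand-in metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_text(audio, vocabulary, choose, ask_text):
    """Return the text to store as part of the annotation object.

    vocabulary: list of (audio_features, text) entries
    choose:     callback that picks one text from the displayed close matches
    ask_text:   callback that obtains text input from the user
    """
    scored = sorted(
        ((similarity(audio, feats), text) for feats, text in vocabulary),
        reverse=True,
    )
    if scored and scored[0][0] >= MATCH:
        return scored[0][1]                     # matching entry (claim 15)
    close = [text for score, text in scored if score >= CLOSE]
    if close:
        return choose(close)                    # user selects a close match (claim 16)
    text = ask_text()                           # no match at all (claim 18)
    vocabulary.append((list(audio), text))      # update the vocabulary
    return text


vocab = [([1.0, 0.0], "dog"), ([0.0, 1.0], "cat")]
print(resolve_text([1.0, 1.0], vocab, lambda c: c[0], lambda: "bird"))
print(len(vocab))
```

Passing the two user interactions in as callbacks keeps the matching logic separate from any particular interface for displaying close matches or collecting text.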

20. A computer implemented method for displaying objects with annotations, the method comprising the steps of:
- receiving audio input;
  
  finding a corresponding annotation object comprising one of a plurality of different visual notations, the plurality of different visual notations referencing a close match to content of the audio input;
  
  retrieving an image associated with the corresponding annotation object;
  
  displaying the image with one of the plurality of different visual notations to indicate that an annotation exists;
  
  receiving user selection of the one visual notation;
  
  generating the annotation automatically, in response to user input of a location within the image and an audio input;
  
  outputting the annotation associated with the selected visual notation;
  
  determining whether the annotation includes text;
  
  retrieving a text annotation for the selected visual notation; and
  
  displaying the retrieved text with the image.
- Dependent claims (21, 22, 23):
  - 21. The method of claim 20, wherein the annotation is text and the step of outputting is displaying the text proximate the image that it annotates.
  - 22. The method of claim 20, wherein the annotation is an audio signal and the step of outputting is playing the audio signal.
  - 23. The method of claim 20, further comprising the steps of:
    - determining whether the annotation includes an audio signal;
      
      retrieving an audio signal for the selected visual annotation; and
      
      wherein the step of outputting is playing the audio signal.
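
The output step of claims 20 through 23 amounts to checking which parts a selected annotation contains and routing each to the right output. A minimal sketch, assuming the annotation is a dictionary with optional `"text"` and `"audio"` keys and that display and playback are supplied as callbacks (all of these names are hypothetical):

```python
def output_annotation(annotation, show_text, play_audio):
    """Output whichever parts the selected annotation contains."""
    outputs = []
    text = annotation.get("text")
    if text:                      # "determining whether the annotation includes text"
        show_text(text)
        outputs.append("text")
    audio = annotation.get("audio")
    if audio:                     # "determining whether the annotation includes an audio signal"
        play_audio(audio)
        outputs.append("audio")
    return outputs


shown, played = [], []
kinds = output_annotation(
    {"text": "birthday", "audio": b"\x01\x02"},
    show_text=shown.append,
    play_audio=played.append,
)
print(kinds)
```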

24. A computer implemented method for retrieving images, the method comprising the steps of:
- receiving audio input;
  
  finding corresponding annotation objects comprising one of a plurality of different visual notations, the plurality of different visual notations referencing a close match to content of the audio input, each corresponding annotation object generated automatically in response to user input of a location within an image and an audio signal, where a recording of the audio signal is terminated automatically based on a predetermined audio level;
  
  retrieving the images that are referenced by the found annotation objects; and
  
  displaying the retrieved images, the plurality of different visual notations for the found corresponding annotation objects and wherein each of the found corresponding annotation objects include at least an audio input field, an image reference field, and an annotation location field.
- Dependent claims (25, 26):
  - 25. The method of claim 24, wherein the step of determining annotation objects further comprises the steps of:
    - comparing the audio input to an audio signal reference of the annotation object; and
      
      determining a close match between the audio input and the audio signal reference of the annotation object if a probability metric is greater than a threshold of 80%.
  - 26. The method of claim 24, wherein the step of determining annotation objects further comprises the steps of:
    - determining the annotation objects for a plurality of images;
      
      for each annotation object, comparing the audio input to an audio signal reference of the annotation object; and
      
      determining a close match between the audio input and the audio signal reference of the annotation object if a probability metric is greater than a threshold of 80%.
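
The retrieval loop of claims 24 through 26 compares the query audio against the audio reference of each annotation object and keeps the images whose score exceeds the 80% threshold. The sketch below is an illustration only: the claims do not define the probability metric, so `match_probability` is a placeholder, and `StoredAnnotation` and `retrieve_images` are hypothetical names.

```python
from dataclasses import dataclass

THRESHOLD = 0.80  # "a threshold of 80%" from claims 25 and 26


@dataclass
class StoredAnnotation:
    audio_ref: list    # audio input field (assumed here to be a feature vector)
    image_ref: str     # image reference field
    location: tuple    # annotation location field


def match_probability(query, reference):
    """Placeholder metric: fraction of positions whose features agree."""
    if not query or len(query) != len(reference):
        return 0.0
    same = sum(1 for q, r in zip(query, reference) if q == r)
    return same / len(query)


def retrieve_images(query, annotations):
    """Return references to images whose annotations closely match the query."""
    found = []
    for ann in annotations:
        if match_probability(query, ann.audio_ref) > THRESHOLD:
            if ann.image_ref not in found:   # list each image once
                found.append(ann.image_ref)
    return found


stored = [
    StoredAnnotation([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], "img-a", (10, 20)),
    StoredAnnotation([1, 2, 3, 4, 5, 0, 0, 0, 0, 0], "img-b", (5, 5)),
]
print(retrieve_images([1, 2, 3, 4, 5, 6, 7, 8, 9, 9], stored))
```

The comparison uses a strict "greater than", following the claim wording, so a score of exactly 80% does not count as a close match.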

Current Assignee: Ricoh Company Limited
Original Assignee: Ricoh Company Limited
Inventors: Wolff, Gregory J.; Hart, Peter E.
Primary Examiner(s): Hong; Stephen
Assistant Examiner(s): Pitaro; Ryan F
Application Number: US10/043,575
Time in Patent Office: 2,596 Days
Field of Search: 715/727, 715/728, 715/716, 715/512, 715/763, 715/860, 715/838
US Class Current: 715/727
CPC Class Codes: G06F 40/169 Annotation, e.g. comment da...