Information Processing Device, Information Processing Method and Program
First Claim
1. An information processing device, comprising:
- learning means for extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
- browsing controlling means for extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model, and displaying representative images of scenes so as to be arranged in chronological order.
Abstract
The present invention relates to an information processing device, an information processing method, and a program that make it easy to add an annotation to content and to provide an application that utilizes the annotation.
A learning device 312 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learns an annotation model, which is a multi-stream HMM, by using a multi-stream including the image feature amount and the text feature amount. A browsing control device 314 extracts a scene, which is a group of one or more temporally continuous frames, from target content by using the annotation model and displays representative images of the scenes so as to be arranged in chronological order. The present invention may be applied to a case of adding the annotation to the content, for example.
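The text feature amount described above is word frequency information for the description text. The following is a minimal Python sketch of one way such a feature could be computed. The function name `word_frequency_feature`, the lowercase regular-expression tokenizer, the fixed vocabulary, and the normalization to relative frequencies are all assumptions made for illustration; the abstract does not specify them.

```python
import re
from collections import Counter


def word_frequency_feature(description_text, vocabulary):
    """Return the relative frequency of each vocabulary word in the text.

    Hypothetical helper: the tokenizer and the normalization are
    illustrative assumptions, not details taken from the abstract.
    """
    # Assumed tokenization: lowercase alphabetic words and apostrophes.
    words = re.findall(r"[a-z']+", description_text.lower())
    counts = Counter(words)
    # Normalize over the vocabulary words that actually appear.
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


# Example with a made-up vocabulary and description text.
vocabulary = ["goal", "weather", "rain", "score"]
feature = word_frequency_feature(
    "Rain, rain and more rain: the weather stays wet.", vocabulary
)
print(feature)  # [0.0, 0.25, 0.75, 0.0]
```

In this sketch the resulting vector would form the text stream, paired with a per-frame image feature to make up the multi-stream used for learning.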
40 Citations
19 Claims
1. An information processing device, comprising:
- learning means for extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
- browsing controlling means for extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model, and displaying representative images of scenes so as to be arranged in chronological order.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. An information processing method to be performed by an information processing device, comprising the steps of:
- extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
- learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount;
- extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model; and
- displaying representative images of scenes so as to be arranged in chronological order.
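The last two steps of the method (scene extraction and chronological display) can be illustrated with a short Python sketch. It assumes that the annotation model has already assigned one state label to each frame of the target content, for example by decoding the most likely state sequence; the claim itself does not say how the model is applied. The function name `extract_scenes` and the choice of the middle frame as the representative image are hypothetical.

```python
from itertools import groupby


def extract_scenes(frame_states):
    """Group temporally continuous frames that share a state into scenes.

    Hypothetical helper: `frame_states` is an assumed per-frame state
    sequence, and the middle frame is an assumed representative image.
    """
    scenes = []
    start = 0
    # groupby yields maximal runs of equal consecutive states, in order.
    for state, run in groupby(frame_states):
        end = start + len(list(run)) - 1
        scenes.append(
            {
                "state": state,
                "start": start,
                "end": end,
                "representative": (start + end) // 2,
            }
        )
        start = end + 1
    return scenes  # already in chronological order


# Made-up state sequence for a 10-frame clip.
frame_states = [0, 0, 0, 2, 2, 1, 1, 1, 1, 0]
scenes = extract_scenes(frame_states)
for scene in scenes:
    print(scene["start"], scene["end"], scene["representative"])
```

Because the runs are produced in frame order, listing the representative frames in the order returned gives the chronological arrangement the claim describes. Note that the same state can yield more than one scene when it recurs later in the content.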
19. A program for allowing a computer to function as:
- learning means for extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
- browsing controlling means for extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model, and displaying representative images of scenes so as to be arranged in chronological order.
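The claims name a multi-stream HMM over an image stream and a text stream but do not state its observation model in this section. In a commonly used formulation of multi-stream HMMs, the observation likelihood of a state is the product of the per-stream likelihoods, each raised to a stream weight. The Python sketch below shows that combination in the log domain; the function name `multistream_log_likelihood` and the example weights and likelihood values are assumptions for illustration only.

```python
import math


def multistream_log_likelihood(stream_likelihoods, stream_weights):
    """Combine per-stream likelihoods for one state as a weighted log sum.

    Assumed formulation: b(o) = prod_m b_m(o_m) ** w_m, so
    log b(o) = sum_m w_m * log b_m(o_m).
    """
    if len(stream_likelihoods) != len(stream_weights):
        raise ValueError("one weight is required per stream")
    total = 0.0
    for likelihood, weight in zip(stream_likelihoods, stream_weights):
        if likelihood <= 0.0:
            # A zero likelihood in any stream rules the state out.
            return float("-inf")
        total += weight * math.log(likelihood)
    return total


# Made-up likelihoods for the image stream and the text stream.
log_b = multistream_log_likelihood([0.2, 0.5], [0.5, 0.5])
print(log_b)
```

With equal weights both streams contribute equally; shifting weight toward one stream would make the state assignment depend more on that stream.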
Specification