Information processing device, information processing method and program
First Claim
1. An information processing device, comprising:
- one or more processors configured to:
extract an image feature amount of each frame of an image of learning content;
extract word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
learn an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount and obtain an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.
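The last element of the claim, arranging states on a model map so that on-map Euclidean distances match the inter-state distances, can be illustrated with a small sketch. This is only one possible reading: it assumes a two-dimensional map, a plain sum-of-squared-differences error, and simple gradient descent. The names `fit_model_map`, `map_error`, and the example `target` matrix are hypothetical, and how the inter-state distances themselves are derived from the annotation model is not covered here.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two 2-D points on the map."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def map_error(coords, target):
    """Sum of squared differences between target and on-map distances."""
    n = len(coords)
    return sum((target[i][j] - euclidean(coords[i], coords[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def fit_model_map(target, steps=5000, lr=0.01, seed=0):
    """Place each state on a 2-D map by gradient descent on map_error.

    `target` is a symmetric matrix of inter-state distances (assumed given).
    """
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = euclidean(coords[i], coords[j]) or 1e-9  # avoid division by zero
                # Derivative of (target - d)^2 with respect to coords[i].
                coeff = -2.0 * (target[i][j] - d) / d
                grads[i][0] += coeff * (coords[i][0] - coords[j][0])
                grads[i][1] += coeff * (coords[i][1] - coords[j][1])
        for i in range(n):
            coords[i][0] -= lr * grads[i][0]
            coords[i][1] -= lr * grads[i][1]
    return coords

# Illustrative inter-state distances for three states (a 3-4-5 triangle,
# chosen because it can be reproduced exactly in two dimensions).
target = [[0.0, 3.0, 4.0],
          [3.0, 0.0, 5.0],
          [4.0, 5.0, 0.0]]
coords = fit_model_map(target)
residual = map_error(coords, target)
```

With more states the target distances generally cannot all be matched exactly, so the residual error stays above zero and the map is a best-fit arrangement.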
Abstract
The present invention relates to an information processing device, an information processing method, and a program capable of easily adding an annotation to content.
A feature amount extracting unit 21 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content (for example, a text of a caption) as a text feature amount of the description text. A model learning unit 22 learns an annotation model, which is a multi-stream HMM, by using an annotation sequence for annotation, which is a multi-stream including the image feature amount of each frame and the text feature amount. The present invention may be applied when adding the annotation to the content such as a television broadcast program, for example.
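As a rough illustration of the text feature amount described above, the following sketch counts how often each word of a fixed vocabulary appears in a caption text and normalizes the counts. The vocabulary, the tokenization rule, the normalization, and the function name `word_frequency_feature` are assumptions made for this example and are not taken from the abstract.

```python
import re
from collections import Counter

def word_frequency_feature(description_text, vocabulary):
    """Return the relative appearance frequency of each vocabulary word."""
    # Naive tokenization: lowercase alphabetic runs (an assumption).
    words = re.findall(r"[a-z']+", description_text.lower())
    counts = Counter(words)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        # No vocabulary word appears in the text.
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]

# Hypothetical vocabulary and caption text for one window of frames.
vocabulary = ["goal", "score", "weather", "rain"]
caption = "Rain again today. The weather stays wet, with more rain tonight."
feature = word_frequency_feature(caption, vocabulary)
```

The resulting vector has one entry per vocabulary word, so it can be paired with the image feature amount of the corresponding frames to form one sample of the multi-stream annotation sequence.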
20 Claims
1. An information processing device, comprising:
- one or more processors configured to:
extract an image feature amount of each frame of an image of learning content;
extract word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
learn an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount and obtain an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.
19. An information processing method to be performed by an information processing device, the information processing method comprising:
extracting an image feature amount of each frame of an image of learning content;
extracting word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
obtaining an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.
20. A non-transitory computer-readable medium having stored thereon, a set of computer-executable instructions for causing a computer to perform steps comprising:
extracting an image feature amount of each frame of an image of learning content;
extracting word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
obtaining an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.
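The claims refer to a multi-stream HMM whose observations combine an image stream and a text stream. In the usual formulation of multi-stream HMMs, the output probability of a state is the product of the per-stream output probabilities, each raised to a stream weight; in the log domain this becomes a weighted sum. The sketch below shows only that combination step. The function name, the example likelihood values, and the equal weights are illustrative assumptions, and the claims do not state which weighting is used.

```python
import math

def multistream_log_likelihood(stream_log_likelihoods, stream_weights):
    """Combine per-stream log-likelihoods of one state into a single score.

    Equivalent to log(prod_m b_m(o_m) ** w_m) for streams m.
    """
    if len(stream_log_likelihoods) != len(stream_weights):
        raise ValueError("one weight is required per stream")
    return sum(w * ll for w, ll in zip(stream_weights, stream_log_likelihoods))

# Hypothetical per-stream likelihoods for one state and one observation.
image_ll = math.log(0.2)  # image feature stream
text_ll = math.log(0.5)   # text feature (word frequency) stream
combined = multistream_log_likelihood([image_ll, text_ll], [0.5, 0.5])
```

Setting a stream's weight to zero removes its influence on the score, which is one way such a model can be evaluated on content that has an image stream but no description text.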
Specification