Information Processing Device, Information Processing Method and Program

US 20130183022A1
Filed: 08/02/2011
Published: 07/18/2013
Est. Priority Date: 08/11/2010
Status: Active Grant


Abstract

The present invention relates to an information processing device, an information processing method, and a program that make it easy to add an annotation to content and to provide an application that uses the annotation.

A learning device 312 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learns an annotation model, which is a multi-stream HMM, by using a multi-stream including the image feature amount and the text feature amount. A browsing control device 314 extracts a scene, which is a group of one or more temporally continuous frames, from target content by using the annotation model and displays representative images of the scenes so as to be arranged in chronological order. The present invention may be applied to a case of adding the annotation to the content, for example.
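The scene extraction step described above can be illustrated with a minimal sketch. It assumes, following claim 4, that every frame already carries an annotation and that a scene is a run of consecutive frames sharing that annotation. The function names and the choice of the first frame as the representative image are hypothetical and are not taken from the patent text.

```python
from itertools import groupby


def extract_scenes(frame_annotations):
    """Group temporally continuous frames that share an annotation into scenes.

    Returns (annotation, first_frame, last_frame) tuples in chronological order.
    """
    scenes = []
    index = 0
    for annotation, run in groupby(frame_annotations):
        length = len(list(run))
        scenes.append((annotation, index, index + length - 1))
        index += length
    return scenes


def representative_frames(scenes):
    # Assumption: the first frame of each scene stands in as its representative image.
    return [first for _, first, _ in scenes]


if __name__ == "__main__":
    scenes = extract_scenes(["news", "news", "sports", "news"])
    print(scenes)
    print(representative_frames(scenes))
```

Because `groupby` only merges adjacent equal items, a later return to the same annotation produces a separate scene, which keeps the result in chronological order.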

19 Claims

1. An information processing device, comprising:
- learning means for extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
  
  browsing controlling means for extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model, and displaying representative images of scenes so as to be arranged in chronological order.
  - 2. The information processing device according to claim 1, wherein the learning content includes a text of a caption, and the description text is the text of the caption included in the learning content.
  - 3. The information processing device according to claim 2, wherein the learning means extracts words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, and extracts multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount.
  - 4. The information processing device according to claim 2, wherein the learning means extracts words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, and extracts multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount, and the browsing controlling means extracts the image feature amount of each frame of the image of the target content and composes the annotation sequence by using the image feature amount, obtains a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model, selects a word with high frequency in the multinomial distribution observed in a state corresponding to a noted frame of interest out of states of the maximum likelihood state sequence as an annotation to be added to the frame of interest, extracts a group of one or more temporally continuous frames to which the same annotation is added as the scene from the target content, and displays the representative images of the scenes so as to be arranged in chronological order.
  - 5. The information processing device according to claim 4, wherein the target content is content of a broadcast program, and the browsing controlling means displays the representative images of the scenes of the broadcast program so as to be arranged in chronological order in a program listing of the broadcast program on an EPG (electronic program guide).
  - 6. The information processing device according to claim 4, wherein the browsing controlling means also displays the annotation added to the frame, which composes the scene, together with a representative image of the scene.
  - 7. The information processing device according to claim 2, wherein the learning means extracts words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, and extracts multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount, and the browsing controlling means extracts the image feature amount of each frame of the image of the target content and composes the annotation sequence by using the image feature amount, obtains a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model, selects, when a frequency of a predetermined keyword is high in the multinomial distribution observed in a state corresponding to a noted frame of interest out of states of the maximum likelihood state sequence, the frame of interest as a keyword frame, which is the frame of which content coincides with a predetermined keyword, extracts a group of one or more temporally continuous frames as the scene from the keyword frame, and displays the representative images of the scenes so as to be arranged in chronological order.
  - 8. The information processing device according to claim 7, wherein the target content is content of a broadcast program, and the browsing controlling means displays the representative images of the scenes of the broadcast program so as to be arranged in chronological order in a program listing of the broadcast program on an EPG (electronic program guide).
  - 9. The information processing device according to claim 2, wherein the learning means performs dimension reduction to reduce a dimension of the image feature amount and the text feature amount, and learns the annotation model by using the multi-stream including the image feature amount and the text feature amount after the dimension reduction as the annotation sequence.
  - 10. The information processing device according to claim 9, wherein the learning means obtains basis space data of a basis space for image of which dimension is lower than the dimension of the image feature amount for mapping the image feature amount by using the image feature amount, performs the dimension reduction of the image feature amount based on the basis space data of the basis space for image, obtains basis space data of a basis space for text of which dimension is lower than the dimension of the text feature amount for mapping the text feature amount by using the text feature amount, and performs the dimension reduction of the text feature amount based on the basis space data of the basis space for text.
  - 11. The information processing device according to claim 10, wherein the learning means obtains a code book used for vector quantization as the basis space data of the basis space for image by using the image feature amount, and obtains a code representing a centroid vector as the image feature amount after the dimension reduction by performing the vector quantization of the image feature amount by using the code book.
  - 12. The information processing device according to claim 10, wherein the learning means extracts words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, extracts a frequency of appearance of each word in the document as the text feature amount, obtains a parameter of LDA (latent Dirichlet allocation) as the basis space data of the basis space for text by learning the LDA by using the document obtained from the learning content, and converts the text feature amount obtained from the document to topic likelihood, which is likelihood of each latent topic of the LDA for the document, by using the parameter of the LDA, to obtain a topic label representing the latent topic of which topic likelihood is the maximum as the text feature amount after the dimension reduction.
  - 13. The information processing device according to claim 12, wherein the learning means generates a word dictionary of the words appearing in the document by using the document obtained from the learning content and creates a topic-to-frequently appearing word table of a word with high appearance frequency in the latent topic of the LDA and the appearance frequency of the word by using occurrence probability of occurrence of each word in the word dictionary in each latent topic of the LDA obtained by learning the LDA, and the browsing controlling means extracts the image feature amount of each frame of the image of the target content, performs the dimension reduction, and composes the annotation sequence by using the image feature amount after the dimension reduction, obtains a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model, selects the latent topic represented by the topic label with high output probability in a state corresponding to a noted frame of interest out of states of the maximum likelihood state sequence as a frame topic representing a content of the frame of interest, selects a word with high appearance frequency in the frame topic as the annotation to be added to the frame of interest based on the topic-to-frequently appearing word table, extracts a group of one or more temporally continuous frames to which the same annotation is added as the scene from the target content, and displays the representative images of the scenes so as to be arranged in chronological order.
  - 14. The information processing device according to claim 13, wherein the target content is content of a broadcast program, and the browsing controlling means displays the representative images of the scenes of the broadcast program so as to be arranged in a program listing of the broadcast program on an EPG (electronic program guide) in chronological order.
  - 15. The information processing device according to claim 13, wherein the browsing controlling means also displays the annotation added to the frame, which composes the scene, together with a representative image of the scene.
  - 16. The information processing device according to claim 12, wherein the learning means generates a word dictionary of the words appearing in the document by using the document obtained from the learning content and creates a topic-to-frequently appearing word table of a word with high appearance frequency in the latent topic of the LDA and the appearance frequency of the word by using occurrence probability of occurrence of each word in the word dictionary in each latent topic of the LDA obtained by learning the LDA, and the browsing controlling means extracts the image feature amount of each frame of the image of the target content, performs the dimension reduction, and composes the annotation sequence by using the image feature amount after the dimension reduction, obtains a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model, selects the latent topic represented by the topic label with high output probability in a state corresponding to a noted frame of interest out of states of the maximum likelihood state sequence as a frame topic representing a content of the frame of interest, obtains an appearance frequency of a predetermined keyword in the frame topic based on the topic-to-frequently appearing word table and selects, when the appearance frequency of the predetermined keyword is high, the frame of interest as a keyword frame, which is a frame of which content coincides with the predetermined keyword, extracts a group of one or more temporally continuous frames from the keyword frame as the scene, and displays the representative images of the scenes so as to be arranged in chronological order.
  - 17. The information processing device according to claim 16, wherein the target content is content of a broadcast program, and the browsing controlling means displays the representative images of the scenes of the broadcast program so as to be arranged in chronological order in a program listing of the broadcast program on an EPG (electronic program guide).
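
The windowed text feature recited in claims 3, 4, 7 and 12 can be illustrated with a minimal sketch. It assumes captions arrive as (start, end, text) display intervals in seconds, that words are split on whitespace, and that the multinomial distribution is a relative word frequency over a fixed vocabulary. All function and parameter names are hypothetical, not taken from the patent.

```python
from collections import Counter


def window_documents(captions, window_len, step, duration):
    """Collect the words of captions displayed in each sliding window.

    captions: list of (start, end, text) display intervals in seconds.
    Returns one word list ("document") per window position.
    """
    docs = []
    t = 0.0
    while t < duration:
        words = []
        for start, end, text in captions:
            # A caption belongs to the window if its display interval overlaps it.
            if start < t + window_len and end > t:
                words.extend(text.lower().split())
        docs.append(words)
        t += step
    return docs


def multinomial(words, vocabulary):
    """Relative frequency of each vocabulary word within one document."""
    counts = Counter(words)
    total = sum(counts[w] for w in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]


if __name__ == "__main__":
    captions = [(0, 2, "goal goal"), (3, 5, "weather")]
    docs = window_documents(captions, window_len=2, step=2, duration=6)
    for doc in docs:
        print(doc, multinomial(doc, ["goal", "weather"]))
```

A window with no caption text yields an all-zero vector here; how such windows are treated in practice is left open by the claims.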

18. An information processing method to be performed by an information processing device, comprising the steps of:
- extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text;
  
  learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount;
  
  extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model; and
  
  displaying representative images of scenes so as to be arranged in chronological order.
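
The scene extraction step relies on a maximum likelihood state sequence (see claims 4 and 7). The sketch below is a simplified illustration, not the claimed multi-stream HMM: it decodes a plain discrete HMM over quantized image codes with the standard Viterbi algorithm, then picks the most frequent word of each decoded state as the frame annotation. The model parameters, the vocabulary and the function names are all hypothetical.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for a discrete observation sequence (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def annotate(path, word_dist, vocabulary):
    """Pick, for each frame, the highest-frequency word of its decoded state."""
    return [
        vocabulary[max(range(len(vocabulary)), key=lambda i: word_dist[s][i])]
        for s in path
    ]


if __name__ == "__main__":
    path = viterbi(
        [0, 0, 1, 1],
        start_p=[0.6, 0.4],
        trans_p=[[0.8, 0.2], [0.3, 0.7]],
        emit_p=[[0.9, 0.1], [0.2, 0.8]],
    )
    print(path)
    print(annotate(path, [[0.7, 0.3], [0.2, 0.8]], ["studio", "goal"]))
```

Only the image stream is observed at decoding time in this sketch, which mirrors the claims' statement that the annotation sequence for target content is composed from the image feature amount. Stream weights and continuous observation densities are omitted.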

19. A program for allowing a computer to function as:
- learning means for extracting an image feature amount of each frame of an image of learning content and extracting word frequency information regarding a frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text, and learning an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount; and
  
  browsing controlling means for extracting a scene, which is a group of one or more temporally continuous frames, from target content from which the scene is to be extracted by using the annotation model, and displaying representative images of scenes so as to be arranged in chronological order.

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Suzuki, Hirotaka; Ito, Masato

Granted Patent

US 9,232,205 B2
Field of Search
US Class Current

386/241
CPC Class Codes

G06F 16/739   in form of a video summary,...

G06F 16/745   the internal structure of a...

G06F 16/7844   using original textual cont...

G06V 20/41   Higher-level, semantic clus...

G06V 2201/10   Recognition assisted with m...

H04N 9/87   Regeneration of colour tele...
