Jointly modeling embedding and translation to bridge video and language

  • US 9,807,473 B2
  • Filed: 11/20/2015
  • Issued: 10/31/2017
  • Est. Priority Date: 11/20/2015
  • Status: Active Grant
First Claim

1. An apparatus comprising:

    a processor; and

    a computer-readable medium storing modules of instructions that, when executed by the processor, configure the apparatus to perform video description generation, the modules comprising:

    a training module to configure the processor to train a neural network, a video content transformation matrix, and a semantics transformation matrix based at least in part on a plurality of video/descriptive text pairs, a coherence loss threshold, and a relevance loss threshold, the training module further configured to adjust one or more parameters associated with the semantics transformation matrix in response to an energy value being applied to a recurrent neural network;

    a video description module to configure the processor to generate a textual description for an inputted video based at least in part on information associated with the inputted video, the neural network, the video content transformation matrix and the semantics transformation matrix; and

    an output module to configure the processor to generate an output based at least in part on the textual description for the inputted video.
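
The training element of the claim can be illustrated with a minimal sketch. The claim names a relevance loss, a coherence loss, and an energy value but does not give their formulas, so everything below is an assumption made for illustration: the relevance loss is taken to be the squared distance between a video feature and a text feature after each is projected by its transformation matrix, the coherence loss is taken to be the negative log-likelihood of the description's words, and the energy is taken to be a weighted sum of the two. The function names and the `weight` parameter are hypothetical and do not come from the patent.

```python
import math


def project(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def relevance_loss(t_video, video_feat, t_sem, text_feat):
    """Assumed form: squared distance between the two projected features.

    t_video stands in for the video content transformation matrix and
    t_sem for the semantics transformation matrix.
    """
    v = project(t_video, video_feat)
    s = project(t_sem, text_feat)
    return sum((a - b) ** 2 for a, b in zip(v, s))


def coherence_loss(word_probs):
    """Assumed form: negative log-likelihood of the words in the description.

    word_probs holds the probability a language model (for example a
    recurrent neural network) assigned to each word of the description.
    """
    return -sum(math.log(p) for p in word_probs)


def energy(relevance, coherence, weight=0.5):
    """Hypothetical weighted combination of the two losses."""
    return (1.0 - weight) * relevance + weight * coherence
```

Under these assumptions, training would adjust both matrices and the network parameters to lower the energy over the video/descriptive text pairs. How the claimed loss thresholds enter that process is not specified here and is not modeled in the sketch.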
