Sentinel gate for modulating auxiliary information in a long short-term memory (LSTM) neural network
First Claim
1. A recurrent neural network system (RNN) running on numerous parallel processors, comprising:
- a sentinel long short-term memory network (Sn-LSTM) that:
comprises a memory cell, an input gate, a forget gate, an output gate, and an auxiliary sentinel gate;
receives inputs at each of a plurality of timesteps, the inputs including at least:
an input for a current timestep, a hidden state from a previous timestep, and an auxiliary input for the current timestep;
stores in the memory cell auxiliary information accumulated over time from processing of the inputs by the input gate, the forget gate, and the output gate;
updates the memory cell with gate outputs produced by the input gate, the forget gate, and the output gate;
generates, using the output gate, a hidden state as a first output of the Sn-LSTM based on the input for the current timestep, the hidden state from the previous timestep, and information in the memory cell; and
generates, using the auxiliary sentinel gate, a sentinel state as a second output of the Sn-LSTM different from the first output based on the auxiliary input for the current timestep, the hidden state from the previous timestep, and the information in the memory cell;
wherein the auxiliary sentinel gate modulates the stored auxiliary information from the memory cell for a next prediction, with the modulation conditioned on the input for the current timestep, the hidden state from the previous timestep, and the auxiliary input for the current timestep.
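The gate structure recited in this claim can be sketched numerically. The following is a minimal NumPy illustration, not the patented implementation: it assumes a single weight matrix applied to the concatenation of the current input, the auxiliary input, and the previous hidden state, with the auxiliary sentinel gate computed alongside the standard LSTM gates and the sentinel state taken as the gated tanh of the memory cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sn_lstm_step(x, aux, h_prev, c_prev, W, b):
    """One Sn-LSTM timestep (illustrative). W maps the concatenation
    [x; aux; h_prev] to five d-sized blocks of pre-activations: the
    input, forget, output, and auxiliary sentinel gates, plus the
    candidate cell update."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([x, aux, h_prev]) + b
    i = sigmoid(z[0 * d:1 * d])      # input gate
    f = sigmoid(z[1 * d:2 * d])      # forget gate
    o = sigmoid(z[2 * d:3 * d])      # output gate
    g = sigmoid(z[3 * d:4 * d])      # auxiliary sentinel gate
    u = np.tanh(z[4 * d:5 * d])      # candidate memory update
    c = f * c_prev + i * u           # memory cell accumulates information over time
    h = o * np.tanh(c)               # first output: hidden state
    s = g * np.tanh(c)               # second output: sentinel state
    return h, c, s
```

For simplicity the sketch gives the input, auxiliary input, and hidden state the same dimension d, so W has shape (5d, 3d); the two outputs h and s are distinct because the output and sentinel gates apply different modulations to the same memory-cell contents.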
Abstract
The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long- and short-term visual and linguistic information.
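The adaptive attention mixing described in the abstract can be illustrated in a few lines. This is a simplified sketch, not the disclosed model: it scores each spatial feature and the visual sentinel against the decoder hidden state with a plain dot product (rather than a learned scoring layer), then takes a softmax over all k+1 candidates so that the sentinel's weight beta measures how much the model falls back on linguistic information instead of the image.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_attention(V, s, h):
    """V: (k, d) spatial CNN features; s: visual sentinel; h: decoder
    hidden state. Returns the mixed context vector and the sentinel
    weight beta (reliance on linguistic vs. visual information)."""
    scores = np.concatenate([V @ h, [s @ h]])  # k region scores + 1 sentinel score
    alpha = softmax(scores)                    # attention over regions and sentinel
    beta = alpha[-1]                           # how much to trust the sentinel
    context = alpha[:-1] @ V + beta * s        # visual/linguistic mixture
    return context, beta
```

When beta is near 1 the next word is predicted mostly from the language model's state (e.g. for function words like "of"); when beta is near 0 the spatial image features dominate.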
28 Claims
1. A recurrent neural network system (RNN) running on numerous parallel processors, comprising:
a sentinel long short-term memory network (Sn-LSTM) that:
comprises a memory cell, an input gate, a forget gate, an output gate, and an auxiliary sentinel gate;
receives inputs at each of a plurality of timesteps, the inputs including at least:
an input for a current timestep, a hidden state from a previous timestep, and an auxiliary input for the current timestep;
stores in the memory cell auxiliary information accumulated over time from processing of the inputs by the input gate, the forget gate, and the output gate;
updates the memory cell with gate outputs produced by the input gate, the forget gate, and the output gate;
generates, using the output gate, a hidden state as a first output of the Sn-LSTM based on the input for the current timestep, the hidden state from the previous timestep, and information in the memory cell; and
generates, using the auxiliary sentinel gate, a sentinel state as a second output of the Sn-LSTM different from the first output based on the auxiliary input for the current timestep, the hidden state from the previous timestep, and the information in the memory cell;
wherein the auxiliary sentinel gate modulates the stored auxiliary information from the memory cell for a next prediction, with the modulation conditioned on the input for the current timestep, the hidden state from the previous timestep, and the auxiliary input for the current timestep.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
19. A sentinel long short-term memory network (Sn-LSTM) comprising:
a memory cell for storing a state of the Sn-LSTM;
an output gate that outputs, as a first output of the Sn-LSTM, a current hidden state of the Sn-LSTM based on an input of the Sn-LSTM, a previous hidden state of the Sn-LSTM, and information in the memory cell; and
an auxiliary sentinel gate that:
modulates use of auxiliary information during a next prediction, the auxiliary information being accumulated over time in the memory cell at least from processing of an auxiliary input of the Sn-LSTM combined with the input and the previous hidden state, and
outputs, as a second output of the Sn-LSTM different from the first output, a sentinel state of the Sn-LSTM with the accumulated auxiliary information useful for the next prediction.
- View Dependent Claims (20, 21, 22)
23. A method comprising:
receiving, at an Sn-LSTM, an input for a current timestep, a hidden state from a previous timestep, and an auxiliary input for the current timestep;
storing, in a memory cell of the Sn-LSTM, auxiliary information accumulated over time from processing of the inputs by an input gate, a forget gate, and an output gate of the Sn-LSTM;
generating, using the output gate of the Sn-LSTM, a hidden state as a first output of the Sn-LSTM based on the input for the current timestep, the hidden state from the previous timestep, and information in the memory cell; and
generating, using an auxiliary sentinel gate of the Sn-LSTM, a sentinel state as a second output of the Sn-LSTM different from the first output based on the auxiliary input for the current timestep, the hidden state from the previous timestep, and the information in the memory cell;
wherein the auxiliary sentinel gate modulates use of auxiliary information during a next prediction, the auxiliary information being accumulated over time in the memory cell at least from the processing of an auxiliary input combined with a current input and a previous hidden state.
- View Dependent Claims (24, 25, 26, 27)
28. A recurrent neural network system (RNN) running on numerous parallel processors for machine generation of a natural language caption for an image, comprising:
an input provider for providing a plurality of inputs to a sentinel long short-term memory network (Sn-LSTM) over successive timesteps, wherein the inputs include at least an input for a current timestep, a hidden state from a previous timestep, and an auxiliary input for the current timestep;
the Sn-LSTM comprising:
at least an input gate, a forget gate, an output gate, and an auxiliary sentinel gate;
a memory cell for storing auxiliary information accumulated over time from processing of the inputs by the input gate and the forget gate;
the output gate for generating, as a first output of the Sn-LSTM, a hidden state of the Sn-LSTM from processing of the input for the current timestep, the hidden state from the previous timestep, and contents of the memory cell;
the auxiliary sentinel gate for modulating the stored auxiliary information from the memory cell to produce, as a second output of the Sn-LSTM, a sentinel state at each timestep, with the modulation conditioned on the input for the current timestep, the hidden state from the previous timestep, and the auxiliary input for the current timestep; and
an emitter for generating the natural language caption for the image based on the sentinel states produced over successive timesteps by the auxiliary sentinel gate.
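As an end-to-end illustration of the system recited in claim 28, the toy loop below wires the pieces together with random weights: an input provider (a word embedding as the input, a global image feature as the auxiliary input), an Sn-LSTM step with its sentinel gate, attention over spatial features mixed with the sentinel state, and a greedy linear emitter. Every weight, the dot-product scoring, and the emitter are illustrative assumptions for a runnable sketch, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, vocab, steps = 8, 4, 6, 3                   # hidden size, regions, words, length

W = rng.standard_normal((5 * d, 3 * d)) * 0.1     # gate weights over [x; aux; h_prev]
b = np.zeros(5 * d)
W_emit = rng.standard_normal((vocab, d)) * 0.1    # hypothetical linear emitter
embed = rng.standard_normal((vocab, d)) * 0.1     # word embeddings (the input x)
V = rng.standard_normal((k, d))                   # CNN spatial features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, c = np.zeros(d), np.zeros(d)
word, caption = 0, []                             # index 0 plays the start token
for t in range(steps):
    x, aux = embed[word], V.mean(axis=0)          # auxiliary input: global image feature
    z = W @ np.concatenate([x, aux, h]) + b
    i, f, o, g = (sigmoid(z[j * d:(j + 1) * d]) for j in range(4))
    c = f * c + i * np.tanh(z[4 * d:5 * d])       # memory cell update
    h = o * np.tanh(c)                            # first output: hidden state
    s = g * np.tanh(c)                            # second output: sentinel state
    alpha = softmax(np.concatenate([V @ h, [s @ h]]))
    context = alpha[:-1] @ V + alpha[-1] * s      # adaptive visual/linguistic mix
    word = int(np.argmax(W_emit @ (context + h))) # greedy emitter picks next word
    caption.append(word)
```

Each emitted word index feeds back as the next timestep's input, so the sentinel state produced at every step influences the whole caption, matching the claim's "sentinel states produced over successive timesteps".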
Specification