Commit 83763355 authored by Christian Fuß

Added test for ShowAttendTell architecture. Adjusted Readme.md

parent 128601f9
Pipeline #211619 passed with stages in 19 minutes and 21 seconds
@@ -403,3 +403,64 @@ All predefined methods start with a capital letter and all constructed methods h
* **size** (integer > 0, optional): The OneHot-vector's size. Can be omitted to automatically use the output size of the architecture.
* **ArgMax()**
Computes the index of the maximum value of its input vector. Useful for recurrent networks, where the output of one timestep should be used as integer input for the next timestep.
* **BeamSearch(max_length, width)**
Must be used together with a recurrent network. Uses beam search as the search algorithm over the timesteps of the RNN (see the Python sketch after this list).
* **max_length** (integer > 0, required): The maximum number of timesteps to run the RNN, and thus the maximum length of the generated sequence.
* **width** (integer > 0, required): The number of candidates to consider in each timestep. Sometimes called k.
* **BroadcastAdd()**
Takes multiple tensors as input and broadcasts them to the same shape (copies values along an axis until it matches the size of the largest corresponding axis among all inputs), then performs elementwise addition (see the NumPy sketch after this list).
* **BroadcastMultiply()**
Takes multiple tensors as input and broadcasts them to the same shape (copies values along an axis until it matches the size of the largest corresponding axis among all inputs), then performs elementwise multiplication.
* **Dot()**
Performs the dot product (matrix multiplication) of two input matrices.
* **ExpandDims(axis)**
Inserts a new axis of size 1 into the input tensor.
* **axis** (0 <= integer <= 1, required): The axis to expand.
* **GreedySearch(max_length)**
Must be used together with a recurrent network. Uses greedy search as the search algorithm over the timesteps of the RNN, so that only the best output of each timestep is considered (sketched after this list).
* **max_length** (integer > 0, required): The maximum number of timesteps to run the RNN, and thus the maximum length of the generated sequence.
* **ReduceSum(axis)**
Sums all values along a given axis and removes that axis afterwards, e.g. turning a one-entry vector into a scalar.
* **axis** (0 <= integer <= 1, optional, default=-1): The axis to sum over. Uses the last axis (-1) by default.
* **Repeat(n, axis)**
Copies the entries of the given axis n times along that same axis.
* **n** (integer > 0, required): How often to copy the entries of the given axis.
* **axis** (-1 <= integer <= 2, optional, default=-1): The axis to use for copying. Uses the last axis (-1) by default.
* **Reshape(shape)**
Transforms the input tensor into a different shape while keeping the total number of entries in the tensor.
* **shape** (integer tuple, required): New shape of the tensor.
\ No newline at end of file
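To make the search strategies concrete, here is a minimal, framework-independent Python sketch of the beam search that `BeamSearch(max_length, width)` selects. The `step` callable is a hypothetical stand-in for one RNN timestep (it maps a partial token sequence to log-probabilities over the next token) and is not part of the DSL:

```python
def beam_search(step, max_length, width, bos=0):
    """Keep the `width` best candidate sequences at every timestep.

    `step(seq)` is assumed to return one log-probability per
    vocabulary entry for the token following `seq`.
    """
    beams = [([bos], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_length):
        candidates = []
        for seq, score in beams:
            for token, lp in enumerate(step(seq)):
                candidates.append((seq + [token], score + lp))
        # keep only the `width` (a.k.a. k) highest-scoring candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    return beams[0][0]  # best sequence found
```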
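`GreedySearch(max_length)` is the special case with a beam width of 1; with the same hypothetical `step` callable:

```python
def greedy_search(step, max_length, bos=0):
    seq = [bos]
    for _ in range(max_length):
        log_probs = step(seq)
        # only the best output of each timestep is kept
        seq.append(max(range(len(log_probs)), key=log_probs.__getitem__))
    return seq
```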
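The broadcasting described for `BroadcastAdd()` and `BroadcastMultiply()` follows the usual NumPy-style rules. The sketch below (NumPy is used purely to illustrate the semantics, with shapes matching the ShowAttendTell test further down) also shows the effect of `ReduceSum(axis)` and `ExpandDims(axis)`:

```python
import numpy as np

scores   = np.ones((64, 1))    # e.g. one attention weight per feature vector
features = np.ones((64, 256))  # 64 feature vectors of length 256

# BroadcastMultiply(): the size-1 axis of `scores` is copied until both
# inputs have shape (64, 256), then they are multiplied elementwise.
weighted = scores * features               # shape (64, 256)

# ReduceSum(axis=0): sum along the first axis, which is removed afterwards.
context = weighted.sum(axis=0)             # shape (256,)

# ExpandDims(axis=0): insert a new axis of size 1 at position 0.
context = np.expand_dims(context, axis=0)  # shape (1, 256)
```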
@@ -51,6 +51,7 @@ public class AllCoCoTest extends AbstractCoCoTest {
checkValid("architectures", "SequentialAlexnet");
checkValid("architectures", "ThreeInputCNN_M14");
checkValid("architectures", "VGG16");
checkValid("architectures", "ShowAttendTell");
checkValid("valid_tests", "ArgumentSequenceTest");
checkValid("valid_tests", "Fixed_Alexnet");
architecture ShowAttendTell(max_length=25, img_height=224, img_width=224, img_channels=3){
    def input Z(0:255)^{img_channels, img_height, img_width} images
    def output Z(0:37758)^{1} target[25]

    layer LSTM(units=512) decoder;
    layer FullyConnected(units=256, flatten=false) features;
    layer FullyConnected(units=1, flatten=false) attention;

    // The generated sequence starts with token 0.
    0 -> target[0];

    // Encoder: two convolutions reduce the 224x224 image to an 8x8 grid of
    // 128 channels, reshaped into 64 annotation vectors and projected to
    // 256 units by the features layer.
    images ->
    Convolution(kernel=(7,7), channels=128, stride=(7,7), padding="valid") ->
    Convolution(kernel=(4,4), channels=128, stride=(4,4), padding="valid") ->
    Reshape(shape=(64, 128)) ->
    features;

    // Decoder: greedy search over at most max_length timesteps.
    timed <t> GreedySearch(max_length=max_length){
        (
            (
                // Additive attention: score the 64 annotation vectors against
                // the previous decoder state and normalize with a softmax ...
                (
                    features.output ->
                    FullyConnected(units=512, flatten=false)
                |
                    decoder.state[0] ->
                    FullyConnected(units=512, flatten=false)
                ) ->
                BroadcastAdd() ->
                Tanh() ->
                FullyConnected(units=1, flatten=false) ->
                Softmax(axis=0) ->
                attention
            |
                features.output
            ) ->
            // ... then weight the annotation vectors and sum them into a
            // single context vector.
            BroadcastMultiply() ->
            ReduceSum(axis=0) ->
            ExpandDims(axis=0)
        |
            // Embedding of the previously generated token.
            target[t-1] ->
            Embedding(output_dim=256)
        ) ->
        Concatenate(axis=1) ->
        decoder ->
        FullyConnected(units=37758) ->
        Tanh() ->
        Dropout(p=0.25) ->
        Softmax() ->
        ArgMax() ->
        target[t]
    };
}
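For readers new to soft attention, the following NumPy sketch mirrors the attention computation of a single decoder timestep in the architecture above (folding the trailing `attention` layer into the scoring for brevity). `W_f`, `W_s`, and `w_a` are illustrative stand-ins for the weights of the three `FullyConnected` layers, not identifiers from the generated code:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 256))  # 64 annotation vectors from the CNN
state    = rng.normal(size=(512,))     # previous LSTM decoder state
W_f = rng.normal(size=(256, 512))      # FullyConnected(units=512) on the features
W_s = rng.normal(size=(512, 512))      # FullyConnected(units=512) on the state
w_a = rng.normal(size=(512, 1))        # FullyConnected(units=1) scoring layer

# additive attention: project both inputs to 512 units, broadcast-add them,
# squash with tanh, and score each of the 64 positions
scores = np.tanh(features @ W_f + state @ W_s) @ w_a  # shape (64, 1)
alpha  = softmax(scores, axis=0)                      # attention weights

# weight the annotation vectors and sum them into one context vector, which
# is concatenated with the embedded previous token and fed to the LSTM
context = (alpha * features).sum(axis=0)              # shape (256,)
```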