[![Maintainability](https://api.codeclimate.com/v1/badges/fc45309cb83a31c9586e/maintainability)](https://codeclimate.com/github/EmbeddedMontiArc/CNNArchLang/maintainability)
[![Build Status](https://travis-ci.org/EmbeddedMontiArc/CNNArchLang.svg?branch=master)](https://travis-ci.org/EmbeddedMontiArc/CNNArchLang)
[![Build Status](https://circleci.com/gh/EmbeddedMontiArc/CNNArchLang/tree/master.svg?style=shield&circle-token=:circle-token)](https://circleci.com/gh/EmbeddedMontiArc/CNNArchLang/tree/master)
[![Coverage Status](https://coveralls.io/repos/github/EmbeddedMontiArc/CNNArchLang/badge.svg?branch=master)](https://coveralls.io/github/EmbeddedMontiArc/CNNArchLang?branch=master)

# CNNArch
**work in progress**
## Introduction
CNNArch is a declarative language for building architectures of feedforward neural networks, with a special focus on convolutional neural networks. 
It is being developed for use in the MontiCar language family, together with CNNTrain, which configures the training of the network, and EmbeddedMontiArcDL, which combines both languages into an EmbeddedMontiArc component.
The inputs and outputs of a network are strongly typed, and the validity of a network is checked at compile time.
In the following, we explain the syntax and all features of CNNArch, together with code examples that show how they are used.

## Basic Structure
The syntax of this language has many similarities to Python in the way variables and methods are handled. 
Variables, which occur only in the form of parameters, are seemingly untyped. 
However, the correctness of their values is checked at compile time.
The header of the architecture declares architecture parameters, which are usually used to define the dimensions of inputs and outputs.
The top part of the architecture consists of input, output and method declarations.
The main part is the actual definition of the architecture in the form of a collection of layers which are connected through the two operators "->" and "|". 
A layer can either be a method, an input or an output. 
The following is a complete example of the original version of Alexnet by A. Krizhevsky. 
There are more compact versions of the same architecture, but we will get to that later. 
All predefined methods are listed at the end of this document.
```
architecture Alexnet_alt(img_height=224, img_width=224, img_channels=3, classes=10){
    def input Z(0:255)^{img_channels, img_height, img_width} image
    def output Q(0:1)^{classes} predictions

    image ->
    Convolution(kernel=(11,11), channels=96, stride=(4,4), padding="no_loss") ->
    Lrn(nsize=5, alpha=0.0001, beta=0.75) ->
    Pooling(pool_type="max", kernel=(3,3), stride=(2,2), padding="no_loss") ->
    Relu() ->
    Split(n=2) ->
    (
        [0] ->
        Convolution(kernel=(5,5), channels=128) ->
        Lrn(nsize=5, alpha=0.0001, beta=0.75) ->
        Pooling(pool_type="max", kernel=(3,3), stride=(2,2), padding="no_loss") ->
        Relu()
    |
        [1] ->
        Convolution(kernel=(5,5), channels=128) ->
        Lrn(nsize=5, alpha=0.0001, beta=0.75) ->
        Pooling(pool_type="max", kernel=(3,3), stride=(2,2), padding="no_loss") ->
        Relu()
    ) ->
    Concatenate() ->
    Convolution(kernel=(3,3), channels=384) ->
    Relu() ->
    Split(n=2) ->
    (
        [0] ->
        Convolution(kernel=(3,3), channels=192) ->
        Relu() ->
        Convolution(kernel=(3,3), channels=128) ->
        Pooling(pool_type="max", kernel=(3,3), stride=(2,2), padding="no_loss") ->
        Relu()
    |
        [1] ->
        Convolution(kernel=(3,3), channels=192) ->
        Relu() ->
        Convolution(kernel=(3,3), channels=128) ->
        Pooling(pool_type="max", kernel=(3,3), stride=(2,2), padding="no_loss") ->
        Relu()
    ) ->
    Concatenate() ->
    FullyConnected(units=4096) ->
    Relu() ->
    Dropout() ->
    FullyConnected(units=4096) ->
    Relu() ->
    Dropout() ->
    FullyConnected(units=classes) ->
    Softmax() ->
    predictions
}
```
*Note: The third convolutional layer and the first two fully connected layers are not divided into two streams as they are in the original Alexnet. 
This is done for the sake of simplicity. However, this change should not affect the actual computation.*

## Layer Operators
Unlike most deep learning frameworks, this language does not use symbols to denote connections between layers; instead, it uses an approach which describes the data flow through the network. 
The first operator is the serial connection "->", which simply connects the output of the first layer to the input of the second layer. 
Despite being sequential in nature, CNNArch is still able to describe complex networks like ResNeXt through the use of the parallelization operator "|". 
This operator splits the network into parallel data streams. 
The serial connection operator has a higher precedence than the parallel connection operator. 
Therefore, it is necessary to use brackets around each parallel group of layers.
Each element in a parallel group has the same input stream as the whole group. 
The output of a parallel group is a list of streams which can be merged into a single stream through the use of the following methods: 
`Concatenate()`, `Add()` or `Get(index)`. 
Note: `Get(index=i)` can be abbreviated by `[i]`. 
The method `Split(n)` in the example above creates multiple output streams from a single input stream by splitting the data itself into *n* streams, which can then be handled separately.
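For illustration, the following sketch (not taken from the Alexnet example above) merges a convolutional branch and a 1x1 skip branch with `Add()`, similar to a residual block:
```
(
    Convolution(kernel=(3,3), channels=64) ->
    Relu() ->
    Convolution(kernel=(3,3), channels=64)
|
    Convolution(kernel=(1,1), channels=64)
) ->
Add() ->
Relu()
```
Both branches receive the same input stream and produce streams of identical shape, which is required by `Add()`.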


## Inputs and Outputs
An architecture in CNNArch can have multiple inputs and outputs. 
Multiple inputs (or outputs) of the same form can be combined into an array. 
Assuming `h` and `w` are architecture parameters, the following is a valid example:
```
def input Z(0:255)^{3, h, w} image[2]
def input Q(-oo:+oo)^{10} additionalData
def output Q(0:1)^{3} predictions
```
The first line defines the input *image* as an array of two RGB (or BGR) images with a resolution of `h` x `w`. 
The part `Z(0:255)`, which corresponds to the type definition in EmbeddedMontiArc, restricts the values to integers between 0 and 255. 
The part `{3, h, w}` declares the shape of the input. 
The shape denotes the dimensionality in the form of depth (number of channels), height and width. 
Here, the number of channels is 3, the height is initialized as `h` and the width as `w`.  
The second line defines another input with one dimension of size 10 and arbitrary rational values. 
The last line defines a one-dimensional output of size 3 with rational values between 0 and 1 (the probabilities of 3 classes).

If an input or output is an array, it can be used in the architecture in two different ways. 
Either a single element is accessed or the array is used as a whole. 
The line `image[0] ->` would access the first image of the array and `image ->` would directly result in 2 output streams. 
In fact, `image ->` is identical to `(image[0] | image[1]) ->`. 
Furthermore, assuming *out* is an output array of size 2, the line `-> out` would be identical to `-> ([0]->out[0] | [1]->out[1])`. 
Inputs and outputs can also be used in the middle of an architecture. 
In general, inputs create new streams and outputs consume existing streams.
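As an illustrative sketch (assuming the declarations above; `additionalData` is ignored here for brevity), an input array can be consumed as two parallel streams and merged:
```
(image[0] | image[1]) ->
Concatenate() ->
Convolution(kernel=(3,3), channels=32) ->
GlobalPooling(pool_type="max") ->
FullyConnected(units=3) ->
Softmax() ->
predictions
```
The first line could equivalently be written as `image ->`.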

## Methods
It is possible to declare and construct new methods. The method declaration is similar to Python. 
Each parameter can have a default value that makes it an optional argument. 
The method call is also similar to Python but, in contrast to Python, it is necessary to specify the name of each argument. 
The body of a new method is constructed from other layers, including other user-defined methods. However, recursion is not allowed. 
The compiler will throw an error if recursion occurs. 
The following is an example of multiple method declarations.
```
    def conv(filter, channels, stride=1, act=true){
        Convolution(kernel=(filter,filter), channels=channels, stride=(stride,stride)) ->
        BatchNorm() ->
134
        Relu(?=act)
    }
    def skip(channels, stride){
        Convolution(kernel=(1,1), channels=channels, stride=(stride,stride)) ->
        BatchNorm()
    }
    def resLayer(channels, stride=1){
        (
            conv(filter=3, channels=channels, stride=stride) ->
            conv(filter=3, channels=channels, act=false)
        |
            skip(channels=channels, stride=stride, ?=(stride!=1))
        ) ->
        Add() ->
        Relu()
    }
```
The method `resLayer` in this example corresponds to a building block of a Residual Network. 
The `?` argument is a special argument which is explained in the next section.

## Special Arguments
There exist special structural arguments which can be used in every method. 
These are `->`, `|` and `?`. `->` and `|` can only be positive integers and `?` can only be a boolean. 
The argument `?` does nothing if it is true and removes the layer completely if it is false. 
The other two arguments create a repetition of the method. 
We will show their effect with examples. 
Assuming `a` is a method without required arguments, 
then `a(-> = 3)->` is equal to `a()->a()->a()->`, 
`a(| = 3)->` is equal to `(a() | a() | a())->` and 
`a(-> = 3, | = 2)->` is equal to `(a()->a()->a() | a()->a()->a())->`. 
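For instance, using the predefined `Convolution` method, the line `Convolution(kernel=(3,3), channels=64, -> = 3) ->` expands to:
```
Convolution(kernel=(3,3), channels=64) ->
Convolution(kernel=(3,3), channels=64) ->
Convolution(kernel=(3,3), channels=64) ->
```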

## Argument Sequences
It is also possible to create a repetition of a method in another way, through the use of argument sequences. 
The following are valid sequences: `[2->5->3]`, `[true|false|false]`, `[2->1|4->4->6]`, `[ |2->3]`, `1->..->5` and `3|..|-2`. 
All values in these examples could also be replaced by variable names or expressions. 
The first four are standard sequences and the last two are intervals. 
An interval can be translated to a standard sequence. 
The interval `3|..|-2` is equal to `[3|2|1|0|-1|-2]` and `1->..->5` is equal to `[1->2->3->4->5]`. 

If an argument is set to a sequence, the method will be repeated for each value in the sequence, and the connection between the repeated layers will be the same as the connection between the values of the sequence. 
An argument which has a single value is neutral to the repetition, which means that it will be repeated an arbitrary number of times without interfering with the repetition. 
If a method contains multiple argument sequences, CNNArch will try to combine the sequences. 
The language will throw an error at compile time if this fails. 
Assuming the method `m(a, b, c)` exists, the line `m(a=[5->3], b=[3|4|2], c=2)->` is equal to:
```
(
    m(a=5, b=3, c=2) ->
    m(a=3, b=3, c=2)
|
    m(a=5, b=4, c=2) ->
    m(a=3, b=4, c=2)
|
    m(a=5, b=2, c=2) ->
    m(a=3, b=2, c=2)
) ->
```
However, the line `m(a=[5->3], b=[2|4->6], c=2)->` would throw an error because it is not possible to expand *a* such that it is the same size as *b*.
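A practical use of serial sequences (a sketch, assuming the `resLayer` method from the Methods section) is stacking residual layers with growing channel counts:
```
resLayer(channels=[64->128->256], stride=[1->2->2]) ->
```
This is equal to `resLayer(channels=64, stride=1) -> resLayer(channels=128, stride=2) -> resLayer(channels=256, stride=2) ->`, since both sequences have the same length and connector.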

## Expressions
Currently, the working expression operators are the basic arithmetic operators "+", "-", "\*", "/", the logical operators "&&", "||" and, for most cases, the comparison operators "==", "!=", "<", ">", "<=", ">=". 
The comparison operators do not work reliably for the comparison of tuples (they only compare the last element of each tuple). 
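These operators can be used in any argument value. The following method declaration (hypothetical, not part of the predefined methods) shows arithmetic, comparison and logical operators in arguments:
```
def conv(base, stride=1){
    Convolution(kernel=(3,3), channels=2*base, stride=(stride,stride)) ->
    Relu(?=(stride != 1 || base <= 64))
}
```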

## Advanced Example
This version of Alexnet, which uses method construction, argument sequences and special arguments, is identical to the one in the section Basic Structure.
```
architecture Alexnet_alt2(img_height=224, img_width=224, img_channels=3, classes=10){
    def input Z(0:255)^{img_channels, img_height, img_width} image
    def output Q(0:1)^{classes} predictions
    
    def conv(filter, channels, convStride=1, poolStride=1, hasLrn=false, convPadding="same"){
        Convolution(kernel=(filter,filter), channels=channels, stride=(convStride,convStride), padding=convPadding) ->
        Lrn(nsize=5, alpha=0.0001, beta=0.75, ?=hasLrn) ->
        Pooling(pool_type="max", kernel=(3,3), stride=(poolStride,poolStride), padding="no_loss", ?=(poolStride != 1)) ->
        Relu()
    }
    def split1(i){
        [i] ->
        conv(filter=5, channels=128, poolStride=2, hasLrn=true)
    }
    def split2(i){
        [i] ->
        conv(filter=3, channels=192) ->
        conv(filter=3, channels=128, poolStride=2)
    }
    def fc(){
        FullyConnected(units=4096) ->
        Relu() ->
        Dropout()
    }

    image ->
    conv(filter=11, channels=96, convStride=4, poolStride=2, hasLrn=true, convPadding="no_loss") ->
    Split(n=2) ->
    split1(i=[0|1]) ->
    Concatenate() ->
    conv(filter=3, channels=384) ->
    Split(n=2) ->
    split2(i=[0|1]) ->
    Concatenate() ->
    fc(-> = 2) ->
    FullyConnected(units=classes) ->
    Softmax() ->
    predictions
}
```


## Predefined Layers
All methods with the exception of *Concatenate*, *Add*, *Get* and *Split* can only handle 1 input stream and have 1 output stream. 
All predefined methods start with a capital letter and all constructed methods have to start with a lowercase letter.

* **FullyConnected(units, no_bias=false)**

  Creates a fully connected layer and applies flatten to the input if necessary.
    
  * **units** (integer > 0, required): number of neural units in the output.
  * **no_bias** (boolean, optional, default=false): Whether to disable the bias parameter.
  
* **Convolution(kernel, channels, stride=(1,1), padding="same", no_bias=false)**

  Creates a convolutional layer. Currently, only 2D convolutions are allowed.
    
  * **kernel** (integer tuple > 0, required): convolution kernel size: (height, width).
  * **channels** (integer > 0, required): number of convolution filters and number of output channels.
  * **stride** (integer tuple > 0, optional, default=(1,1)): convolution stride: (height, width).
  * **padding** ({"valid", "same", "no_loss"}, optional, default="same"): One of "valid", "same" or "no_loss". "valid" means no padding. "same" results in padding the input such that the output has the same length as the original input divided by the stride (rounded up). "no_loss" results in minimal padding such that each input is used by at least one filter (identical to "valid" if *stride* equals 1).
  * **no_bias** (boolean, optional, default=false): Whether to disable the bias parameter.

* **Softmax()**

  Applies softmax activation function to the input.
    
* **Tanh()**

  Applies tanh activation function to the input.
    
* **Sigmoid()**

  Applies sigmoid activation function to the input.
    
* **Relu()**

  Applies relu activation function to the input.
    
* **Flatten()**

  Reshapes the input such that height and width are 1. 
  Usually not necessary because the FullyConnected layer applies *Flatten* automatically.
    
* **Dropout(p=0.5)**

  Applies dropout operation to input array during training.
    
  * **p** (1 >= float >= 0, optional, default=0.5): Fraction of the input that gets dropped out during training time.
  
* **Pooling(pool_type, kernel, stride=(1,1), padding="same")**

  Performs pooling on the input.
  
  * **pool_type** ({"avg", "max"}, required): Pooling type to be applied.
  * **kernel** (integer tuple > 0, required): pooling kernel size: (height, width).
  * **stride** (integer tuple > 0, optional, default=(1,1)): pooling stride: (height, width).
  * **padding** ({"valid", "same", "no_loss"}, optional, default="same"): One of "valid", "same" or "no_loss". "valid" means no padding. "same" results in padding the input such that the output has the same length as the original input divided by the stride (rounded up). "no_loss" results in minimal padding such that each input is used by at least one filter (identical to "valid" if *stride* equals 1).

* **GlobalPooling(pool_type)**

  Performs global pooling on the input.
  
  * **pool_type** ({"avg", "max"}, required): Pooling type to be applied.

* **Lrn(nsize, knorm=2, alpha=0.0001, beta=0.75)**

  Applies local response normalization to the input.
  See: [mxnet](https://mxnet.incubator.apache.org/api/python/symbol.html#mxnet.symbol.LRN)
    
  * **nsize** (integer > 0, required): normalization window width in elements.
  * **knorm** (float, optional, default=2): The parameter k in the LRN expression.
  * **alpha** (float, optional, default=0.0001): The variance scaling parameter *alpha* in the LRN expression.
  * **beta** (float, optional, default=0.75): The power parameter *beta* in the LRN expression.

* **BatchNorm(fix_gamma=true)**
    
  Batch normalization.
    
  * **fix_gamma** (boolean, optional, default=true): Fix gamma while training.

* **Concatenate()**
    
  Merges multiple input streams into one output stream by concatenation of channels. 
  The height and width of all inputs must be identical. 
  The number of channels in the output shape is the sum of the number of channels in the shape of the input streams.
    
* **Add()**
    
  Merges multiple input streams into one output stream by adding them element-wise. 
  The height, width and number of channels of all inputs must be identical. 
  The output shape is identical to each input shape.
    
* **Get(index)**

  `Get(index=i)` can be abbreviated with `[i]`. Selects one out of multiple input streams. 
  The single output stream is identical to the selected input. 
  
  * **index** (integer >= 0, required): The zero-based index of the selected input.

* **Split(n)**

  Opposite of *Concatenate*. Handles a single input stream and splits it into *n* output streams. 
  The output streams have the same height and width as the input stream and a number of channels which is in general `input_channels / n`. 
  The last output stream will have a higher number of channels than the others if `input_channels` is not divisible by `n`.
  
  * **n** (integer > 0, required): The number of output streams. Cannot be higher than the number of input channels.
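
  For example, `Split(n=2)` applied to a stream with 96 channels yields two streams with 48 channels each; applied to a stream with 97 channels, it would yield streams with 48 and 49 channels.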