
updated changelog

Merged FabianSchmeil requested to merge fs/dev into main
## Biggest Changes
- Fixed beam search so it actually works for the RNN model: finished beams are now extracted
- Conducted a thorough hyperparameter search to fine-tune our model (see the xlsx file)
- Created an ensemble of models weighted by their BLEU scores
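The finished-beam extraction can be sketched as follows (the `step_beams` name and `expand_fn` interface are illustrative assumptions, not the repo's actual API): hypotheses that emit the end-of-sequence token are moved into a `finished` list instead of being dropped, while the remaining beams keep expanding.

```python
EOS = 2  # hypothetical end-of-sequence token id

def step_beams(beams, finished, beam_width, expand_fn):
    """One beam-search step. `beams` is a list of (tokens, log_prob);
    `expand_fn(tokens)` yields candidate (next_token, log_prob) pairs.
    Hypotheses ending in EOS are extracted into `finished` rather
    than discarded."""
    candidates = []
    for tokens, score in beams:
        for tok, lp in expand_fn(tokens):
            candidates.append((tokens + [tok], score + lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    next_beams = []
    for tokens, score in candidates:
        if tokens[-1] == EOS:
            finished.append((tokens, score))  # keep the finished beam
        elif len(next_beams) < beam_width:
            next_beams.append((tokens, score))
    return next_beams
```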
Other changes include:
- Changed the decoder to be unidirectional instead of bidirectional
- Stopped flipping sentences before feeding them into the model
- Fixed the padding-token bug in the Dataset
- Added dot-product attention (from PyTorch) plus masked attention
- Removed the window-size parameter for the ff BLEU computation
- Removed the softmax application during in-training evaluation (it was wrong)
- Fixed the get_model_bleu function
- Restructured the config file
- Implemented a teacher-forcing ratio
- Implemented teacher-forcing decay towards the end of training
- Added weight decay to curb overfitting towards the end of training
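The masking idea behind the masked attention above can be illustrated with a plain-Python sketch (the repo uses PyTorch's dot-product attention; the function name and interface here are illustrative only): padded source positions get a score of negative infinity so they receive zero weight after the softmax.

```python
import math

def masked_dot_product_attention(query, keys, values, mask):
    """Scores are q . k / sqrt(d); positions where mask is False
    (e.g. padding tokens) are set to -inf so they get zero weight
    after the softmax. Pure-Python sketch, not the repo's code."""
    d = len(query)
    scores = []
    for k, m in zip(keys, mask):
        s = sum(qi * ki for qi, ki in zip(query, k)) / math.sqrt(d)
        scores.append(s if m else float("-inf"))
    # Numerically stable softmax over the (masked) scores.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights
```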
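One way to realize the teacher-forcing decay towards the end of training is a simple linear schedule; the sketch below is an illustrative assumption about the schedule's shape, not necessarily the exact one in the repo.

```python
def teacher_forcing_ratio(epoch, num_epochs, start=1.0, end=0.0, decay_from=0.5):
    """Keep the ratio at `start` for the first `decay_from` fraction
    of training, then interpolate linearly down to `end` so the
    model relies less on ground-truth tokens towards the end."""
    progress = epoch / max(num_epochs - 1, 1)
    if progress <= decay_from:
        return start
    frac = (progress - decay_from) / (1.0 - decay_from)
    return start + (end - start) * frac
```

At each decoding step during training, a random draw against this ratio decides whether the decoder is fed the ground-truth token (teacher forcing) or its own previous prediction.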
#### Tried, but did not result in significant improvements over the status quo:
Best hyperparameters for our model are:
… see the xlsx file, where marked
## Ensemble
We created an ensemble model that combines the predictions of the models found in a given directory.
Each model's predictions are weighted by its BLEU score.
However, the ensemble did not outperform the best single model.
This might be because, at the moment, the only additional model we can include is a feedforward model that performs far worse than the RNN.
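The BLEU weighting can be sketched as follows (function and parameter names are illustrative, not the repo's API): each model's next-token distribution is scaled by its normalized BLEU score before the distributions are summed.

```python
def ensemble_probs(per_model_probs, bleu_scores):
    """Combine per-model next-token probability distributions into
    one, weighting each model by its BLEU score (normalized so the
    weights sum to 1). Illustrative sketch of the weighting scheme."""
    total = sum(bleu_scores)
    weights = [b / total for b in bleu_scores]
    vocab_size = len(per_model_probs[0])
    combined = [0.0] * vocab_size
    for probs, w in zip(per_model_probs, weights):
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined
```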
### Configurable Layers / Deeper Decoder
New Config Parameters:
batch_norm_ll is set to False.
These linear layers are applied in each step after the RNN layer (see forward_step).
## New function: reshape_state
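The changelog does not show reshape_state's body here; a common implementation of such a helper (an assumption, not necessarily the repo's code) merges the forward and backward halves of a bidirectional encoder's final hidden state so the now-unidirectional decoder can consume it.

```python
def reshape_state(hidden, num_layers):
    """Merge the per-direction states of a bidirectional encoder
    (num_layers * 2 entries: fwd, bwd, fwd, bwd, ...) into
    num_layers entries by concatenating the two directions'
    feature vectors. Plain-list sketch of the usual pattern."""
    merged = []
    for layer in range(num_layers):
        fwd = hidden[2 * layer]
        bwd = hidden[2 * layer + 1]
        merged.append(fwd + bwd)  # concatenate features per layer
    return merged
```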
## Changed decoder to be unidirectional and not bidirectional
Initially we used a bidirectional decoder, an approach that wasn't scientifically backed, so we switched to a standard unidirectional decoder.