
updated changelog

Merged FabianSchmeil requested to merge fs/dev into main
## Biggest Changes
- Fixed beam search so it actually works for the RNN model: finished beams are now extracted
- Conducted a thorough hyperparameter search to fine-tune our model (see the xlsx file)
- Created an ensemble of models weighted by their BLEU scores
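The finished-beam extraction can be sketched as follows (the `step_beams` name and `expand_fn` interface are illustrative assumptions, not the repo's actual API): hypotheses that emit the end-of-sequence token are moved into a `finished` list instead of being dropped, while the remaining beams keep expanding.

```python
EOS = 2  # hypothetical end-of-sequence token id

def step_beams(beams, finished, beam_width, expand_fn):
    """One beam-search step. `beams` is a list of (tokens, log_prob);
    `expand_fn(tokens)` yields candidate (next_token, log_prob) pairs.
    Hypotheses ending in EOS are extracted into `finished` rather
    than discarded."""
    candidates = []
    for tokens, score in beams:
        for tok, lp in expand_fn(tokens):
            candidates.append((tokens + [tok], score + lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    next_beams = []
    for tokens, score in candidates:
        if tokens[-1] == EOS:
            finished.append((tokens, score))  # keep the finished beam
        elif len(next_beams) < beam_width:
            next_beams.append((tokens, score))
    return next_beams
```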
Other changes include:
- Changed the decoder to be unidirectional instead of bidirectional
- Stopped flipping sentences before feeding them into the model
- Fixed the padding-token bug in the Dataset
- Added dot-product attention (from PyTorch) plus masked attention
- Removed the window-size parameter for the ff BLEU computation
- Removed the softmax application during in-training evaluation (it was wrong)
- Fixed the get_model_bleu function
- Restructured the config file
- Implemented a teacher-forcing ratio
- Implemented teacher-forcing decay towards the end of training
- Added weight decay to curb overfitting towards the end of training
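The masking idea behind the masked attention above can be illustrated with a plain-Python sketch (the repo uses PyTorch's dot-product attention; the function name and interface here are illustrative only): padded source positions get a score of negative infinity so they receive zero weight after the softmax.

```python
import math

def masked_dot_product_attention(query, keys, values, mask):
    """Scores are q . k / sqrt(d); positions where mask is False
    (e.g. padding tokens) are set to -inf so they get zero weight
    after the softmax. Pure-Python sketch, not the repo's code."""
    d = len(query)
    scores = []
    for k, m in zip(keys, mask):
        s = sum(qi * ki for qi, ki in zip(query, k)) / math.sqrt(d)
        scores.append(s if m else float("-inf"))
    # Numerically stable softmax over the (masked) scores.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights
```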
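One way to realize the teacher-forcing decay towards the end of training is a simple linear schedule; the sketch below is an illustrative assumption about the schedule's shape, not necessarily the exact one in the repo.

```python
def teacher_forcing_ratio(epoch, num_epochs, start=1.0, end=0.0, decay_from=0.5):
    """Keep the ratio at `start` for the first `decay_from` fraction
    of training, then interpolate linearly down to `end` so the
    model relies less on ground-truth tokens towards the end."""
    progress = epoch / max(num_epochs - 1, 1)
    if progress <= decay_from:
        return start
    frac = (progress - decay_from) / (1.0 - decay_from)
    return start + (end - start) * frac
```

At each decoding step during training, a random draw against this ratio decides whether the decoder is fed the ground-truth token (teacher forcing) or its own previous prediction.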
#### Tried, but did not result in significant improvements over the status quo:
Best hyperparameters for our model are:
… see the xlsx file, where marked
## Ensemble
We created an ensemble model that combines the predictions of the models found in a given directory.
Each model's predictions are weighted by its BLEU score.
However, the ensemble did not outperform the best single model.
This might be because, at the moment, the only additional model we can include is a feedforward model that performs far worse than the RNN.
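The BLEU weighting can be sketched as follows (function and parameter names are illustrative, not the repo's API): each model's next-token distribution is scaled by its normalized BLEU score before the distributions are summed.

```python
def ensemble_probs(per_model_probs, bleu_scores):
    """Combine per-model next-token probability distributions into
    one, weighting each model by its BLEU score (normalized so the
    weights sum to 1). Illustrative sketch of the weighting scheme."""
    total = sum(bleu_scores)
    weights = [b / total for b in bleu_scores]
    vocab_size = len(per_model_probs[0])
    combined = [0.0] * vocab_size
    for probs, w in zip(per_model_probs, weights):
        for i, p in enumerate(probs):
            combined[i] += w * p
    return combined
```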
### Configurable Layers / Deeper Decoder
New Config Parameters:
batch_norm_ll is set to False.
These linear layers are applied in each step after the RNN layer (see forward_step).
## New function: reshape_state
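The changelog does not show reshape_state's body here; a common implementation of such a helper (an assumption, not necessarily the repo's code) merges the forward and backward halves of a bidirectional encoder's final hidden state so the now-unidirectional decoder can consume it.

```python
def reshape_state(hidden, num_layers):
    """Merge the per-direction states of a bidirectional encoder
    (num_layers * 2 entries: fwd, bwd, fwd, bwd, ...) into
    num_layers entries by concatenating the two directions'
    feature vectors. Plain-list sketch of the usual pattern."""
    merged = []
    for layer in range(num_layers):
        fwd = hidden[2 * layer]
        bwd = hidden[2 * layer + 1]
        merged.append(fwd + bwd)  # concatenate features per layer
    return merged
```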
## Changed decoder to be unidirectional and not bidirectional
Initially we used a bidirectional decoder, an approach that wasn't scientifically backed, so we switched to a standard unidirectional decoder.