Training loss is NaN sometimes (PyTorch Backend)
Problem
When running certain experiments, the training loss computed in src/main/resources/experiments/steps/MySupervisedTrainer.py (line 80) sometimes evaluates to NaN. This results in debug messages like:
Epoch:1 Train Loss:nan Train Accuracy:10.03%
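For reference, the snippet below is a minimal, hypothetical sketch of how the loss computation could be guarded so that the first batch producing a NaN can be inspected; it is not the generated MySupervisedTrainer.py code, and names such as `model`, `train_loader`, and `optimizer` are assumptions.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, train_loader, optimizer, device="cpu"):
    """Hypothetical training loop that stops on the first non-finite loss.

    Diagnostic sketch only; it does not reproduce the generated trainer.
    """
    model.train()
    for step, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = F.cross_entropy(outputs, labels)

        # Abort as soon as the loss becomes non-finite so the offending
        # batch and logits can be inspected instead of propagating NaN.
        if not torch.isfinite(loss):
            raise RuntimeError(
                f"Non-finite loss at step {step}: "
                f"logits finite={torch.isfinite(outputs).all().item()}, "
                f"labels={labels.unique().tolist()}"
            )

        loss.backward()
        optimizer.step()
```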
Steps to reproduce
Note: Getting far enough to trigger this problem requires the workaround described in issue #123, since that issue is not yet resolved.
This issue is not deterministic. However, when executing the EMADL2CPP generator as follows, there is a high chance of encountering the problem.
- Main class:
de.monticore.lang.monticar.emadl.generator.MontiAnnaCli
- Program arguments:
-m src/main/resources/calculator_experiment/emadl -r calculator.Connector -o target -b PYTORCH
Out of the six runs generated by this execution, typically 2 to 4 exhibit this behavior. I have never encountered this issue with other experiments such as 'adanet_experiment' or 'squaredigit_experiment', even though all of the mentioned experiments use the same loss function (cross entropy).
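On finite logits, PyTorch's cross entropy is numerically stable, so the NaN most likely originates upstream (in the logits, weights, or input data) rather than in the loss itself. The following standalone sketch, with made-up tensor values, shows how non-finite logits propagate into the loss and how autograd's anomaly mode can help locate the first operation that produces a NaN.

```python
import torch
import torch.nn.functional as F

# Cross entropy on well-formed logits does not produce NaN on its own;
# a NaN loss therefore points at the logits (weights or input data).
logits = torch.randn(4, 10)
labels = torch.tensor([0, 1, 2, 3])
print(F.cross_entropy(logits, labels))   # finite value

logits[0, 0] = float("inf")              # simulate a diverged weight
print(F.cross_entropy(logits, labels))   # tensor(nan)

# Anomaly mode reports the backward op that first produced a NaN,
# which helps narrow down the diverging layer during training.
with torch.autograd.set_detect_anomaly(True):
    loss = F.cross_entropy(torch.randn(4, 10, requires_grad=True), labels)
    loss.backward()
```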