## This small guide explains how to install and correctly run the parallel version of FlameMaster
## in the RWTH Cluster, especially for the backend. 
## Access to both FlameMaster and eglib (https://git.rwth-aachen.de/ITV/eglib) repositories is required.
## Ask me if you are interested in getting access. 
## All members of the ITV-RWTH-GitLab group have access, and no further action is required for them.

## To checkout and compile the parallel version of FlameMaster in the
## RWTH Cluster, the following steps are required:

cd your/parallel/FlameMaster
git clone git@git.rwth-aachen.de:ITV/FlameMaster.git Repository --branch jl_dco_activated
cd Repository/src/libraries/
git clone git@git.rwth-aachen.de:ITV/eglib.git
cd ../../../
mkdir -p Build && cd Build

## The following installation uses Intel compilers and Intel MKL.
## For the RWTH Cluster, this is the preferred configuration (other compilers can be used
## but the performance is lower)

## Unload modules for safety
module unload gcc
module unload intel
module unload clang

## Load intel and gcc/8
module load intel Eigen/3.3.7 GCC/8.3.0

## Set up intel variables
source /opt/intel/oneAPI/2023.0/setvars.sh


## CMake command with the correct instructions for the broadwell nodes (ex "citv 5-7")
## Setup has been tested on the broadwell nodes, as well as the frontend
## The compiled executables do not work on ivybridge machines.
CXX=icpc CC=icc FC=ifort cmake ../Repository -DCMAKE_BUILD_TYPE=Release -DEIGEN_INTEGRATION=ON -DCOMBUSTION_LIBS=ON -DCMAKE_CXX_FLAGS_RELEASE="-Ofast -ffast-math -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DCMAKE_C_FLAGS_RELEASE="-Ofast -ffast-math -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DCMAKE_Fortran_FLAGS_RELEASE="-Ofast -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DFAST_COLLISION_INTEGRAL=ON -DINSTALL_SUNDIALS=ON -DSUNDIALS_LAPACK=ON

## Compile and install
make -j12 install

###################################################################
## BENCHMARK RUN TO CHECK IF THE CODE HAS BEEN PROPERLY COMPILED ##
###################################################################

# It is STRONGLY advised NOT to run on the frontend. 
# Always use the backend nodes (in this case the "broadwell" nodes) through SLURM 

## Backend test ##

# Setup for the test run 
# Follow these steps:

# Generate the required mechanisms
cd /home/YOUR_TIMID/your/parallel/FlameMaster/Run/ScanMan/
bash CreateAllMechanisms.bash
 
# Test run directory
cd ../../Run/FlameMan/Diff/SteadyPlugFlow/Wullenkord

## To run the test in the backend, copy the template submission script in the same directory
## with the following command. In order to have the correct access to the ITV nodes, use this 
## Script which has the latest options to run on our computational nodes
cp /home/itv/SLURM_submission_scripts/SlurmScript_FM .

## Modify line 76 ('exe') and 79 ('arg') with the correct path of your FlameMaster executable and Data directory
## i.e. " exe ='/home/YOUR_TIMID/your/parallel/FlameMaster/Bin/bin/FlameMan' "
## run the script:

sbatch SlurmScript_FM

## Check the job output (job.%JOBID.out) to
## see if the execution time matches the expected time for the parallel run (as follows)

# Expected runtime for 12 threads (parallel)
# ~ 16.5 sec +- 0.5 sec (RWTH cluster frontend, Broadwell E5-2650 v4 @ 2.20GHz, with 12 threads, 
# performance tuning node)
## Expected runtime for 1 thread (serial)
# ~ 73.8 sec +- 1 sec (RWTH cluster frontend, Broadwell E5-2650 v4 @ 2.20GHz, performance tuning node)

## If your runtime is significantly far (more than a couple of seconds) from the expected parallel runtime, something
## might be wrong with your setup.

################# RUN OTHER CASES ######################

## adapt the slurm script mentioned above (/home/itv/SLURM_submission_scripts/SlurmScript_FM)
## to your needs (executable path, total time, mail notification, input name)

## The correct set of module for the parallel FM to run are already specified
## Differently from the front end, OMP_NUM_THREADS=XX is not required,
## as it is handled by SLURM via the option
## #SBATCH --cpus-per-task=XX
## XX=12 is the preferred number of threads for the broadwell nodes (ex "citv 5-7") used at the ITV 
## (adding all 24 cores doesn't have any benefit).
## In SLURM, "CPUs" are the physical cores. If tasks-per-cpu=1 (No multithreading) physical cores = omp_threads 


## The value for memory requirements (mem-per-cpu) has been set to 3GB per thread. 
## Change it accordingly to your needs (if not specified, 1GB per thread will be assigned).
## The maximum value is Total memory node (~128 GB for broadwell) divided the number of used threads. 
## SLURM will throw an error if the requested memory exceeds the available one.

## Please remember that this configuration has been tested only for intel compilers and broadwell nodes
## on the RWTH Cluster.

## Other machines, operating systems, compilers, and LAPACK  implementations have been
## tested and can be used but can require a different CMake configuration command.
## Switching compilers, replacing the intel MKL with a different LAPACK implementation,
## or changing the architecture-specific optimization flags with an incorrect 
## CMake configuration can lead to incorrect simulation results.
## Further details are discussed here: https://git.rwth-aachen.de/-/snippets/766 .