## This small guide explains how to install and correctly run the parallel version of FlameMaster ## in the RWTH Cluster, especially for the backend. ## Access to both FlameMaster and eglib (https://git.rwth-aachen.de/ITV/eglib) repositories is required. ## Ask me if you are interested in getting access. ## All members of the ITV-RWTH-GitLab group have access, and no further action is required for them. ## To checkout and compile the parallel version of FlameMaster in the ## RWTH Cluster, the following steps are required: cd your/parallel/FlameMaster git clone git@git.rwth-aachen.de:ITV/FlameMaster.git Repository --branch jl_dco_activated cd Repository/src/libraries/ git clone git@git.rwth-aachen.de:ITV/eglib.git cd ../../../ mkdir -p Build && cd Build ## The following installation uses Intel compilers and Intel MKL. ## For the RWTH Cluster, this is the preferred configuration (other compilers can be used ## but the performance is lower) ## Unload modules for safety module unload gcc module unload intel module unload clang ## Load intel and gcc/8 module load intel Eigen/3.3.7 GCC/8.3.0 ## Set up intel variables source /opt/intel/oneAPI/2023.0/setvars.sh ## CMake command with the correct instructions for the broadwell nodes (ex "citv 5-7") ## Setup has been tested on the broadwell nodes, as well as the frontend ## The compiled executables do not work on ivybridge machines. CXX=icpc CC=icc FC=ifort cmake ../Repository -DCMAKE_BUILD_TYPE=Release -DEIGEN_INTEGRATION=ON -DCOMBUSTION_LIBS=ON -DCMAKE_CXX_FLAGS_RELEASE="-Ofast -ffast-math -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DCMAKE_C_FLAGS_RELEASE="-Ofast -ffast-math -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DCMAKE_Fortran_FLAGS_RELEASE="-Ofast -DNDEBUG -march=broadwell -mtune=broadwell -funroll-all-loops -qopt-multi-version-aggressive -ipo -parallel" -DFAST_COLLISION_INTEGRAL=ON -DINSTALL_SUNDIALS=ON -DSUNDIALS_LAPACK=ON ## Compile and install make -j12 install ################################################################### ## BENCHMARK RUN TO CHECK IF THE CODE HAS BEEN PROPERLY COMPILED ## ################################################################### # It is STRONGLY advised NOT to run on the frontend. # Always use the backend nodes (in this case the "broadwell" nodes) through SLURM ## Backend test ## # Setup for the test run # Follow these steps: # Generate the required mechanisms cd /home/YOUR_TIMID/your/parallel/FlameMaster/Run/ScanMan/ bash CreateAllMechanisms.bash # Test run directory cd ../../Run/FlameMan/Diff/SteadyPlugFlow/Wullenkord ## To run the test in the backend, copy the template submission script in the same directory ## with the following command. In order to have the correct access to the ITV nodes, use this ## Script which has the latest options to run on our computational nodes cp /home/itv/SLURM_submission_scripts/SlurmScript_FM . ## Modify line 76 ('exe') and 79 ('arg') with the correct path of your FlameMaster executable and Data directory ## i.e. " exe ='/home/YOUR_TIMID/your/parallel/FlameMaster/Bin/bin/FlameMan' " ## run the script: sbatch SlurmScript_FM ## Check the job output (job.%JOBID.out) to ## see if the execution time matches the expected time for the parallel run (as follows) # Expected runtime for 12 threads (parallel) # ~ 16.5 sec +- 0.5 sec (RWTH cluster frontend, Broadwell E5-2650 v4 @ 2.20GHz, with 12 threads, # performance tuning node) ## Expected runtime for 1 thread (serial) # ~ 73.8 sec +- 1 sec (RWTH cluster frontend, Broadwell E5-2650 v4 @ 2.20GHz, performance tuning node) ## If your runtime is significantly far (more than a couple of seconds) from the expected parallel runtime, something ## might be wrong with your setup. ################# RUN OTHER CASES ###################### ## adapt the slurm script mentioned above (/home/itv/SLURM_submission_scripts/SlurmScript_FM) ## to your needs (executable path, total time, mail notification, input name) ## The correct set of module for the parallel FM to run are already specified ## Differently from the front end, OMP_NUM_THREADS=XX is not required, ## as it is handled by SLURM via the option ## #SBATCH --cpus-per-task=XX ## XX=12 is the preferred number of threads for the broadwell nodes (ex "citv 5-7") used at the ITV ## (adding all 24 cores doesn't have any benefit). ## In SLURM, "CPUs" are the physical cores. If tasks-per-cpu=1 (No multithreading) physical cores = omp_threads ## The value for memory requirements (mem-per-cpu) has been set to 3GB per thread. ## Change it accordingly to your needs (if not specified, 1GB per thread will be assigned). ## The maximum value is Total memory node (~128 GB for broadwell) divided the number of used threads. ## SLURM will throw an error if the requested memory exceeds the available one. ## Please remember that this configuration has been tested only for intel compilers and broadwell nodes ## on the RWTH Cluster. ## Other machines, operating systems, compilers, and LAPACK implementations have been ## tested and can be used but can require a different CMake configuration command. ## Switching compilers, replacing the intel MKL with a different LAPACK implementation, ## or changing the architecture-specific optimization flags with an incorrect ## CMake configuration can lead to incorrect simulation results. ## Further details are discussed here: https://git.rwth-aachen.de/-/snippets/766 .