Use mean when calculating actor loss

3 jobs for adjustments-rl-agent in 2 minutes and 1 second (queued for 3 seconds)