Split python wrapper test to extra test class

3 jobs for implement-reinforcement-learning in 4 minutes and 31 seconds