Commit 7c7f7226 authored by Evgeny Kusmenko

Merge branch 'sascha' into 'master'

Sascha

See merge request !4
parents d72e0601 dc29170a
Pipeline #409352 passed with stages in 14 minutes and 19 seconds
target
__pycache__/
*.py[cod]
bin/dummysim/results/*
TestBuild:
  image: maven:3-jdk-8
  script: "mvn clean install -s settings.xml"
\ No newline at end of file
stages:
  - docker
  - build

buildDockerMXNetMaven:
  stage: docker
  tags:
    - shell
  script:
    - cd docker
    - docker login registry.git.rwth-aachen.de -u someUserName -p yEep-tAt19HbsmNQJMNo
    - docker build -t registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/forestrl/mxnetmvn:v0.0.3 .
    - docker push registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/forestrl/mxnetmvn:v0.0.3
  only:
    changes:
      - docker/*

TrainAgent:
  image: registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/cartpole/emadl-rl-mxnet:latest
  stage: build
  script:
    - echo "Installing agent.."
    - chmod +x install.sh
    - ./install.sh
    - echo "Training agent.."
    - chmod +x train_agent.sh
    - ./train_agent.sh

EvaluateAgent:
  image: registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/cartpole/emadl-rl-mxnet:latest
  stage: build
  script:
    - echo "Installing agent.."
    - chmod +x install.sh
    - ./install.sh
    - echo "Evaluating agent.."
    - chmod +x evaluate_agent.sh
    - ./evaluate_agent.sh

RandomAgent:
  image: registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/cartpole/emadl-rl-mxnet:latest
  stage: build
  script:
    - echo "Installing agent.."
    - chmod +x install.sh
    - ./install.sh
    - echo "Evaluating random agent.."
    - ./evaluate_agent.sh -b random

RuleBasedAgent:
  image: registry.git.rwth-aachen.de/monticore/embeddedmontiarc/applications/reinforcement_learning/cartpole/emadl-rl-mxnet:latest
  stage: build
  script:
    - echo "Installing agent.."
    - chmod +x install.sh
    - ./install.sh
    - echo "Evaluating rulebased agent.."
    - ./evaluate_agent.sh -b rulebased
# ForestRL
A reinforcement learning agent for automated decision making in a forestry simulator. There are two versions of the agent, called singlestep and multistep. The singlestep agent completes deliveries in four individual steps, while the multistep agent performs all four steps at once. The multistep agent is based on the singlestep agent and reuses most of its components; the multistep directory therefore only contains the files that were changed. The singlestep version is the original agent described in the thesis "Autonomous Decision Making in Forestry 4.0 using Deep Reinforcement Learning". The multistep version was developed to address the problem of unused network outputs described in that thesis. Using the multistep version is recommended, as it improves the training results while also being faster than the singlestep version. All scripts use the multistep agent by default.
## Prerequisites
In order to run this application you need the following tools:
### General
Generation, training, and execution were tested on Ubuntu 16.04 LTS and require the following tools and packages (a quick import check for the Python packages is sketched after the list):
- Java 8, build tools (make, cmake, gcc), Git, Python 2.7, NumPy, SWIG, Boost, curl, Maven:
```bash
sudo apt install openjdk-8-jre gcc make cmake git python2.7 python-dev python-numpy swig libboost-all-dev curl maven
```
- Python pip:
```bash
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py --user
```
- Python packages h5py, numpy, pyprind, and matplotlib:
```bash
pip install --user h5py numpy pyprind matplotlib
```
- MXNet C++ Language Bindings (Follow official [installation guide](https://mxnet.incubator.apache.org/versions/master/install/ubuntu_setup.html))
- MXNet for Python (Follow official [installation guide](https://mxnet.incubator.apache.org/versions/master/install/index.html?platform=Linux&language=Python&processor=CPU))
- Armadillo >= 9.400.3 (Follow official [installation guide](http://arma.sourceforge.net/download.html))
- ROS Kinetic (Follow official [installation guide](http://wiki.ros.org/kinetic/Installation/Ubuntu))
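
To confirm that the Python-side packages are available, a short import check such as the following can be used (a minimal sketch, not part of the repository; it only covers the packages listed above and assumes it is run with the same Python 2.7 interpreter used for the agent):
```python
# Sketch: verify that the Python packages listed above can be imported.
import importlib

for name in ["numpy", "h5py", "pyprind", "matplotlib", "mxnet"]:
    try:
        importlib.import_module(name)
        print("{}: OK".format(name))
    except ImportError as err:
        print("{}: missing ({})".format(name, err))
```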
## Installing the agent
To install the agent, run
```bash
./install.sh
```
## Training the agent
To train the agent, run
```bash
./train_agent.sh
```
The trained model can be found in `target/agent/forestrl_singlestep_agent_master/src/forestrl_singlestep_agent_master/emadlcpp/model`. The `ForestAgent` subfolder contains a folder with all snapshot models of the latest training run, as well as a `stats.pdf` with graphs that show the training progress. The best performing model is saved in `target/agent/forestrl_singlestep_agent_master/src/forestrl_singlestep_agent_master/emadlcpp/model/forestrl.singlestep.agent.networks.ForestActor` in the two files `model_0_newest-0000.params` and `model_0_newest-symbol.json`.
## Evaluating the agent
To run or evaluate the trained agent, you first need to copy the two files `model_0_newest-0000.params` and `model_0_newest-symbol.json` from the location mentioned above into the `model/forestrl.singlestep.agent.networks.ForestActor` folder in the root directory. A pretrained model is already included in this directory, so you can evaluate the agent without training it first. If you want to keep this model, back it up before replacing the files. If you want to use a model from the snapshot directory instead, rename the files to the two names mentioned above so that the agent can find them. Once the model is in place, the agent can be evaluated by running
```bash
./evaluate_agent.sh
```
This starts the simulator and the agent and lets the agent complete 100 episodes in the simulator. At the end of the evaluation, the average rewards are printed to the terminal.
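
The manual copy step described at the start of this section can also be scripted, for example as follows (a sketch only; the source and destination paths are the ones quoted in this README, so adjust them if your layout differs):
```python
# Sketch: copy the best-performing model from the training output into the
# folder that the evaluation reads from (paths taken from this README).
import os
import shutil

src_dir = ("target/agent/forestrl_singlestep_agent_master/src/"
           "forestrl_singlestep_agent_master/emadlcpp/model/"
           "forestrl.singlestep.agent.networks.ForestActor")
dst_dir = "model/forestrl.singlestep.agent.networks.ForestActor"

for name in ["model_0_newest-0000.params", "model_0_newest-symbol.json"]:
    shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))
```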
## The benchmark agents
Two benchmark agents are included in the project, against which the performance of the reinforcement learning agent can be compared: a random agent and a rule-based agent. Both use the postprocessor of the reinforcement learning agent, so the reinforcement learning agent has to be installed before the benchmark agents can be used. To start/evaluate the random agent, run
```bash
./evaluate_agent.sh -b random
```
To start/evaluate the rule-based agent, run
```bash
./evaluate_agent.sh -b rulebased
```
\ No newline at end of file
import random
import rospy
import thread
import numpy as np
from std_msgs.msg import Float32MultiArray

newjobs = 20
maxjobs = 10
maxsites = 10

## setting up ROS
ros_update_rate = 50 ## 50hz
netvalues_topic = '/postprocessor/action'
netvalues_publisher = rospy.Publisher(netvalues_topic, Float32MultiArray, queue_size=1)
rospy.init_node('RandomActions', anonymous=True)
rate = rospy.Rate(ros_update_rate)
listener = thread.start_new_thread(rospy.spin, ())

## returns an array with random values between 0 and 1
def randomsigmoid(length):
    return [random.random() for i in range(0, length)]

## like randomsigmoid, but the values sum to 1 to imitate softmax values
def randomsoftmax(length):
    randlist = randomsigmoid(length)
    return [randlist[i]/sum(randlist) for i in range(0, length)]

## returns an array with random values between -1 and 1
def randomtanh(length):
    return [random.random()*2-1 for i in range(0, length)]

## returns random action values
def genactionvalues():
    jobratings = randomtanh(newjobs)
    pilepositions = randomtanh(maxjobs*2)
    sitevalues = randomtanh(maxsites)
    pilevalues = randomtanh(maxjobs)
    compacvalues = randomtanh(3)
    actionvalues = jobratings + pilepositions + sitevalues + pilevalues + compacvalues
    return np.array(actionvalues)

if __name__ == "__main__":
    while not rospy.is_shutdown():
        randomactions = genactionvalues()
        netvalues_publisher.publish(Float32MultiArray(data=randomactions))
        rate.sleep()
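
As a quick consistency check (a sketch, only meaningful if appended to the random agent script above), the published action vector has the same length as `NN_ACTION_LEN` from `simconsts.py`:
```python
# Sketch: the flat action layout built in genactionvalues() above has
# newjobs + maxjobs*2 + maxsites + maxjobs + 3 = 20 + 20 + 10 + 10 + 3 = 63 values,
# which matches NN_ACTION_LEN in simconsts.py.
from simconsts import NN_ACTION_LEN

assert len(genactionvalues()) == NN_ACTION_LEN == 63
```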
import rospy
import thread
import time
import numpy as np
from std_msgs.msg import Float32MultiArray
from simconsts import *

class RuleBasedAgent(object):
    state_topic = '/sim/state'
    netvalues_topic = '/postprocessor/action'
    ros_update_rate = 10

    def __init__(self):
        rospy.init_node('RuleBased', anonymous=True)
        self.__listener = thread.start_new_thread(rospy.spin, ())
        self.__netvalues_publisher = rospy.Publisher(
            RuleBasedAgent.netvalues_topic, Float32MultiArray, queue_size=1)
        self.__state_subscriber = rospy.Subscriber(
            RuleBasedAgent.state_topic, Float32MultiArray, self.execute)

    def execute(self, msg):
        state = np.array(msg.data).reshape(7, 32, 32)
        jobinf = self.getjobinf(state)
        siteinf = self.getsiteinf(state)
        distances = self.getdistances(state)
        jobratings = self.getjobratings()
        pilepositions = self.getpilepositions(jobinf, siteinf)
        sitevalues = self.getsitevalues(distances)
        pilevalues = self.getpilevalues(jobinf)
        compacvalues = self.getcompacvalues()
        actionvalues = jobratings + pilepositions + sitevalues + pilevalues + compacvalues
        self.__netvalues_publisher.publish(Float32MultiArray(data=np.array(actionvalues)))

    def getjobinf(self, state):
        return [[state[ST_JOBINF][i][j] for j in range(LEN_JOB)] for i in range(MAXJOBS)]

    def getsiteinf(self, state):
        return [[state[ST_SITEINF][i][j] for j in range(LEN_SITE)] for i in range(MAXSITES)]

    def getdistances(self, state):
        return [[state[ST_DIST][i][j] for j in range(LEN_DIST)] for i in range(LEN_DIST)]

    def getjobratings(self):
        return [1 for i in range(MAXJOBS)] + [-1 for i in range(NEWJOBS-MAXJOBS)]

    def getpilepositions(self, jobinf, siteinf):
        pilepositions = [-1 for i in range(MAXJOBS*2)]
        for i in range(MAXJOBS):
            woodtype = jobinf[i][JOB_TYPE]
            loglen = jobinf[i][JOB_LOGLEN]
            if woodtype > 0 and loglen > 0:
                selected = False
                for j in range(MAXSITES):
                    if woodtype == siteinf[j][SITE_TYPE] and loglen == siteinf[j][SITE_LOGLEN] and not selected:
                        pilepositions[i*2] = (siteinf[j][SITE_SITEX]/MAPSIZE)*2 - 1
                        pilepositions[i*2+1] = (siteinf[j][SITE_SITEY]/MAPSIZE)*2 - 1
                        selected = True
        return pilepositions

    def getsitevalues(self, distances):
        sitedist = [distances[i][LEN_DIST-1] for i in range(MAXSITES)] # felling site <-> forwarder distances
        maxdist = max(sitedist)
        sitevalues = [1 - sitedist[i]/maxdist for i in range(MAXSITES)]
        return sitevalues

    def getpilevalues(self, jobinf):
        deadlines = [jobinf[i][JOB_DEADL] for i in range(MAXJOBS)]
        maxdeadl = max(deadlines)
        pilevalues = [1 - deadlines[i]/maxdeadl for i in range(MAXJOBS)]
        return pilevalues

    def getcompacvalues(self):
        return [0, 1, 0]

if __name__ == "__main__":
    agent = RuleBasedAgent()
    while not rospy.is_shutdown():
        time.sleep(0.5)
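
The normalizations in `getsitevalues()` and `getpilevalues()` above are the core of the rule-based heuristic: nearby felling sites and jobs with early deadlines get values close to 1. A standalone illustration (a sketch with made-up numbers, not taken from the simulator):
```python
# Sketch: the same normalization as in getsitevalues() and getpilevalues(),
# shown with made-up numbers. Closer sites and more urgent jobs score higher.
sitedist = [4.0, 1.0, 8.0]                        # felling site <-> forwarder distances
sitevalues = [1 - d/max(sitedist) for d in sitedist]
print(sitevalues)                                 # [0.5, 0.875, 0.0] -> site 1 is preferred

deadlines = [10.0, 2.0, 6.0]                      # remaining time per accepted job
pilevalues = [1 - d/max(deadlines) for d in deadlines]
print(pilevalues)                                 # [0.0, 0.8, 0.4] -> job 1 is most urgent
```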
MAXJOBS = 10
NEWJOBS = 20
MAXSITES = 10
MAPSIZE = 32
LEN_JOB = 8
LEN_SITE = 22
LEN_DIST = MAXJOBS + MAXSITES + 1
LEN_FORW = 6
STATE_X = 7
STATE_Y = MAPSIZE
STATE_Z = MAPSIZE
NN_STATE_LEN = 2 + (MAXJOBS+NEWJOBS)*LEN_JOB + MAXSITES*LEN_SITE + LEN_DIST-1 + LEN_FORW ## == 488
ACTION_X = 6
ACTION_Y = NEWJOBS
ACTION_Z = 2
NN_ACTION_LEN = NEWJOBS + MAXJOBS*2 + MAXSITES + MAXJOBS + 3 ## == 63
ST_GENERAL = 1-1
ST_MAP = 2-1
ST_JOBINF = 3-1
ST_NEWJOB = 4-1
ST_SITEINF = 5-1
ST_DIST = 6-1
ST_FORW = 7-1
AC_ACCEPT = 1-1
AC_PILES = 2-1
AC_MOVE = 3-1
AC_LOAD = 4-1
AC_UNLOAD = 5-1
AC_WAIT = 6-1
GEN_TIME = 1-1
GEN_MONEY = 2-1
MAP_FOREST = 1
MAP_ROAD = 2
MAP_TRAIL = 3
MAP_SITE = 4
MAP_PILE = 5
JOB_PILEX = 1-1
JOB_PILEY = 2-1
JOB_TYPE = 3-1
JOB_LOGLEN = 4-1
JOB_DEMAND = 5-1
JOB_DEADL = 6-1
JOB_FINE = 7-1
JOB_REWARD = 8-1
SITE_SITEX = 1-1
SITE_SITEY = 2-1
SITE_TYPE = 3-1
SITE_LOGLEN = 4-1
SITE_SUPPLY = 5-1
SITE_SUPTIME1 = 6-1
SITE_SUPTIME2 = 7-1
SITE_SUPTIME3 = 8-1
SITE_SUPUP1 = 9-1
SITE_SUPUP2 = 10-1
SITE_SUPUP3 = 11-1
SITE_MAXGREEN = 12-1
SITE_MAXYELLOW = 13-1
SITE_SOILTIME1 = 14-1
SITE_SOILTIME2 = 15-1
SITE_SOILTIME3 = 16-1
SITE_GREENUP1 = 17-1
SITE_GREENUP2 = 18-1
SITE_GREENUP3 = 19-1
SITE_YELLOWUP1 = 20-1
SITE_YELLOWUP2 = 21-1
SITE_YELLOWUP3 = 22-1
FORW_X = 1-1
FORW_Y = 2-1
FORW_WEIGHT = 3-1
FORW_LOAD = 4-1
FORW_BUNKCAP = 5-1
FORW_CRANECAP = 6-1
WOOD_TYPES = 3
WT_PINE = 1
WT_BEECH = 2
WT_OAK = 3
KGPM_PINE = 56
KGPM_BEECH = 77
KGPM_OAK = 84
## SIM ONLY
LAYER_NAMES = {1:'GENERAL INFO', 2:'MAP', 3:'JOBINF', 4:'NEWJOBINF', 5:'SITEINF', 6:'DISTANCES', 7:'FORWARDER'}
ACTION_NAMES = ['ACCEPT', 'PILES', 'MOVE', 'LOAD', 'UNLOAD', 'WAIT']
from dummysim import SimEnv
from simconsts import *
from multistep_simconsts import *

class MultiStepWrapper(object):
    def __init__(self):
        self.__env = SimEnv()

    def get_state(self):
        return self.__env.get_state()

    def get_time(self):
        return self.__env.get_time()

    def get_money(self):
        return self.__env.get_money()

    def terminal(self):
        return self.__env.terminal()

    def reset(self):
        return self.__env.reset()

    def findactionlayer(self, action):
        layer = -1
        activelayers = 0
        if action[MS_AC_ACCEPT][0][0] != -1:
            layer = MS_AC_ACCEPT
            activelayers += 1
        if action[MS_AC_PILES][0][0] != -1:
            layer = MS_AC_PILES
            activelayers += 1
        if action[MS_AC_DELIVER][0][0] != -1:
            layer = MS_AC_DELIVER
            activelayers += 1
        if action[MS_AC_WAIT][0][0] != -1:
            layer = MS_AC_WAIT
            activelayers += 1
        if activelayers != 1:
            layer = -1
        return layer

    def get_emptyaction(self):
        return [[[-1 for col in range(ACTION_Z)] for row in range(ACTION_Y)] for layer in range(ACTION_X)]

    def get_sitepos(self, site):
        statearray = self.get_state()
        x = statearray[ST_SITEINF][int(site)-1][0]
        y = statearray[ST_SITEINF][int(site)-1][1]
        return x, y

    def get_pilepos(self, pile):
        statearray = self.get_state()
        x = statearray[ST_JOBINF][int(pile)-1][0]
        y = statearray[ST_JOBINF][int(pile)-1][1]
        return x, y

    def acceptjobs(self, mat):
        actionarray = self.get_emptyaction()
        actionarray[AC_ACCEPT] = mat
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def createpiles(self, mat):
        actionarray = self.get_emptyaction()
        actionarray[AC_PILES] = mat
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def moveto(self, pos):
        actionarray = self.get_emptyaction()
        actionarray[AC_MOVE][0][0] = pos[0]
        actionarray[AC_MOVE][1][0] = pos[1]
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def loadlogs(self, lognr):
        actionarray = self.get_emptyaction()
        actionarray[AC_LOAD][0][0] = lognr
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def unloadlogs(self, lognr):
        actionarray = self.get_emptyaction()
        actionarray[AC_UNLOAD][0][0] = lognr
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def deliver(self, mat):
        totalreward = 0
        pile = mat[0][0]
        site = mat[1][0]
        logs = mat[2][0]
        pilepos = self.get_pilepos(pile)
        sitepos = self.get_sitepos(site)
        s,t,r = self.moveto(sitepos)
        totalreward += r
        s,t,r = self.loadlogs(logs)
        totalreward += r
        s,t,r = self.moveto(pilepos)
        totalreward += r
        s,t,r = self.unloadlogs(logs)
        totalreward += r
        return s,t,totalreward

    def waituntil(self, mat):
        actionarray = self.get_emptyaction()
        actionarray[AC_WAIT] = mat
        s,t,r = self.__env.step(actionarray)
        return s,t,r

    def step(self, actionarray):
        layer = self.findactionlayer(actionarray)
        if layer != -1:
            actionmat = actionarray[layer][:]
        else:
            actionmat = []
        if layer == MS_AC_ACCEPT:
            s,t,r = self.acceptjobs(actionmat)
        elif layer == MS_AC_PILES:
            s,t,r = self.createpiles(actionmat)
        elif layer == MS_AC_DELIVER:
            s,t,r = self.deliver(actionmat)
        elif layer == MS_AC_WAIT:
            s,t,r = self.waituntil(actionmat)
        else: ## do nothing
            s = self.get_state()
            t = self.terminal()
            r = 0
        return s,t,r
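
A hypothetical usage sketch of the wrapper (not part of the repository): build a single multistep "deliver" action and step the environment once. The module name `multistep_simwrapper` and the exact value of `MS_AC_DELIVER` from `multistep_simconsts.py` are assumptions here.
```python
# Hypothetical sketch: step the MultiStepWrapper with one "deliver" action.
# Assumes the wrapper is importable as multistep_simwrapper and that
# MS_AC_DELIVER indexes one of the six action layers (see multistep_simconsts.py).
from multistep_simwrapper import MultiStepWrapper
from multistep_simconsts import MS_AC_DELIVER

wrapper = MultiStepWrapper()
wrapper.reset()

action = wrapper.get_emptyaction()        # 6 x 20 x 2 array filled with -1
action[MS_AC_DELIVER][0][0] = 1           # pile number (1-based, see get_pilepos)
action[MS_AC_DELIVER][1][0] = 1           # felling site number (1-based, see get_sitepos)
action[MS_AC_DELIVER][2][0] = 4           # lognr value passed to loadlogs()/unloadlogs()

# deliver() expands this into move -> load -> move -> unload and sums the rewards
state, terminal, reward = wrapper.step(action)
print(reward)
```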
# (c) https://github.com/MontiCore/monticore
import signal
import sys
import time
import numpy as np
from multistep_roswrapper import RosSimConnector
from optparse import OptionParser

TRAINING_MODE = 0
PLAY_MODE = 1
EVAL_MODE = 2

def signal_handler(sig, frame):
    print('Starting shutdown')
    sys.exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal_handler)

    parser = OptionParser(usage='usage: %prog [options]')
    parser.add_option("-v", '--verbose', action="store_true", dest="verbose",
                      help='show logging information')
    parser.add_option("-q", '--quiet', action="store_false", dest="verbose",
                      help='no output')
    parser.add_option('-t', '--training', action='store_true', dest='training',
                      help='training mode')
    parser.add_option('-p', '--play', action='store_true', dest='play',
                      help='play mode')
    parser.add_option('--eval', action='store_true', dest='eval',
                      help='evaluation mode')
    options, args = parser.parse_args()

    mode_options = 0
    mode = TRAINING_MODE
    if options.training:
        mode = TRAINING_MODE
        mode_options += 1
    if options.play:
        mode = PLAY_MODE
        mode_options += 1
    if options.eval:
        mode = EVAL_MODE
        mode_options += 1
    if mode_options > 1:
        parser.error('Please select only one mode')

    verbose = options.verbose
    sample_games = 100
    connector = RosSimConnector(verbose, options.eval)

    if mode == PLAY_MODE:
        time.sleep(8)
        connector.reset()
        while not connector.is_terminated or connector.in_reset:
            time.sleep(1)
    elif mode == EVAL_MODE:
        print('Start Evaluation Mode')
        time.sleep(5)
        print('Sample from {} games'.format(sample_games))
        total_scores = np.zeros((sample_games,), dtype='float32')
        for g in range(sample_games):
            connector.reset()
            while not connector.is_terminated:
                time.sleep(0.5)
            total_scores[g] = connector.last_game_score
            print('Game {}/{} Score: {}'.format(
                g+1, sample_games, total_scores[g]))
        print('---- Statistics for {} games played ----'.format(sample_games))