monticore / EmbeddedMontiArc / generators / CNNArch2Gluon · Commits

Commit ed0c66c8, authored Jul 16, 2019 by Nicola Gatto
Add average Q-values for TD3 algorithm
Parent: c5657cbd · Pipeline #160955 failed with stages in 45 seconds
src/main/resources/templates/gluon/reinforcement/agent/Agent.ftl

@@ -916,6 +916,9 @@ class TwinDelayedDdpgAgent(DdpgAgent):
                 if self._total_steps % self._policy_delay == 0:
                     tmp_critic = self._copy_critic()
+                    episode_avg_q_value +=\
+                        np.sum(tmp_critic(
+                            states, self._actor(states)).asnumpy()) / self._minibatch_size
                     with autograd.record():
                         actor_loss = -tmp_critic(
                             states, self._actor(states)).mean()
@@ -942,7 +945,6 @@ class TwinDelayedDdpgAgent(DdpgAgent):
                     np.sum(critic_loss.asnumpy()) / self._minibatch_size
                 episode_actor_loss += 0 if actor_updates == 0 else\
                     np.sum(actor_loss.asnumpy()[0])
-                episode_avg_q_value = 0
                 training_steps += 1
@@ -961,8 +963,8 @@ class TwinDelayedDdpgAgent(DdpgAgent):
                 else (episode_actor_loss / actor_updates)
             episode_critic_loss = 0 if training_steps == 0\
                 else (episode_critic_loss / training_steps)
-            episode_avg_q_value = 0 if training_steps == 0\
-                else (episode_avg_q_value / training_steps)
+            episode_avg_q_value = 0 if actor_updates == 0\
+                else (episode_avg_q_value / actor_updates)
             avg_reward = self._training_stats.log_episode(
                 self._current_episode, start, training_steps,
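The last hunk divides the accumulated Q-value by `actor_updates` rather than `training_steps`: in TD3 the actor (and thus the Q-value sample taken alongside it) is only updated every `_policy_delay` training steps, so averaging over all training steps would understate the episode's average Q-value. A minimal sketch of that bookkeeping, using hypothetical pure-Python names in place of the template's Gluon critic and replay-buffer machinery:

```python
def episode_average_q(per_step_q_values, policy_delay=2):
    """Simulate TD3-style bookkeeping for the average-Q statistic.

    per_step_q_values: a mean Q estimate available at each training step
    (in the real template this comes from the critic network).
    The statistic is only sampled on delayed actor-update steps, so the
    final average divides by actor_updates, not training_steps.
    """
    episode_avg_q_value = 0.0
    actor_updates = 0
    training_steps = 0
    for step, q in enumerate(per_step_q_values, start=1):
        # The critic is trained every step (omitted here); the actor and
        # the Q-value statistic are updated only every `policy_delay` steps.
        if step % policy_delay == 0:
            episode_avg_q_value += q
            actor_updates += 1
        training_steps += 1
    # Guard against episodes with no actor update, as the template does.
    return 0 if actor_updates == 0 else episode_avg_q_value / actor_updates
```

With `policy_delay=2` and per-step Q-values `[1, 2, 3, 4]`, only steps 2 and 4 contribute, giving `(2 + 4) / 2 = 3.0`; dividing by the four training steps instead would halve the reported average.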