pi_scratch · b61332b1
Working: the agent is trained from scratch using MCTS-based policy improvement, with no pretraining via Stable-Baselines3 (stb3). https://wandb.ai/marcoke/neural_mcts/runs/x2z51l4r/overview?workspace=user-marcoke
working_policy_improvement · 6df646b0
With this code, MCTS improves a mediocre learned policy but fails to improve an already good one.