We introduce a new algorithm for reinforcement learning called Maximum a-posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative-entropy ...