§ feed · storyline
Proximal Policy Optimization
OpenAI releases Proximal Policy Optimization (PPO), a reinforcement learning algorithm that matches or exceeds state-of-the-art performance while being simpler to implement and tune.
We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while being much simpler to implement and tune.
PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
§ sources · 1 publication · timeline below
- openai.com · Proximal Policy Optimization · primary