Abstract

This survey presents an overview of the current landscape of model-free deep reinforcement learning. It compares state-of-the-art on-policy and off-policy algorithms in the value-based and policy-based domains. The influences and possible drawbacks of different algorithmic approaches are analyzed and linked to the improvements later proposed to overcome them. Further, the survey presents application scenarios in difficult domains, including the game of Go, StarCraft II, Dota 2, and the Rubik’s Cube.