Visually, the performance of the agent was non-optimal. However, after 2.5 million cycles of interaction, the agent had managed to learn a number of important concepts. It knows not to run into walls. It knows how to seek out food from the limited information provided by its sensors. It knows how to run away and avoid chasing ghosts.
