Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly relating to the data efficiency of simulation has limited it applicability in domains where simulations run more slowly. Our solution is to use a logic base framework, PyReason, as a proxy for the simulation.
We showed that inference with PyReason logic program can provide up to a three order-of-magnitude speedup when compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2).
However, the benefits of our semantic proxy go well beyond performance. The use of temporal logic programming has two crucial beneficial by-products such as symbolic explainability and modularity. PyReason provides an explainable symbolic trace that captures the evolution of the environment in a precise manner while modularity allows us to add or remove aspects of the logic program – allowing for adjustments to the simulation based on a library of behaviors. PyReason is well-suited to model simulated environments for other reasons – namely the ability to directly capture non-Markovian relationships and the open-world nature (defaults are “uncertain” instead of true or false). We have demonstrated that agents can be trained using standard RL techniques such as DQN using this framework.
Code for PyReason-as-a-Sim (integration with DQN): https://github.com/lab-v2/pyreason-rl-sim
Code for PyReason Gym: https://github.com/lab-v2/pyreason-gym
PyReason home: https://neurosymbolic.asu.edu/pyreason/