[R] Supercharging reinforcement learning with logic

Deep reinforcement learning has led to a variety of compelling results. However, performance issues, particularly the data efficiency of simulation, have limited its applicability in domains where simulations run slowly. Our solution is to use a logic-based framework, PyReason, as a proxy for the simulation.

https://preview.redd.it/kdhpu9qraaub1.png?width=1786&format=png&auto=webp&s=8155ba38fc66bd3a2fe934b1f395351c4db68e2f

We showed that inference with a PyReason logic program can provide a speedup of up to three orders of magnitude compared with native simulations (we studied AFSIM and Starcraft2) while providing comparable reward and win rate (we found that PyReason-trained agents actually performed better than expected in both AFSIM and Starcraft2).

https://preview.redd.it/7mfh7pusaaub1.png?width=1636&format=png&auto=webp&s=fcccd22aad08a003f42bd05dd37c6eb42eabbbd8

However, the benefits of our semantic proxy go well beyond performance. The use of temporal logic programming has two crucial by-products: symbolic explainability and modularity. PyReason provides an explainable symbolic trace that precisely captures the evolution of the environment, while modularity allows us to add or remove parts of the logic program, so the simulation can be adjusted from a library of behaviors. PyReason is also well suited to modeling simulated environments for two other reasons: it can directly capture non-Markovian relationships, and it is open-world (defaults are “uncertain” instead of true or false). We have demonstrated that agents can be trained in this framework using standard RL techniques such as DQN.
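To make the ideas above concrete, here is a toy sketch of an open-world default, a modular rule library, a delayed (non-Markovian) rule, and a symbolic trace. It is plain Python and does not use the PyReason API: the `LogicProxy` and `Rule` classes, the atom names, the single-atom rule form, and the `[lower, upper]` interval encoding of truth values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Truth values as [lower, upper] bounds (an assumed encoding for this sketch).
UNCERTAIN = (0.0, 1.0)  # open-world default: neither true nor false
TRUE = (1.0, 1.0)
FALSE = (0.0, 0.0)


@dataclass(frozen=True)
class Rule:
    """If `body` was true `delay` steps ago, `head` becomes true now."""
    name: str
    head: str
    body: str
    delay: int = 1


@dataclass
class LogicProxy:
    rules: dict = field(default_factory=dict)    # name -> Rule (library of behaviors)
    history: list = field(default_factory=list)  # one {atom: bounds} dict per timestep
    trace: list = field(default_factory=list)    # (timestep, atom, rule name) per change

    def add_rule(self, rule):
        self.rules[rule.name] = rule

    def remove_rule(self, name):
        self.rules.pop(name, None)

    def query(self, atom):
        # Atoms that were never mentioned are uncertain, not false.
        if not self.history:
            return UNCERTAIN
        return self.history[-1].get(atom, UNCERTAIN)

    def step(self, observed=None):
        """Advance one timestep; `observed` maps atoms to new bounds."""
        t = len(self.history)
        state = dict(self.history[-1]) if self.history else {}
        state.update(observed or {})
        for rule in self.rules.values():
            past = t - rule.delay
            # The rule reads an earlier timestep directly, not just the last state.
            if past >= 0 and self.history[past].get(rule.body, UNCERTAIN) == TRUE:
                if state.get(rule.head, UNCERTAIN) != TRUE:
                    state[rule.head] = TRUE
                    self.trace.append((t, rule.head, rule.name))
        self.history.append(state)
        return state


# Hypothetical behavior: a unit is detected two steps after being in range.
proxy = LogicProxy()
proxy.add_rule(Rule("detect_after_2", head="detected(blue)", body="in_range(blue)", delay=2))
proxy.step({"in_range(blue)": TRUE})   # t=0
proxy.step({"in_range(blue)": FALSE})  # t=1: fact retracted
proxy.step()                           # t=2: rule still fires from the t=0 fact

# Same inputs with the rule removed: the behavior disappears.
ablated = LogicProxy()
ablated.add_rule(Rule("detect_after_2", head="detected(blue)", body="in_range(blue)", delay=2))
ablated.remove_rule("detect_after_2")
ablated.step({"in_range(blue)": TRUE})
ablated.step({"in_range(blue)": FALSE})
ablated.step()
```

In this sketch, `proxy.trace` records which rule changed which atom at which timestep, and removing the rule in `ablated` leaves `detected(blue)` uncertain rather than false. An RL agent would interact with such a proxy through an environment wrapper that maps actions to observed facts and queried atoms to observations and rewards; that wrapper is omitted here.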

Preprint: https://arxiv.org/abs/2310.06835

Video: https://youtu.be/9e6ZHJEJzgw

Code for PyReason-as-a-Sim (integration with DQN): https://github.com/lab-v2/pyreason-rl-sim

Code for PyReason Gym: https://github.com/lab-v2/pyreason-gym

PyReason home: https://neurosymbolic.asu.edu/pyreason/

submitted by /u/Neurosymbolic
