Welcome to the FSRL documentation!

The FSRL (Fast Safe Reinforcement Learning) package contains modularized implementations of safe RL algorithms based on PyTorch and the Tianshou framework [Weng et al., 2022]. The implemented algorithms include CPO [Achiam et al., 2017], FOCOPS [Zhang et al., 2020], PID-Lagrangian methods [Stooke et al., 2020], off-policy Lagrangian methods (DDPGLagrangian, SACLagrangian), and CVPO [Liu et al., 2022].

These algorithms are well-tuned for many tasks in the following safe RL environments, which cover most tasks used in safe RL papers (a minimal training sketch follows the environment list):

  • BulletSafetyGym: FSRL installs this environment by default as the testing ground.

  • SafetyGymnasium: note that you need to install it from source because we use the Gymnasium API.
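
For orientation, below is a minimal training sketch. The agent class PPOLagAgent, the registration import, and the keyword arguments are assumptions for illustration; consult the API reference and tutorials for the exact interfaces.

    import gymnasium as gym
    import bullet_safety_gym  # noqa: F401  (assumed to register the SafetyCar* tasks on import)
    from tianshou.env import DummyVectorEnv
    from fsrl.agent import PPOLagAgent  # assumed agent class name

    task = "SafetyCarCircle-v0"

    # Vectorized environments for parallel sampling (Tianshou-style wrappers).
    train_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(10)])
    test_envs = DummyVectorEnv([lambda: gym.make(task) for _ in range(2)])

    # Build the agent from a single reference environment, then train it.
    agent = PPOLagAgent(env=gym.make(task), cost_limit=10)  # keyword names are assumptions
    agent.learn(train_envs, test_envs, epoch=100)

Other agents are expected to follow the same construct-then-learn pattern, so switching algorithms is mostly a one-line change.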

FSRL cares about implementation details and hyper-parameters, as both play a crucial role in successfully training a safe RL agent.

  • For instance, the CPO method fails to satisfy constraints according to the SafetyGym benchmark results and its reference implementation. As a result, many safe RL papers that adopt that implementation may also report failure results. However, we found that with appropriate hyper-parameters and implementation details, CPO can achieve good safety performance on most tasks as well.

  • Another example is the off-policy Lagrangian methods (SACLagrangian, DDPGLagrangian). While they may fail with off-policy-style Lagrangian multiplier updates [Liu et al., 2022], they can achieve sample-efficient training and good performance with on-policy-style Lagrangian updates (see the sketch after this list).

  • Therefore, we plan to provide a practical guide for tuning the key hyper-parameters of safe RL algorithms, which empirically summarizes their effects on performance.
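
To make the distinction between the two Lagrange-update styles concrete, the sketch below shows the generic projected gradient-ascent step on the Lagrange multiplier that both styles share; they differ mainly in whether episode_cost is estimated from fresh on-policy rollouts or from replayed off-policy data. The function and argument names are illustrative, not FSRL's exact implementation:

    def update_lagrange_multiplier(lmbda: float, episode_cost: float,
                                   cost_limit: float, lr: float = 0.05) -> float:
        """One projected gradient-ascent step on the Lagrange multiplier.

        The multiplier grows when the constraint is violated
        (episode_cost > cost_limit) and shrinks otherwise, clipped at zero so
        the cost term in the penalized objective never becomes a bonus.
        """
        return max(0.0, lmbda + lr * (episode_cost - cost_limit))

    # Constraint violated by 5 cost units -> the multiplier increases.
    print(update_lagrange_multiplier(lmbda=1.0, episode_cost=30.0, cost_limit=25.0))  # 1.25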

FSRL cares about training speed, with the aim of accelerating the experimentation and benchmarking process.

  • For example, most algorithms can solve the SafetyCarRun-v0 task in 2 minutes and the SafetyCarCircle-v0 task in 10 minutes with 4 CPUs. Our CVPO implementation also achieves 5x faster training than the original repository.

  • We also plan to provide a guide on how to accelerate your safe RL experiments; a parallel-sampling sketch follows this list.
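
Until that guide is available, the most effective speed knob is usually vectorized parallel sampling. The sketch below builds four subprocess-backed environments with Tianshou's SubprocVectorEnv, mirroring the 4-CPU setting above; the registration import is again an assumption:

    import gymnasium as gym
    import bullet_safety_gym  # noqa: F401  (assumed to register the SafetyCar* tasks on import)
    from tianshou.env import SubprocVectorEnv  # one worker process per environment

    task = "SafetyCarRun-v0"
    num_workers = 4  # roughly one environment worker per CPU core

    # Each callable builds one environment in its own subprocess, so environment
    # stepping is parallelized while the policy update stays on the main process.
    train_envs = SubprocVectorEnv([lambda: gym.make(task) for _ in range(num_workers)])
    print(len(train_envs))  # 4 parallel environment workers

SubprocVectorEnv trades some inter-process overhead for true parallelism, which pays off for physics-heavy simulators such as the Bullet tasks.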

Here are FSRL’s other features:

  • Elegant framework with a modularized implementation, whose core abstractions are mostly the same as Tianshou's.

  • State-of-the-art benchmark performance on popular safe RL tasks.

  • Support fast vectorized parallel environment sampling for all algorithms.

  • Support n-step return estimation via compute_nstep_returns(); GAE and n-step computations are very fast thanks to numba JIT compilation and vectorized numpy operations (a generic sketch follows this list).

  • Support both TensorBoard and W&B logging tools with customized, easy-to-use features.
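
As an illustration of why the jitted return computation is fast, here is a generic GAE sketch in the same spirit (not FSRL's actual code): the backward recursion is compiled once by numba and then runs over raw numpy arrays without Python-level overhead:

    import numpy as np
    from numba import njit

    @njit
    def gae_advantages(rewards, values, next_values, dones, gamma, gae_lambda):
        """Generalized Advantage Estimation via the backward recursion
        A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}, where
        delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)."""
        T = rewards.shape[0]
        advantages = np.zeros(T, dtype=np.float64)
        last_adv = 0.0
        for t in range(T - 1, -1, -1):
            not_done = 1.0 - dones[t]
            delta = rewards[t] + gamma * not_done * next_values[t] - values[t]
            last_adv = delta + gamma * gae_lambda * not_done * last_adv
            advantages[t] = last_adv
        return advantages

    # Tiny usage example on random rollout data.
    rng = np.random.default_rng(0)
    T = 1000
    adv = gae_advantages(rng.random(T), rng.random(T), rng.random(T),
                         np.zeros(T), 0.99, 0.95)
    print(adv.shape)  # (1000,)

The same routine can be reused for the cost critic by passing cost signals in place of rewards.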

Check out the Get Started page for more information and start your journey with FSRL!

References

[AHTA17] Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. In International Conference on Machine Learning, 22–31. PMLR, 2017.

[LCI+22] Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Steven Wu, Bo Li, and Ding Zhao. Constrained variational policy optimization for safe reinforcement learning. In International Conference on Machine Learning, 13644–13668. PMLR, 2022.

[SAA20] Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by PID Lagrangian methods. In International Conference on Machine Learning, 9133–9143. PMLR, 2020.

[WCY+22] Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, and Jun Zhu. Tianshou: a highly modularized deep reinforcement learning library. Journal of Machine Learning Research, 23(267):1–6, 2022. URL: http://jmlr.org/papers/v23/21-1127.html.

[ZVR20] Yiming Zhang, Quan Vuong, and Keith Ross. First order constrained optimization in policy space. Advances in Neural Information Processing Systems, 33:15338–15349, 2020.