Safety-constrained reinforcement learning

Anu Panday

Abstract


Deploying reinforcement learning in the real world requires safety. A common practice is to model safety aspects with a safety-cost signal separate from the reward and to bound the expected safety-cost, since this sidesteps the challenge of striking a fair balance between safety and performance. However, constraining only the expectation while ignoring the tail of the distribution, which may contain prohibitively large values, can be hazardous. In this paper, we present Worst-Case Soft Actor Critic, a method for safe RL that achieves risk control by approximating the distribution of cumulative safety-costs.
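As a minimal sketch of the formulation the abstract alludes to (assuming a standard constrained MDP with reward r, safety-cost signal c, cost budget d, and discount factor gamma; the abstract itself does not fix this notation), the usual expectation-based constraint is

\max_\pi \; \mathbb{E}_\pi\Big[\sum_t \gamma^t r(s_t, a_t)\Big] \quad \text{s.t.} \quad \mathbb{E}_\pi\Big[\sum_t \gamma^t c(s_t, a_t)\Big] \le d,

while a risk-aware variant in the spirit described here replaces the expected cumulative cost with a tail measure such as conditional value-at-risk at level \alpha:

\text{CVaR}_\alpha\Big[\sum_t \gamma^t c(s_t, a_t)\Big] \le d,

which bounds the average of the worst \alpha-fraction of cost outcomes rather than the overall mean, so rare but very large safety-cost realizations are controlled directly.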
