Bibliography

Achiam, Joshua, David Held, Aviv Tamar, and Pieter Abbeel. 2017. “Constrained Policy Optimization.” In International Conference on Machine Learning, 22–31. PMLR.
Finn, Chelsea, Sergey Levine, and Pieter Abbeel. 2016. “Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization.” In International Conference on Machine Learning, 49–58. PMLR.
Ng, Andrew Y, Daishi Harada, and Stuart Russell. 1999. “Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping.” In International Conference on Machine Learning, 99:278–87. Citeseer.
Polyanskiy, Yury, and Yihong Wu. 2025. Information Theory: From Coding to Learning. Cambridge University Press.
Rajeswaran, Aravind, Chelsea Finn, Sham M Kakade, and Sergey Levine. 2019. “Meta-Learning with Implicit Gradients.” Advances in Neural Information Processing Systems 32.