Feb. 9, 2024, 5:10 a.m. | Alexandre Rio Merwan Barlier Igor Colin Albert Thomas

cs.CR updates on arXiv.org arxiv.org

We address offline reinforcement learning with privacy guarantees, where the goal is to train a policy that is differentially private with respect to individual trajectories in the dataset. To achieve this, we introduce DP-MORL, an MBRL algorithm coming with differential privacy guarantees. A private model of the environment is first learned from offline data using DP-FedAvg, a training method for neural networks that provides differential privacy guarantees at the trajectory level. Then, we use model-based policy optimization to derive a …

address algorithm coming cs.ai cs.cr cs.lg data dataset differential privacy environment goal offline policy privacy private respect stat.ml train

Director of the Air Force Cyber Technical Center of Excellence (CyTCoE)

@ Air Force Institute of Technology | Dayton, OH, USA

Senior Cyber Security Analyst

@ Valley Water | San Jose, CA

Business Information Security Officer

@ PwC | Auckland - PwC Tower

CI/CD DevSecOps Developer (Remote)

@ NTT DATA | Halifax, NS, CA

Security Operations Engineer

@ Collectors | Santa Ana, California, United States

Security Engineer

@ Wizeline | Colombia