March 21, 2024, 4:11 a.m. | Chen Gong, Zhou Yang, Yunpeng Bai, Junda He, Jieke Shi, Kecen Li, Arunesh Sinha, Bowen Xu, Xinwen Hou, David Lo, Tianhao Wang

cs.CR updates on arXiv.org arxiv.org

arXiv:2210.04688v5 Announce Type: replace-cross
Abstract: Reinforcement learning (RL) makes an agent learn from trial-and-error experiences gathered during the interaction with the environment. Recently, offline RL has become a popular RL paradigm because it saves the interactions with environments. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control, autonomous driving, etc. However, less attention is paid to investigating the …

agent arxiv backdoors baffle can cs.ai cs.cr cs.lg data datasets environment environments error experiences high large learn offline paradigm popular quality share train trial

Senior Security Specialist, Forsah Technical and Vocational Education and Training (Forsah TVET) (NEW)

@ IREX | Ramallah, West Bank, Palestinian National Authority

Consultant(e) Junior Cybersécurité

@ Sia Partners | Paris, France

Senior Network Security Engineer

@ NielsenIQ | Mexico City, Mexico

Senior Consultant, Payment Intelligence

@ Visa | Washington, DC, United States

Corporate Counsel, Compliance

@ Okta | San Francisco, CA; Bellevue, WA; Chicago, IL; New York City; Washington, DC; Austin, TX

Security Operations Engineer

@ Samsara | Remote - US