May 15, 2023, 11:26 p.m. | USENIX

USENIX www.youtube.com

Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, and Yifan Qiao, UCLA; Zhihao Jia, CMU; Minjia Zhang, Microsoft Research; Ravi Netravali, Princeton University; Guoqing Harry Xu, UCLA

DNN models across many domains continue to grow in size, resulting in high resource requirements for effective training, and unpalatable (and often unaffordable) costs for organizations and research labs across scales. This paper aims to significantly reduce training costs with effective use of preemptible …

bamboo continue domains john large making microsoft research size training university

Assistant Manager, IT Security

@ CIMB | Cambodia

IT Security Engineer - GRC

@ Xtremax | Bandung City, West Java, Indonesia

Senior Engineer - Application Security

@ ANZ Banking Group Limited | Quezon City, PH

Penetration Tester Manager

@ RSM | USA-IL-Chicago-30 South Wacker Drive, Suite 3300

Offensive Security Engineer, Device Wireless Connectivity

@ Google | Amsterdam, Netherlands

IT Security Analyst I

@ Mitsubishi Heavy Industries | Houston, TX, US, 77046