Jan. 30, 2023, 10:43 p.m. | USENIX

USENIX www.youtube.com

How to Not Destroy Your Production Kubernetes Clusters

Qian Ding, Ant Group

This talk presents real production incidents from our experience managing hundreds of Kubernetes clusters, particularly when a single cluster scales to 10K+ nodes. We demonstrate that Kubernetes in production can be fragile if it is not operated skillfully. The triggering operations can be as simple as adding a single node to the cluster or modifying a ConfigMap used by the API server. Yet, the chain reactions of such operations may end …

