Senior DevOps Engineer
Indeed
Full-time
Onsite
No experience limit
No degree limit
79Q22222+22
Description

Summary: Seeking a Senior DevOps Engineer to standardize automation and scheduling performance by administering Kubernetes with Volcano, managing quotas, and automating operations for advanced AI and research work.

Highlights:

1. Strengthen GPU-capable orchestration on Kubernetes and Linux
2. Administer Kubernetes with Volcano for advanced AI and research
3. Drive continuous improvements to infrastructure and automation

We are strengthening GPU-capable orchestration on Kubernetes and Linux, and need a Senior DevOps Engineer to standardize automation and scheduling performance. You will administer Kubernetes with Volcano, manage quotas and isolation, and automate operations using Python and Bash to support advanced AI and research work.

Send your application to get started.

**Responsibilities**

* Provision, configure, and support GPU-enabled Kubernetes clusters and standalone Linux compute environments to keep scheduling and performance at peak
* Operate Volcano job scheduling, handling queue setup, Pod execution, GPU allocation, and namespace quota enforcement
* Own Kubernetes administration end-to-end, including namespaces, RBAC, resource quotas, and workload isolation strategies
* Automate job submission, resource provisioning, and reporting with Python and shell scripts that are maintained over time
* Coordinate with orchestration, optimization, and observability teams to enhance scheduling efficiency, capacity utilization, and researcher workflows
* Monitor infrastructure health and resource consumption, and share data for optimization and reporting requirements
* Drive continuous improvements to infrastructure, tooling, and automation workflows to boost performance, scalability, and usability
* Support operational processes that ensure researchers have an efficient experience across diverse AI and computational workloads

**Requirements**

* 3+ years of DevOps or infrastructure engineering experience in large, complex environments
* Expert proficiency in administering Kubernetes, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
* Hands-on background with the Volcano scheduler for GPU jobs, including queue setup and workload prioritization with Kubernetes integration
* Track record of managing GPU cluster environments both in Kubernetes and on standalone Linux compute nodes
* Advanced capability with Python for infrastructure automation and solid UNIX shell scripting such as Bash
* Strong Linux system administration skills with troubleshooting, performance tuning, and configuration management experience
* Solid understanding of infrastructure automation and orchestration concepts and the tools used to implement them
* Fluent English communication skills (spoken and written) to support direct client collaboration

**Nice to have**

* Helm knowledge for packaging and managing Kubernetes applications
* Experience with monitoring and observability stacks, especially Prometheus, Grafana and Loki
* Familiarity with Infrastructure as Code, including Terraform
* Exposure to multi-cloud Kubernetes environments such as Amazon EKS and Google GKE
* Understanding of Azure Networking, including VPN, ExpressRoute and network security
* Experience using AI-assisted coding tools like GitHub Copilot, ChatGPT and Claude
* Knowledge of hybrid (cloud and on-premises) scheduling and resource optimization approaches

Source: Indeed
Valentina Rodríguez
Indeed · HR

Company

Indeed
© 2025 Servanan International Pte. Ltd.