Senior DevOps Engineer
Indeed
Full-time
Onsite
No experience limit
No degree limit
79Q22222+22
Description

Summary: As a Senior DevOps Engineer, you will build scalable, GPU-ready Kubernetes platforms for AI and research, focusing on orchestration, performance, and automating workflows in a client-facing setup.

Highlights:

1. Build scalable, GPU-ready Kubernetes platforms for AI and research workloads
2. Administer Kubernetes end-to-end and implement Volcano job scheduling
3. Automate workflows with Python and UNIX shell scripting in a client-facing setup

We are building scalable, GPU-ready Kubernetes platforms for AI and research workloads, focusing on reliable orchestration and performance. As a Senior DevOps Engineer, you will operate Kubernetes and Linux compute environments, run Volcano scheduling, and automate workflows with Python and UNIX shell scripting in a client-facing delivery setup. Apply now to help deliver efficient compute at scale.

**Responsibilities**

* Deploy, configure, and sustain GPU-enabled Kubernetes clusters and standalone Linux compute environments to maximize scheduling efficiency and performance
* Implement and operate Volcano job scheduling, including queue setup, Pod execution, GPU allocation, and namespace quota enforcement
* Administer Kubernetes end-to-end, covering namespaces, RBAC, resource quotas, and workload isolation approaches
* Create and maintain Python and Shell automation to simplify job submission, resource provisioning, and system reporting
* Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
* Monitor platform health and resource utilization, sharing data and feedback to support optimization and reporting needs
* Recommend and drive enhancements to infrastructure, tooling, and automation workflows to improve performance, scalability, and usability
* Ensure operations provide a smooth and efficient experience for researchers across diverse AI and computational workloads

**Requirements**

* Minimum 3 years of experience in DevOps or infrastructure engineering roles within complex, large-scale environments
* Expert-level Kubernetes administration knowledge, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
* Hands-on experience with Volcano scheduler for GPU job execution, queue configuration, workload prioritization, and Kubernetes integration
* Demonstrated experience running GPU cluster environments in Kubernetes and on standalone Linux compute nodes
* Advanced Python scripting skills for infrastructure automation, plus proficiency in UNIX Shell scripting (e.g., Bash)
* Strong Linux system administration capability, including troubleshooting, performance tuning, and configuration management
* Solid understanding of infrastructure automation and orchestration concepts and supporting tooling
* Fluent English communication skills (spoken and written) for direct client interaction

**Nice to have**

* Helm for Kubernetes application packaging and releases
* Monitoring and observability tooling, especially Prometheus, Grafana, and Loki
* Infrastructure as Code tools such as Terraform
* Multi-cloud Kubernetes exposure (Amazon EKS, Google GKE)
* Azure Networking knowledge including VPN, ExpressRoute, and network security
* Familiarity with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
* Experience with hybrid (cloud + on-premises) scheduling and resource optimization

Source: Indeed
Valentina Rodríguez
Indeed · HR

© 2025 Servanan International Pte. Ltd.