Senior DevOps Engineer
Indeed
Full-time
Onsite
No experience limit
No degree limit
79Q22222+22
Description

Summary: Seeking a Senior DevOps Engineer to build and operate scalable Kubernetes and Linux compute foundations for GPU-heavy workloads, ensuring reliability and speed through automation and comprehensive management.

Highlights:
1. Build and operate GPU-enabled Kubernetes clusters for demanding workloads
2. Automate workflows using Python and Shell scripting in a client-facing setup
3. Drive enhancements to infrastructure, tooling, and automation workflows

We are delivering scalable Kubernetes and Linux compute foundations for GPU-heavy workloads, and a Senior DevOps Engineer will help keep them reliable and fast. You will manage Kubernetes and Volcano scheduling, enforce quotas, and automate workflows using Python and UNIX Shell scripting in a client-facing delivery setup. Apply now to join the team.

**Responsibilities**
* Build, configure, and operate GPU-enabled Kubernetes clusters and standalone Linux compute environments to maximize workload scheduling and performance
* Run Volcano scheduling end-to-end, including queue creation, Pod execution, GPU assignment, and enforcing namespace quotas
* Manage Kubernetes environments comprehensively, including namespaces, RBAC, resource quotas, and workload isolation approaches
* Create and support automation scripts in Python and Shell to streamline job submission, provisioning, and reporting
* Partner with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
* Track infrastructure health and resource utilization, and provide data to support optimization and reporting needs
* Recommend and drive enhancements to infrastructure, tooling, and automation workflows to improve performance, scalability, and usability
* Maintain operational processes that enable a seamless and efficient researcher experience across AI and computational workloads

**Requirements**
* Minimum 3 years of experience in DevOps or infrastructure engineering roles within complex, large-scale environments
* Deep expertise in Kubernetes administration and orchestration, including namespaces, Pod scheduling/distribution, PVC, NFS, and resource quota management
* Practical experience using Volcano for GPU job execution, queue configuration, and workload prioritization integrated with Kubernetes
* Demonstrated experience running GPU cluster environments in Kubernetes and on standalone Linux compute nodes
* Advanced skills in Python scripting for infrastructure automation and strong UNIX Shell scripting such as Bash
* Strong Linux administration knowledge, including troubleshooting, performance tuning, and configuration management
* Good command of infrastructure automation and orchestration concepts and related tooling
* Fluent English communication skills (spoken and written) to work directly with clients

**Nice to have**
* Working knowledge of Helm for Kubernetes application packaging
* Experience with observability tooling such as Prometheus, Grafana and Loki
* Exposure to Infrastructure as Code tooling, including Terraform
* Familiarity with multi-cloud Kubernetes options such as Amazon EKS and Google GKE
* Knowledge of Azure Networking, including VPN, ExpressRoute and network security
* Comfort with AI-assisted coding tools like GitHub Copilot, ChatGPT and Claude
* Understanding of hybrid (cloud and on-premises) scheduling and resource optimization

Source: Indeed
Valentina Rodríguez
Indeed · HR

Company

Indeed

© 2025 Servanan International Pte. Ltd.