Senior DevOps Engineer
Indeed
Full-time
On-site
No experience requirement
No degree requirement
79Q22222+22
Description

Summary: This Senior DevOps Engineer will own Kubernetes administration, implement Volcano queues, and automate operations to optimize shared compute environments for AI and research teams.

Highlights:
1. Own Kubernetes administration for AI and research teams
2. Implement Volcano queues and policies for GPU workload scheduling
3. Automate operations with Python and UNIX shell scripting

We operate Kubernetes and Linux GPU infrastructure for AI and research teams, emphasizing automation, scheduling accuracy, and reliability at scale. In this Senior DevOps Engineer position, you will own Kubernetes administration, implement Volcano queues and policies, and automate day-to-day operations with Python and UNIX shell scripting. Apply now to help optimize shared compute environments.

**Responsibilities**
* Implement and maintain GPU-enabled Kubernetes clusters and standalone Linux compute environments to support reliable workload scheduling and performance
* Configure and run Volcano job scheduling, including queue setup, pod execution, GPU allocation, and namespace quota enforcement
* Manage Kubernetes environments end-to-end, including namespaces, RBAC, resource quotas, and workload isolation strategies
* Automate job submission, resource provisioning, and system reporting by developing Python and shell scripts
* Collaborate with orchestration, optimization, and observability teams to improve scheduling efficiency, capacity utilization, and researcher workflows
* Monitor infrastructure health and resource utilization, supplying data to meet optimization and reporting requirements
* Improve infrastructure, tooling, and automation workflows to boost performance, scalability, and usability
* Support operational processes that ensure a seamless experience for researchers running diverse AI and computational workloads

**Requirements**
* Minimum 3 years of experience in DevOps or infrastructure engineering within complex, large-scale environments
* Expert proficiency in Kubernetes administration and orchestration, including namespaces, pod scheduling/distribution, PVC, NFS, and resource quota management
* Hands-on Volcano experience for GPU job execution, queue configuration, workload prioritization, and integration with Kubernetes
* Proven experience managing GPU cluster environments in Kubernetes and on standalone Linux compute nodes
* Advanced Python scripting skills for infrastructure automation, along with proficiency in UNIX shell scripting (e.g., Bash)
* Strong Linux system administration skills, including troubleshooting, performance tuning, and configuration management
* Solid understanding of infrastructure automation and orchestration concepts and tooling
* Fluent English communication skills (spoken and written) for direct client interaction

**Nice to have**
* Helm for Kubernetes package management
* Prometheus, Grafana, and Loki for monitoring and observability
* Terraform for Infrastructure as Code
* Multi-cloud Kubernetes background with Amazon EKS and Google GKE
* Azure Networking knowledge including VPN, ExpressRoute, and network security
* Experience with AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Claude)
* Hybrid (cloud + on-premises) scheduling and resource optimization exposure

Source: Indeed
Valentina Rodríguez
Indeed · HR

Company

Indeed
© 2025 Servanan International Pte. Ltd.