




Summary: Join GoDaddy's Monitoring and Observability team as a Site Reliability Engineer to ensure the reliability, performance, and availability of infrastructure serving millions of customers.

Highlights:
1. Help ensure reliability, performance, and availability of global infrastructure
2. Design, deploy, and maintain observability and monitoring platforms
3. Automate operational processes and build self-service tooling

Location Details: This is a remote position, so you’ll be working remotely from your home. You may occasionally visit a GoDaddy office to meet with your team for events or meetings.

Join Our Team

GoDaddy is looking for a Site Reliability Engineer to join our Monitoring and Observability team. In this role, you’ll help ensure the reliability, performance, and availability of infrastructure that serves millions of customers worldwide. You’ll work at the intersection of development and operations to build and maintain observability solutions that enable proactive monitoring and rapid incident response across cloud and on-prem systems.

What you’ll get to do...

- Design, deploy, and maintain observability and monitoring platforms using Python, including metrics, logging, tracing, and visualization tools (e.g., Prometheus, Grafana).
- Automate operational processes and build self-service tooling to improve reliability and reduce manual effort.
- Respond to production incidents, participate in on-call rotations, and collaborate with teams to resolve performance, availability, and security issues.
- Support CI/CD pipelines and configuration management for monitoring and observability infrastructure.

Your experience should include...

- 3 years of professional experience designing, building, and operating large-scale infrastructure as a Site Reliability Engineer, DevOps, or similar role.
- 3 years of deep expertise in Linux/Unix systems, including performance tuning, kernel-level troubleshooting, and systems optimization.
- 3 years building observability platforms using Python (Go a plus), with experience in Prometheus and related tools.
- 3 years with configuration management (Ansible, Puppet, etc.), scripting (Python, Go, Bash, or JavaScript), and event/incident management platforms.
- 2 years of containerization and orchestration experience.
- 2 years of professional experience designing and executing incident response workflows, and developing, maintaining, and optimizing CI/CD pipelines for production-grade systems.

Requirements

Minimum education: University / technical degree
Experience: 3 years
Languages: English
Skills: Java
Keywords: ingeniero, engineers, ingeniera, ing, engineer
