




Summary: This role supports reliability and scalability across AWS, Azure, GCP, and Oracle Cloud through automation, CI/CD, observability, and container orchestration, helping keep production systems stable and continuously improving.

Highlights:

1. Support reliability and scalability across multiple cloud platforms.
2. Work hands-on with CI/CD, observability, containers, and infrastructure as code (IaC).
3. Collaborate with senior engineers on continuous platform improvements.

**Technical Summary**

You will support the reliability and scalability of services across AWS, Azure, GCP, and Oracle Cloud by carrying out automation, CI/CD, observability, and container orchestration tasks. You will work closely with senior engineers to keep production systems stable, well-monitored, and continuously improving.

**Responsibilities**

* Implement and maintain monitoring, alerting, and logging systems (Prometheus, Grafana, ELK, OpenTelemetry)
* Build and maintain CI/CD pipelines and automation for deployments and testing
* Support containerized workloads using Docker and Kubernetes, including managing Helm charts and deployments
* Contribute to incident response, troubleshooting, and postmortem documentation
* Implement IaC patterns (Terraform, CloudFormation, ARM templates) under guidance
* Collaborate with developers to improve service reliability and operational readiness
* Participate in continuous platform improvements led by senior and principal engineers

**Must-have Qualifications**

* 3–5 years of experience in operations, DevOps, or SRE roles
* Hands-on experience with containers and orchestration (Docker, Kubernetes)
* Familiarity with IaC tools (Terraform, Ansible, or similar)
* Experience with CI/CD tools (Jenkins, GitHub Actions, Argo CD, or similar)
* Proficiency in at least one scripting or programming language (Python, Bash, Go)
* Associate-level cloud certification (AWS, Azure, GCP, Oracle, or CompTIA Cloud+)
* Availability for weekend and holiday shifts as part of the standard scheduling rotation

**Nice-to-have Skills**

* Exposure to SLOs/SLIs and error budgets
* Familiarity with chaos testing or service meshes


