




We are looking for a skilled **Senior Site Reliability Engineer** to join our team supporting EPAM's Compute Managed Services project. In this role, you will ensure operational stability and 24x7 monitoring across multi-cloud environments, drive automation and observability improvements, and collaborate with cross-functional teams to deliver reliable compute services. If you are passionate about maintaining high-quality cloud platforms and enjoy working in a dynamic environment, we encourage you to apply.

*We accept CVs in English only.*

**Responsibilities**

* Perform 24x7 monitoring of compute platforms using tools such as ELK and PagerDuty
* Manage incidents and problems across servers, middleware, operating systems, and cloud platforms, including troubleshooting, root cause analysis, and resolution
* Execute repaving activities, change management, and disaster recovery procedures
* Ensure security and vulnerability compliance, including user management and certificate lifecycle oversight
* Handle service requests and configuration updates, and prepare audit-related data extracts
* Develop and maintain Standard Operating Procedures for infrastructure operations
* Collaborate with teams to implement cell-based automation and continuous service improvements
* Drive observability enhancements and automate operational processes
* Maintain compliance with security standards and best practices
* Support on-call duties and provide operational support overlapping US, UK, and AU business hours as required

**Requirements**

* 3+ years of experience with cloud platforms including GCP, AWS, and Azure
* Proficiency in operating system administration for Windows and Linux environments
* Strong knowledge of automation tools and scripting, such as Ansible, Terraform, Python, and Bash
* Experience with observability tools like ELK Stack and Grafana
* Familiarity with incident management processes and root cause analysis
* Knowledge of security hardening, vulnerability management, and compliance requirements
* Excellent problem-solving and analytical skills
* Effective communication and collaboration skills
* Experience with disaster recovery and operational recovery processes
* Upper-Intermediate English language proficiency (B2)

**We offer**

* Learning Culture - We want you to be the best version of yourself; that is why we offer unlimited access to learning platforms, a wide range of internal courses, and all the knowledge you need to grow professionally
* Health Coverage - Health and wellness are important; that is why we cover you and up to four family members in a premier health plan. We have a couple of options, so you can choose what is best for you and your family
* Visual Benefit - Seeing your work for us would be a sight for sore eyes. We want your vision to always be at 100%, which is why we offer up to $200.000 COP for any visual health expenses
* Life Insurance Plan - We have partnered with MetLife to offer a full-coverage life insurance plan, so your family is covered even if you are gone
* Medical Leave Coverage - We are one of the few companies that cover 100% of your medical leave, for up to 90 days. Your health is the most important thing to us
* Professional Growth Opportunities - We have designed a highly competitive and complete development process, where you will have all the tools to get where you have always wanted to be, personally and professionally
* Stock Option Purchase Plan - As an EPAMer you can be more than just an employee: you will also have the opportunity to purchase stock at a reduced price and become a part owner of our organization
* Additional Income - Besides your regular salary, you will also have the chance to earn extra income by referring talent, being a technical interviewer, and many more ways
* Community Benefit - You will be part of a worldwide community of over 50,000 employees, where you can learn, challenge yourself, stand out, and share your knowledge and experience with multicultural teams!

*Please note that even though you are applying for this position, you may be offered other projects to join within EPAM.*

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.


