Observability Engineer - Prometheus, Grafana - CO

Indeed

Full-time

Onsite

No experience limit

No degree limit

Cra 10 #29, Bogotá, Colombia


Description

Job Summary: Join us as an Observability Engineer to optimize automated monitoring of production cloud infrastructures, ensuring stability, availability, and performance of large-scale data centers.

Key Highlights:

1. Implement and optimize cloud-based monitoring solutions.
2. Design and implement performance indicator dashboards.
3. Ensure the proper operation of production clouds.

### **Summary**

Join our Site Reliability Engineering team as an **Observability Engineer**. We implement and optimize tools that enable efficient, automated monitoring, providing the insights needed to resolve issues and keep our cloud-based products running continuously and reliably in production.

You will be responsible for the stability, availability, and performance of production cloud infrastructures, designing and implementing monitoring and performance indicator visualization solutions for platforms. This work ensures the uninterrupted operation of large-scale data centers that support our critical, always-on applications and infrastructure.

**This role is available for remote work from the following locations: Mexico, Chile, Argentina, Colombia, Uruguay, and Peru.**

**Responsibilities**
---------------------

* Design, implement, and optimize monitoring solutions for cloud infrastructures.
* Define, analyze, and implement dashboards to visualize critical performance indicators.
* Ensure the proper operation of production clouds based on open-source technologies (e.g., Kubernetes and OpenStack).
* Respond to critical platform incidents, escalating to Senior Engineers or the Product Development team as needed.

**Technical Requirements**
-----------------------

* Education:
  + Degree in Computer Engineering, Systems Engineering, Computer Science, or a related field.
* Experience:
  + Minimum 3 years of relevant experience managing, optimizing, and monitoring cloud infrastructures (especially with Kubernetes and/or OpenStack) and handling incidents in production environments.
  + Experience designing and implementing cloud infrastructure monitoring solutions, as well as performance management and coordination of critical incidents with the development team.
* Specific Knowledge / Technical Requirements:
  + Intermediate Linux
    - Basic commands, file manipulation, networking, etc.
    - Experience with Shell scripting (Bash).
    - Automation (scripting) with Bash and/or Python.
  + Git: Basic level
    - Familiar with the standard workflow of add, commit, push.
    - Familiarity with more advanced commands such as rebase or cherry-pick is not required.
    - Ability to resolve merge conflicts is not required.
  + Intermediate use and creation of container images with Docker.
    - Ability to create images using a Dockerfile.
    - Understanding of the Docker container lifecycle.
  + Use and configuration of monitoring tools (Prometheus, Grafana, Elasticsearch, Kibana).
  + Use and configuration of deployment tools such as GitLab, ArgoCD, etc.
  + Experience monitoring external components such as routers, switches, Kubernetes clusters, and VMs.
  + Use and administration of Kubernetes clusters.
* Language: Intermediate English (Writing/Reading)
* Desirable
  + Experience with public cloud (AWS, GCP, Azure) or private cloud (OpenStack)
  + Experience with agile methodologies (Scrum, Kanban, etc.)
  + Ability to adapt existing open-source solutions
  + Certifications in Linux, OpenStack, and/or Kubernetes
  + Integration of open-source projects
  + Basic networking knowledge
* Required Soft Skills
  + Autonomy, discipline, and self-learning ability
  + Conceptual analytical thinking
  + Customer orientation
  + Teamwork capability

#### **About Us**

At **Whitestack**, we are leaders in Latin America in developing Telco Cloud, Open Networking, and hyper-scalable digital infrastructure solutions.

We work with open-source technologies such as OpenStack, Kubernetes, Open Source MANO, Ceph, Prometheus, ONOS, and many others, and we actively collaborate with global organizations including ETSI, the Open Infrastructure Foundation, the Telecom Infra Project, and the Open Compute Project. We drive digital transformation across the region through world-class standards, large-scale operator implementations, and a strong commitment to innovation.

Additionally, we are a **Great Place to Work**, where collaboration and personal development are integral parts of our culture.

**Why Join Whitestack?**

* International exposure: Participate in global initiatives and travel to collaborate with teams across different countries.
* Real work-life balance: We design policies aligned with your lifestyle, empowering you to work autonomously and purposefully.
* Clear career growth: We offer a robust career path in both leadership and technology.
* Health first: Private health insurance for you and your family.
* Unlimited learning: Access to courses, books, learning materials, and certification reimbursement.
* Languages for the world: Language courses so your growth knows no borders.
* Technology in your hands: We renew your equipment every 3 years, and it's yours at the end of the term.
* Recognition for effort: Performance and project success bonuses.
* Time for you: Minimum 15 vacation days, a birthday day off, and additional breaks before Independence Day, Christmas, and New Year.
* Connection and fun: Budget for recreational and team-building activities.
* Innovation culture: Your ideas matter. We encourage strategic participation from any role.

Source: Indeed

Valentina Rodríguez

Indeed · HR
