Omilia

Senior Site Reliability Engineer

Location: Remote, Philippines
Workplace: Remote
Contract: Full-time
Language: English
Published: June 3, 2026
Last checked: June 4, 2026
JobGrid context

JobGrid job summary

Senior Site Reliability Engineer at Omilia: Remote, Philippines; full-time. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

  • Location and workplace: Remote, Philippines
  • Role classification: Full-time
  • Source freshness: checked by JobGrid on 2026-06-04.
  • Application path: candidates continue to the employer application page with non-personal referral tags.

We are looking for a Senior Site Reliability Engineer with cloud platform experience. You will be part of a team responsible for operating and maintaining production clusters and developing our observability solutions, collaborating with team members on automation strategies and on monitoring and alerting, and ensuring overall platform reliability. Your goal will be to become an integral part of the team, taking ownership of every platform challenge and seeing it through to resolution.

Responsibilities

  • Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
  • Act as first responder for incidents and contribute to problem management and root cause analysis.
  • Support the development team's reliability efforts, building a solid reliability culture within the development lifecycle.
  • Develop troubleshooting documentation for production support resources.
  • Collaborate with engineering teams to develop optimised, effective runbooks and operational documentation, and to automate operational tasks.
  • Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
  • Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
  • Participate in on-call rotations and continuously improve alert quality and response processes.
  • Champion a culture of reliability, performance, and continuous improvement across teams.