Omilia

Senior Site Reliability Engineer

🇵🇭 Remote, Philippines · Remote · Full-time · Published Jun 3, 2026
Location: Remote, Philippines
Work mode: Remote
Contract: Full-time
Language: English
Published: June 3, 2026
Last checked: June 4, 2026
JobGrid context

Job summary by JobGrid

Senior Site Reliability Engineer at Omilia: Remote, Philippines; full-time. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

  • Location and workplace: Remote, Philippines
  • Role classification: Full-time
  • Source freshness: checked by JobGrid on 2026-06-04.
  • Application path: candidates continue to the employer application page with non-personal referral tags.

We are looking for a Senior Site Reliability Engineer with cloud platform experience. This individual will be part of a team responsible for operating and maintaining production clusters and developing our observability solutions. They will collaborate with team members to develop automation strategies, build monitoring and alerting, and ensure overall platform reliability. Your goal will be to become an integral part of the team, treating every platform challenge as your own and solving it accordingly.

Responsibilities

  • Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
  • Act as first responder for incidents and contribute to problem management and root cause analysis.
  • Support the development team's reliability efforts, building a solid reliability culture within the development lifecycle.
  • Develop troubleshooting documentation for production support resources.
  • Collaborate with engineering teams to develop clear, efficient runbooks and operational documentation, and to automate operational tasks.
  • Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
  • Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
  • Participate in on-call rotations and continuously improve alert quality and response processes.
  • Champion a culture of reliability, performance, and continuous improvement across teams.