Senior Site Reliability Engineer

🇵🇭 Remote, Philippines | Remote | Full-time | Posted Jun 3, 2026

Location: Remote, Philippines

Work format: Remote

Employment type: Full-time

Language: English

Posted: June 3, 2026

Last checked: June 4, 2026

JobGrid context

Role overview from JobGrid

Senior Site Reliability Engineer at Omilia: Remote, Philippines; Full-time. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

Location and workplace: Remote, Philippines
Role classification: Full-time
Source freshness: checked by JobGrid on 2026-06-04.
Application path: candidates continue to the employer application page with non-personal referral tags.

We are looking for a Senior Site Reliability Engineer with cloud platform experience. You will join a team responsible for operating and maintaining production clusters and developing our observability solutions, and you will collaborate with team members to develop automation strategies, build monitoring and alerting, and ensure overall platform reliability. Your goal will be to become an integral part of the team, treating every platform challenge as your own and solving it accordingly.

Responsibilities

Ensure platform reliability and availability across production and pre-production environments through proactive monitoring, alerting, and automation.
Act as first responder for incidents and contribute to problem management and root cause analysis.
Support the development team's reliability efforts, building a solid reliability culture within the development lifecycle.
Develop troubleshooting documentation for production support resources.
Collaborate with engineering teams to develop effective runbooks and operational documentation, and to automate operational tasks.
Collaborate with development and cloud engineering teams to embed reliability and performance into the software delivery lifecycle.
Design, implement, and evolve observability solutions (metrics, logs, traces, dashboards) using tools such as Prometheus, Grafana, and ELK.
Participate in on-call rotations and continuously improve alert quality and response processes.
Champion a culture of reliability, performance, and continuous improvement across teams.