dnb

Senior Site Reliability (R-19383)

🇮🇪 Dublin, Ireland On-site IT Senior Published Jun 10, 2026
Location Dublin, Ireland
Work mode On-site
Experience level Senior
Category IT
Category IT DevOps / SRE
Language English
Published June 10, 2026
Last checked June 11, 2026
JobGrid context

Role summary from JobGrid

Senior Site Reliability (R-19383) at dnb: Dublin, Ireland; On-site; Senior; IT; DevOps / SRE. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

  • Location and workplace: Dublin, Ireland, On-site
  • Role classification: IT, DevOps / SRE, Senior
  • Source freshness: checked by JobGrid on 2026-06-11.
  • Application path: candidates continue to the employer application page with non-personal referral tags.

The Senior Site Reliability Engineer (SRE) is responsible for the reliability, availability, performance, and operability of production systems across our platforms. The role applies software engineering practices to operations, with a focus on automation, observability, and incident response.

Responsibilities:

  • Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).
  • Participate in incident management, including detection, triage, mitigation, escalation, and recovery.
  • Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.
  • Design, implement, and operate observability solutions including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).
  • Reduce operational toil through automation and engineering-led solutions, proactively introducing and driving SRE best practices.
  • Support on-call rotations across multiple time zones, contributing to a sustainable 24/7 support model.
  • Define, monitor, and report SLIs, SLOs, and error budgets for critical services.
  • Drive and be accountable for best-in-class service availability through SRE principles, automation, and proactive reliability engineering.

Essential skills and/or Certifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Strong experience with cloud-native concepts and technologies, with a strong preference for Google Cloud Platform (GCP) and Kubernetes (GKE).
  • Proven experience with Site Reliability Engineering and production incident management, ideally using platforms such as ServiceNow.
  • Experience with monitoring and observability tools, including metrics, logs, traces, and synthetics (e.g., Splunk Observability, OpenTelemetry).
  • Exposure to reliability testing, resilience engineering, or cost optimisation initiatives.
  • Excellent analytical and problem-solving skills, with the ability to diagnose complex production issues quickly.
  • Software development or automation experience using Python, shell scripts, or similar languages.
  • Hands-on experience operating production cloud infrastructure at scale.
  • Experience managing multi-region, high-availability production systems with a focus on scalability, resilience, and minimising service disruption during failures.
  • Proficiency in the Microsoft Office suite.
  • Show an ownership mindset in everything you do: be a problem solver, stay curious, take the initiative, and seek ways to collaborate and connect with people and teams to drive success.
  • Maintain a continuous growth mindset: keep learning through social experiences and relationships with stakeholders, experts, colleagues, and mentors, and broaden your competencies through structured courses and programs.
  • Where applicable, fluency in English and languages relevant to the working market.