playonsports

Senior Site Reliability Engineer

Location: Remote
Workplace: Remote
Language: English
Posted: May 11, 2026
Last verified: May 13, 2026

Key details
Location: Remote (1 location)
Current openings: 13 active jobs
Original language: English
Source and freshness: Collected from public career pages and reviewed through JobGrid.eu source availability checks. Last verified: May 13, 2026.
Apply path: JobGrid.eu sends candidates to the original application page and adds non-personal referral parameters.

Playon is looking for an experienced Senior Site Reliability Engineer to help us strengthen the reliability, performance, and scalability of our systems. This role sits at the intersection of software engineering and operations, focused on building the tools, automation, and visibility that enable our teams to deliver resilient software at scale.

You’ll work closely with application engineers, DevOps, and QA teams to evolve our infrastructure, CI/CD pipelines, observability frameworks, and reliability practices. This is a hands-on engineering role with a strong emphasis on automation, performance analysis, and continuous improvement.

The Outcomes You’ll Deliver:

In the first few months, you'll focus on building a clear understanding of our systems and establishing the foundation for stronger observability across our platforms. As you settle in, your scope will grow to include broader reliability and performance initiatives.

• Assess and improve visibility: Work with engineering teams to review our current dashboards, metrics, and logs, identify the biggest gaps, and make targeted improvements that help us better understand system health.
• Tighten monitoring and alerting: Refine alerts and dashboards for the most critical services so we can catch issues earlier and respond faster.
• Build observability into delivery: Add instrumentation and telemetry into existing build and deploy processes to make reliability checks part of our normal release workflow.
• Clarify what "reliable" means: Help define initial SLIs and SLOs for a few core user flows, aligning the team on what good performance and availability look like.
• Streamline incident response: Partner with the Event Commander/on-call rotation to improve how we communicate, coordinate, and follow up during incidents.
• Reduce manual effort: Automate routine checks and monitoring tasks to free up engineers for more impactful work.

Over time, you'll take on a larger role shaping how we measure, monitor, and improve reliability across all services: setting standards, mentoring others, and helping engineering teams make data-driven decisions about performance and stability.
