Head of Support & Service Reliability Engineering

🇨🇦 Surrey, Canada On-site Customer Support & Success Posted May 21, 2026


Location Surrey, Canada

Workplace On-site

Category Customer Support & Success

Language English

Posted May 21, 2026

Last verified May 30, 2026


We are seeking a Head of Support & Service Reliability to lead and evolve our global support function into a proactive, platform-integrated reliability capability.

This role offers an exciting and dynamic opportunity for an outcome-focused individual. Sycurio is at a critical inflection point as we transition from a single-tenant architecture to a multi-tenant SaaS platform, which requires a fundamental shift from reactive ticket handling to systemic reliability, observability, and customer experience management at scale.

You will own the end-to-end operational integrity of the platform, ensuring availability, performance, and customer trust, while partnering closely with Engineering, Product, and Customer-facing teams. You will also be a key contributor to our GRR goal of 90%+.

Sycurio employs a strategic managed service provider that supplies the people, tooling, and day-to-day execution across all support tiers. The Head of Support sets the standards, governs vendor performance, and ensures every aspect of the support experience, from incident response to customer satisfaction, meets enterprise-grade expectations.

Key Responsibilities:

Service Reliability & Platform Stability
Own platform availability, performance, and reliability across all tenants
Reduce incident frequency, severity, and blast radius
Establish and drive Service Reliability Engineering (SRE) principles
Ensure scalability and operational readiness of a multi-tenant platform
Incident Management & Response
Implement and lead a structured incident management framework (P1–P4)
Act as executive owner of major incidents (P1/P2)
Drive improvements in:
Mean Time to Detect (MTTD)
Mean Time to Resolve (MTTR)
Ensure clear, consistent internal and external communication during incidents
Observability & Monitoring
Define and implement a comprehensive observability strategy, including:
Technical telemetry (infrastructure, application, APIs)
Business telemetry (transactions, payment success rates, usage)
End-to-end customer journey visibility
Ensure issues are detected proactively rather than reported by customers
Partner with Product and Engineering to embed telemetry into the platform
Support Operations (L1–L3)
Lead global support teams ensuring high-quality, SLA-driven case management
Define and enforce support processes, tooling, and performance standards
Improve key metrics:
First response time
Resolution time
Reopen rate
Escalation quality
Platform Operations & Change Management
Oversee operational aspects of the platform, including:
Release management and deployment safety, ensuring all releases are observable, reversible, and low-risk
Change control processes
Environment consistency across staging and production
Own the visibility and continuous improvement of delivery and recovery performance using the DORA metrics, in partnership with Engineering
Issue Management & Root Cause Discipline
Establish rigorous Root Cause Analysis (RCA) standards
Identify and eliminate systemic issues (not just symptom fixes)
Track and reduce recurring incidents
Feed insights into Product and Engineering roadmaps
Customer Experience & Commercial Alignment
Align support with Customer Success and Sales
Ensure coordinated communication during incidents
Protect customer relationships during critical events
Introduce tenant-aware impact assessment (ARR, strategic accounts, regulatory exposure)
Support enterprise-grade expectations for transparency and reliability
Cross-functional Leadership
Act as the bridge between:
Engineering
Product
Customer Delivery / Success
Embed supportability and operational readiness into:
Pre-sales (Stage 4/5 governance)
Product development
Deployment processes
Managed Service Governance
Chair regular operational reviews and quarterly business reviews with the managed service leadership team
Own the managed service scorecard — defining KPIs, reviewing performance data, and driving accountability for misses
Manage contract compliance, SLA adherence, and commercial exposure from managed service underperformance
Lead continuous improvement programs jointly with the managed service provider, including tooling upgrades, process redesigns, and training investments
Maintain an escalation path for systemic or persistent managed service failure, up to and including remediation planning

Key qualifications, skills, and experience:

10+ years in Support, Platform Operations, or SRE leadership roles
Proven experience in multi-tenant SaaS and legacy environments
Strong understanding of:
Distributed systems
Incident management at scale
Observability frameworks
Track record of building and scaling high-performing operational teams
Experience in outsourced or hybrid operational models
Experience working cross-functionally with Engineering and Product
Background in payments, security, or compliance-driven environments (e.g., PCI)
Experience with API-first platforms and telephony/payment flows
Familiarity with observability tools (e.g., Grafana)