The Investigo Group

Senior Site Reliability Engineer (SRE)

🇬🇧 Remote, United Kingdom · Remote · IT · Full-time · Senior · Published Jun 2, 2026


Location: Remote, United Kingdom

Work arrangement: Remote

Contract: Full-time

Seniority: Senior

Category: IT

Category: IT DevOps / SRE

Language: English

Published: 2 June 2026

Last checked: 5 June 2026

JobGrid Context

Job summary by JobGrid

Senior Site Reliability Engineer (SRE) at The Investigo Group: Remote, United Kingdom; Full-time; Senior; IT; DevOps / SRE. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

Location and workplace: Remote, United Kingdom
Role classification: IT, DevOps / SRE, Full-time, Senior
Source freshness: checked by JobGrid on 2026-06-05.
Application path: candidates continue to the employer application page with non-personal referral tags.

Role: Senior Site Reliability Engineer (SRE) – Kubernetes / OKD

Department: Cloud

Location: Remote, UK (possible occasional paid travel to TIG Secure site or customer locations as required)

Job Type: Full-time, Permanent

Salary: Competitive + benefits + package

Security Clearance Requirements

Please note that holding a current Security Clearance is not essential at the time of application, but eligibility is required.

This role requires the successful candidate to be eligible for Security Check (SC) clearance. To meet this requirement, applicants must:

Have the right to work in the UK
Have lived in the UK continuously for the past 5 years
Not have spent more than 6 months outside the UK in total during that period
Be willing to undergo security vetting as part of the onboarding process

About Us

Come and be part of The Investigo Group (TIG), a dynamic coalition of technology businesses specialising in Platform, Software, Data, AI and secure digital solutions.

Our group is made up of several specialist brands, including:

Voixtel — secure communications and voice platforms for regulated and critical environments.
IIS — secure internet access for public and private sector organisations.
Vestigo Consulting — specialist consultancy, training and sector-specific expertise.
Collaboraite — our Data and AI capability, focused on secure, user-centred data solutions.

Across TIG, we build secure, user-focused products and services for organisations operating in complex, regulated and mission-critical environments. We combine deep technical knowledge with advanced data, platform and engineering capability to solve real-world customer challenges.

We are also committed to creating an inclusive environment where people from all backgrounds are welcomed, supported and empowered to do their best work.

About You

You are an experienced SRE, Platform Engineer, Cloud Engineer or Kubernetes Engineer with strong hands-on experience operating production Kubernetes environments.

You do not just deploy into Kubernetes — you understand what it takes to run it properly in production. You are comfortable working across Linux, Kubernetes, infrastructure as code, GitOps, CI/CD, observability, identity and secure platform operations.

You treat infrastructure as a product. You care about measurable reliability, clean automation, useful observability, well-maintained runbooks and platforms that other engineers can genuinely rely on.

You are calm and methodical during incidents, able to lead blameless post-mortems, and focused on turning operational issues into long-term improvements.

You are likely to have worked in a regulated, secure, government, defence, financial services, telecoms, managed services, consultancy or cloud-native environment. Most importantly, you have operated Kubernetes at depth and understand the realities of production ownership.

This role is suited to a senior individual contributor who can mentor others, influence engineering practice and act as a technical authority without needing formal line management responsibility.

Soft Skills

Calm and structured under incident pressure, with the discipline to debug methodically rather than thrash.
Strong written communication: runbooks, post-mortems, design notes and customer-facing reports need to be clear and durable.
Collaborative working style, particularly across networking, security, and application engineering boundaries.
Comfortable holding a blameless line in post-mortems while still driving systemic fixes.
Able to influence engineering practice through evidence and example rather than positional authority.
A mentoring instinct and a willingness to lift the capability of those around you.

About the Role

We are looking for a Senior Site Reliability Engineer — Kubernetes / OKD to help own, operate, harden and mature our production Kubernetes estate.

TIG operates production OKD / Kubernetes platforms that support services for UK government and regulated customers. These platforms are foundational to our service delivery and to the products other engineering teams build on top of them. Reliability, security and operational maturity are therefore critical to how we deliver for customers.

This is a hands-on senior engineering role, not a ticket-handling position. You will work across the full stack, from bare metal and virtualisation through to Kubernetes control plane operations, ingress, observability, identity, CI/CD and the developer platform layered on top.

You will play a key role in improving the operational maturity of our platform estate, supporting a key migration, strengthening GitOps and CI/CD practices, and ensuring our platforms remain reliable, secure, measurable and supportable.

You will work closely with platform, application, AI, networking, security, QA and architecture teams to build reliable foundations that enable engineering teams to deliver safely and at pace.

What the role offers you

Direct ownership of platforms that ship into government-regulated environments.
Modern self-hosted toolchain, including OKD / Kubernetes and an on-premises AI platform running on NVIDIA DGX hardware.
A small, senior engineering team with minimal bureaucracy and CTO-led technical decisions.
Genuine remote-first working, with travel only where it adds value.

About the Team

Platform Engineering is responsible for the foundational platforms on which TIG's services run, including our OKD / Kubernetes estate, our internal developer platform, and the supporting cloud, networking, and security infrastructure. The team works across UK government and regulated commercial engagements and operates within accredited environments.

Key Responsibilities

Operate, harden and extend production OpenShift / OKD / Kubernetes clusters across on-premises and hybrid environments.
Support migrations that help modernise the underlying compute and infrastructure layer.
Own CI/CD processes across the full lifecycle of platform and application components.
Own and mature GitOps deployment practices, particularly using tools such as Argo CD.
Support cloud-native application delivery using tools such as Helm and Kustomize.
Maintain and improve core platform services including Keycloak, ingress, observability, certificate management, service mesh and container registry capabilities.
Build and operate observability across logs, metrics, traces, alerting, SLOs and error budgets.
Improve platform hardening in line with secure and regulated environment requirements.
Automate repeatable operational tasks using tools such as Ansible, Terraform, Helm, Kustomize, Go, Python or similar.
Lead incident response activity, support blameless post-mortems and drive systemic fixes.
Partner with networking and security teams on platform integration, segmentation, load balancing and accreditation evidence.
Create and maintain clear technical documentation, runbooks, design notes and operational guidance.
Mentor engineers and act as a senior technical authority across cloud and Kubernetes operations.
Participate in an on-call rota, with appropriate compensation.

Success in This Role Looks Like

A more reliable, secure and measurable production OKD / Kubernetes estate.
Clear SLOs, error budgets and reliability trend data that engineering teams actively use.
A mature GitOps approach covering platform and application components, including rollback and drift detection.
Improved CI/CD practices that help teams move at pace while bringing security, QA and compliance earlier into the lifecycle.
Hardened, well-documented and supportable core platform services.
Observability that reduces noise and supports better engineering decisions.
Stronger incident response, clearer runbooks and post-mortems that lead to real operational improvements.
Recognition as a technical authority for Kubernetes, cloud and platform operations across TIG.