Litmus

Lead DevOps Engineer

Location: Toronto, CA
Work arrangement: On-site
Category: IT
IT category: DevOps / SRE
Posted: April 9, 2026
Last verified: May 7, 2026
Who is Litmus

Litmus is building the data foundation that powers industrial AI.

AI doesn’t work without real-world, contextualized data, and Litmus makes that data usable. As AI adoption accelerates, most industrial environments still can’t access or use their operational data. We close that gap.

We’re a growth-stage software company helping manufacturers access, structure, and use real-time data from machines, systems, and sensors at the edge. Our platform sits at the intersection of edge computing, AI, and industrial operations, enabling some of the world’s largest companies to run operations in real time, reduce downtime, and optimize production.

Backed by leading investors and trusted by global manufacturers and partners like Google, Microsoft, Dell, Oracle, and Mitsubishi, Litmus is powering the shift toward software-defined manufacturing.

Why join Litmus

Build the infrastructure that makes industrial AI possible

AI is moving beyond the cloud and into the physical world. At Litmus, you’ll build the infrastructure that enables real-time data to power AI and machine learning systems in production environments.

Work on problems where software meets the real world

Most AI systems fail without access to real-world data. You’ll build the layer that makes them viable in production. We solve challenges at the intersection of distributed systems, real-time data, and industrial constraints — where reliability, scale, and performance are non-negotiable.

Have real impact, fast

You’ll work on systems used by real customers in production, with direct impact on product and company trajectory. As a scaling company, we move quickly. You’ll have ownership, visibility, and the ability to shape both the product and the company as we grow.

Join a high-performance team

We’re building a team that holds a high bar and pushes each other to improve. You’ll work alongside experienced operators, engineers, and leaders who have done this before and are building again at scale. We hire people who take ownership, move quickly, and care about outcomes. No passengers.

Our culture

At Litmus, the team is collaborative, curious, and low ego. People are scrappy, take ownership, and look for ways to make an impact. We value empathy just as much as execution, whether that’s in how we build, how we communicate, or how we support each other.

We’re a growing company, so things move quickly and not everything is perfectly defined. If you enjoy figuring things out, working closely with others, and making steady progress, you’ll do well here.

About the Role

Litmus is building the industrial IoT platform of record, and our DevOps function is the engine that lets engineering move fast with confidence. This is a senior technical leadership role — reporting directly to the Head of Technology — for someone who is ready to own the DevOps function end-to-end across the entire company, and to lead its transformation into an AI-enabled engineering discipline.

You will inherit a capable, distributed team and a meaningful technical foundation: self-hosted GitLab for CI/CD, multi-cloud infrastructure across AWS and GCP, Kubernetes (EKS) workloads, and an on-premises VMware estate. Your mandate is to level up this foundation, drive down delivery friction for the broader engineering organization, and make strong technical decisions without needing direction for day-to-day operations.

If you thrive at the intersection of platform engineering, cloud infrastructure, and security automation — and you want to be the person who sets the standard — this role is for you.

What You’ll Own

Technical Leadership & Team

  • Lead and mentor a distributed DevOps team spanning North America and India, including an infrastructure security-focused sub-team.

  • Serve as the primary technical decision-maker for the DevOps function — architecture, tooling choices, prioritization, and delivery standards.

  • Partner with Engineering, QA, and Product leadership to reduce delivery friction and improve DORA metrics (lead time, deployment frequency, MTTR, change failure rate).

  • Represent the DevOps function at the leadership level, including communicating roadmap, risks, and platform health to the Head of Technology and broader Technology leadership.

CI/CD Platform (GitLab)

  • Own the self-hosted GitLab platform — upgrades, runner fleet management (VMware-hosted and cloud), and platform health.

  • Drive maturity of the CI/CD Catalog and shared template library (ci-common/gitlab-templates), ensuring teams can self-serve without bespoke pipeline configuration.

  • Evolve pipeline capabilities: container image scanning, IaC static analysis, SAST, SBOM/CVE generation, and MR-triggered security scans.

  • Establish and enforce merge request standards, branch protection policies, and CODEOWNERS governance across the GitLab organization.

Kubernetes & Cloud Infrastructure

  • Own EKS day-2 operations: cluster upgrades, node group management, networking (private API endpoints, Cloudflare tunnel integration), and reliability posture.

  • Manage multi-cloud infrastructure across AWS (primary) and GCP, including resource lifecycle, cloud cost optimization, and account governance.

  • Lead the rationalization of legacy infrastructure (on-prem Nexus, Concourse CI) and drive the migration to cloud-native equivalents where appropriate.

  • Maintain and improve the Terraform IaC estate, including drift detection, module governance, and GitLab CI-driven plan/apply workflows.

Security & Identity

  • Drive the rollout and stabilization of SSO federation across vCenter/VMware, AWS IAM Identity Center, and Azure AD groups.

  • Own the security tooling stack: Qualys vulnerability scanning, Defender alert triage, container scanning pipelines, and SBOM/CVE reporting for product releases.

  • Establish and enforce secrets management standards using 1Password across pipelines and infrastructure automation.

  • Ensure data security in transit and at rest as automation and self-service capabilities expand.

Observability & Platform Engineering

  • Build and own the internal developer platform vision — reducing cognitive load on engineers, QA, and program managers through self-service tooling and automation.

  • Lead the observability stack: Grafana (Helm-deployed on EKS), alerting pipelines, and infrastructure/application performance monitoring.

  • Drive a metrics-first culture for the DevOps function, using DORA metrics and custom platform health indicators to guide roadmap decisions.

  • Evaluate and recommend tooling investments that improve developer experience, pipeline performance, and release confidence.

AI-Enabled DevOps Transformation

  • Own and drive the AI transformation of the DevOps function — identifying where AI tooling can meaningfully reduce toil, accelerate delivery, and improve reliability across the engineering organization.

  • Integrate AI-assisted tooling into CI/CD pipelines: AI-augmented code review, AI-generated pipeline diagnostics, intelligent test selection, and anomaly detection in build and deployment workflows.

  • Embed AI capabilities into the observability and incident response stack — using LLM-assisted root cause analysis, alert summarization, and runbook generation to reduce mean time to resolution.

  • Champion AI coding tool adoption across the engineering team — evaluating, piloting, and governing tools (LLM-powered IDEs, AI pair programming, code generation) to maximize productivity while maintaining security and IP standards.

  • Apply AI-driven approaches to cloud cost optimization — using intelligent anomaly detection and spend forecasting to inform FinOps decisions across AWS and GCP.

  • Build a point of view on AI governance for the DevOps function — defining appropriate data handling boundaries, prompt security practices, and acceptable use policies as LLM tooling becomes embedded in engineering workflows.

What You’ll Bring

Required Experience & Skills

  • 5+ years of progressive DevOps/platform engineering experience, with at least 2 years in a technical lead or staff-level role.

  • Deep, hands-on experience with GitLab CI/CD:

    • Self-hosted GitLab administration (upgrades, runners, platform governance)

    • Building and maintaining shared CI/CD templates and catalogs

    • Pipeline security integrations (SAST, container scanning, IaC analysis)

  • Production Kubernetes experience (preferably EKS):

    • Cluster upgrades, node management, networking, and RBAC

    • Day-2 operations and reliability engineering

    • GitLab-driven deployment workflows

  • Multi-cloud infrastructure proficiency across AWS and at least one of GCP/Azure:

    • AWS IAM, Organizations, SSO/IAM Identity Center

    • VPC networking, EKS, ECR, and cloud cost optimization

  • Infrastructure as Code with Terraform:

    • Module design, remote state, drift detection

    • CI/CD-driven plan/apply pipelines

  • Identity and access management:

    • Azure AD / Microsoft Entra ID — SSO federation and group-based access

    • Experience federating VMware vCenter, AWS, or similar platforms with AD/LDAP

  • Security tooling experience: vulnerability scanning (Qualys or equivalent), secrets management (1Password, Vault, or equivalent), SBOM/CVE pipeline integration.

  • Fluency in at least one scripting language (Bash, Python, or similar) for automation and tooling.

  • Strong written and verbal communication — able to write clear design documents, drive technical alignment, and represent the team in cross-functional and leadership conversations.

  • Demonstrated experience using AI tooling in an engineering context — whether in pipelines, developer tooling, observability, or infrastructure automation — and a clear point of view on where it creates genuine leverage vs. hype.

Nice-to-Have Experience

  • Familiarity with Yocto/BitBake build systems and embedded Linux release pipelines.

  • Experience with Concourse CI or other pipeline orchestration systems in a migration context.

  • Cloudflare Zero Trust / WARP / Tunnel architecture.

  • Experience with DataHub, Grafana Loki, or similar observability/data catalog tooling.

  • Exposure to industrial IoT platforms, edge computing, or embedded Linux product delivery.

  • Experience managing GitLab at scale across 50+ repositories and multiple engineering teams.

  • Hands-on experience building AI-augmented DevOps workflows: LLM-powered runbook generation, AI-assisted incident triage, or natural language interfaces to infrastructure tooling.

  • Familiarity with MCP (Model Context Protocol) server integration or agentic AI tooling applied to developer workflows.

Compensation

CA$145,000 – CA$185,000 base salary, commensurate with experience.

Total package includes benefits, equity participation, and professional development allowance.

Litmus is committed to building an inclusive team. We encourage applications from candidates of all backgrounds and will provide accommodation throughout the recruitment process upon request.
