Callosum

Inference Performance & Deployment - Member of Technical Staff

🇬🇧 London, United Kingdom · On-site · IT · Published May 20, 2026
Workplace: On-site
Category: IT
IT category: DevOps / SRE
Language: English
Published: May 20, 2026
Last checked: June 3, 2026

About Us

Artificial intelligence scaled on a bet: that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems with capabilities unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026, showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can match.

We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute, GPUs and beyond, as a unified, co-evolving system.

This role owns the bridge between Callosum's internal engineering and the real world. You design the tooling and methodologies that ground our technology in real-world performance and behaviour, sitting at the integration point of every engineering function. You will be the first to run our heterogeneous infrastructure in production-equivalent conditions, systematically characterising performance, identifying bottlenecks, and driving decisions on production-readiness. Your work ensures that every layer of the stack is guided by empirical evidence rather than assumption.

What You’ll Build

  • Run experiments self-hosting models on cloud instances or on-prem across providers and hardware configurations, systematically characterising performance envelopes

  • Develop and maintain deployment patterns that are reproducible, measurable, and optimised for latency, throughput, and cost

  • Work on the orchestration and routing software that sits above the inference engine to improve caching, request scheduling, batching, and resource allocation

  • Act as the integration point for the other roles: consume new accelerator support, engine features, and infrastructure upgrades, then provide high-quality feedback on bottlenecks and essential capabilities to guide stack optimisations

  • Build and maintain benchmarking harnesses, regression suites, and performance dashboards that give the team a shared view of system health and progress

What You Bring

  • Experience deploying and benchmarking large model inference in production or production-equivalent environments

  • Familiarity with multi-node GPU deployments and associated networking/communication stacks

  • Strong end-to-end performance characterisation skills: able to isolate whether a bottleneck is in the network, the runtime, the memory subsystem, or the model itself

  • Familiarity with serving frameworks like Dynamo, Triton Inference Server, or similar orchestration layers

  • Clear communication skills: able to translate performance data into actionable, prioritised feedback for the teams building the underlying systems

  • A demonstrably disciplined and systematic approach to deployment: reproducibility, measurement methodology, and controlled comparisons

What We Offer

  • Competitive salary, determined by skills and experience

  • Equity & Ownership

  • Private healthcare

  • Visa sponsorship and relocation benefits, so we can hire the best in the world

  • We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and believe in equal opportunities for all.