Callosum

Inference Engine Development - Member of Technical Staff

🇬🇧 London, United Kingdom · On-site · IT (Back End Engineer)
Language: English
Published: May 20, 2026
Last checked: June 3, 2026

About Us

Artificial intelligence scaled on a bet - that bigger models, more identical chips, and more data would keep delivering. As problems grow more complex and the requirements of intelligence more diverse, that bet is breaking down. The next era belongs to heterogeneous intelligence: diverse models on diverse chips, each with distinct strengths, co-evolving into systems of capability unreachable by any single model or accelerator.

Callosum is the Intelligent Systems company. We built the infrastructure to make that possible. Our co-evolution engine optimises simultaneously across workflows, agents, and silicon. We launched in early 2026 showing orders-of-magnitude improvements in performance and a shift in the cost-performance frontier that no single chip or model provider can match.

We believe intelligence comes from the system, not the model.

We are scientists and engineers solving what others consider impossible. If you thrive on hard problems and are energised by the scale of the challenge, we'd love to hear from you.

About the Role

Callosum believes that orders-of-magnitude improvements in AI systems will come through application-aware orchestration across heterogeneous hardware. We are building that vision: infrastructure that treats the full landscape of compute, not just GPUs, as a unified, co-evolving system.

Inference engines were designed for single-model inference on homogeneous GPU clusters - this role extends them beyond that. Working directly on systems like vLLM and SGLang, you will adapt and extend them for heterogeneous resources, making them hardware-aware, with deeper optimisation around scheduling, memory, and execution. The execution strategies you design - parallelism, disaggregation, caching - will define what heterogeneous inference looks like at production scale. Your work ensures that the capabilities exposed by the lower layers of the stack translate into real, measurable gains and set the new standard for how inference runs on diverse hardware.

What You'll Build

  • Contribute upstream to SGLang and vLLM, and maintain internal forks where our requirements diverge

  • Improve hardware-awareness within inference engines so that scheduling, memory management, and execution adapt to the capabilities of the underlying accelerator

  • Design and implement bespoke parallelism and disaggregation strategies that go beyond default configurations to better exploit heterogeneous hardware

  • Work closely with an Accelerator Systems Software engineer to ensure engine-level abstractions map cleanly onto diverse hardware capabilities

What You Bring

  • Deep familiarity with the internals of SGLang, vLLM, or comparable inference serving frameworks - scheduler design, memory management, and execution pipelines

  • Strong background in high-performance Python and C++/CUDA systems, particularly in the context of ML inference

  • Experience designing or implementing parallelism strategies for large model serving

  • Understanding of disaggregated serving architectures and the tradeoffs involved in separating modules of a workflow

  • Demonstrable record of working effectively in fast-moving open source codebases with evolving APIs and design conventions

What We Offer

  • Competitive salary, determined by skills and experience

  • Equity & Ownership

  • Private healthcare

  • Visa sponsorship and relocation benefits, so that we can hire the best in the world

  • We work in person at our London office. You'll have the tools, space and setup to do your best work, and if you have specific needs, just tell us

We're committed to building an inclusive workplace where everyone feels welcome, and we believe in equal opportunities for all.