Material

Inference Performance Engineer

🇺🇸 New York, USA · On-site · IT · Senior · Published May 13, 2026

Location: New York, USA

Work format: On-site

Experience level: Senior

Category: IT

IT category: Other IT

Language: English

Published: May 13, 2026

Last checked: June 10, 2026

About the role

Serving frontier models at scale requires solving novel systems problems at every layer of the stack. As an Inference Performance Engineer, you'll own the runtime that turns accelerators into a production serving system, optimizing throughput, latency, and cost across thousands of nodes. You'll work alongside hardware and compiler teams operating at the frontier of AI silicon design.

What you'll do

Build and improve the inference runtime
Design scheduling, continuous batching, KV cache, and prefill/decode disaggregation
Implement low-precision kernels and speculative decoding
Drive improvements in throughput, latency, and cost per token
Collaborate with hardware teams on kernels, operators, and graph optimizations
Own the OpenAI-compatible API surface and serving protocol
Build benchmarking, profiling, and regression infrastructure

What you'll need

BS in CS, EE, or related field, or equivalent experience
Software engineering experience: Rust, Go, Python, or C++
Understanding of concurrency, memory, and tail latency
Understanding of modern inference: transformers, attention, KV cache, batching, speculative decoding, quantization
Experience with model serving frameworks: vLLM, TGI, SGLang, TensorRT-LLM, llama.cpp, or custom runtimes
GPU or ASIC programming experience: CUDA, ROCm, Triton, or vendor-native toolchains
Experience with low-precision inference (FP8, FP4, INT4)
Profiling and benchmarking experience: Nsight, perf, custom harnesses

What we offer

Top-tier compensation structured to recognize and retain the best talent
Meaningful equity
Comprehensive medical, dental, vision, life, and disability insurance
Parental leave for all new parents, including adoptive and surrogate journeys
Flexible PTO
Paid holidays
Relocation support

Equal Employment Opportunity

We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.