Where this role is available
- New York, United States
- New York City, United States
Role summary by JobGrid
Forward Deployed Data Engineer at Mecka AI. Locations: New York, United States; New York City, United States. Workplace: on-site. Category: IT, Data Engineer. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.
- Location and workplace: New York, United States; New York City, United States; on-site
- Role classification: IT, Data Engineer
- Source freshness: checked by JobGrid on 2026-06-10.
- Application path: candidates continue to the employer application page with non-personal referral tags.
About Mecka AI
Mecka AI is building the data infrastructure layer for robotics and embodied AI.
We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems - where model performance is dictated by data quality.
The Role
We are hiring a Forward Deployed Data Engineer to operate on the frontier with customers: take messy, real-world capture data - much of it raw video - and turn it into beautiful, reliable, model-ready datasets, while owning the technical relationship end-to-end.
This is a senior, high-trust role with significant autonomy. You'll combine data engineering, hands-on analysis, and product judgment to deliver datasets customers can train and ship on - and to make our delivery systems more reliable every time you do.
What You'll Work On
Customer Delivery & Technical Ownership
Own the end-to-end delivery of customer datasets: requirements, validation, iteration, and final handoff.
Be the technical point of contact: communicate clearly, set expectations, and close loops.
Turn one-off customer needs into durable internal improvements - tooling, pipelines, and standards that make every future delivery faster and safer.
Data Systems & Pipelines
Build, debug, and harden data pipelines across ingestion, transformation, QA, and export.
Work fluently across storage and database paradigms (SQL + NoSQL + object storage) and pick the right tool for the job.
Establish reliable dataset "contracts": schemas, versioning, provenance, and reproducible builds - so every dataset has a clear source of truth.
Dataset Quality & Signal
Define and measure what makes a dataset good for a given task: coverage, diversity, balance, label fidelity, and fitness for the customer's model.
Build quality scorecards and coverage/diversity reports that make dataset health legible to customers and internal teams.
Query and slice large corpora to maximize customer fit - surface exactly the data that matches a target distribution, not just bulk volume.
When the signal a customer needs is missing or weak in the raw video, diagnose it and partner with the perception/ML pipeline teams to extract or improve it upstream.
Who You Are
Required Background
5+ years in data engineering and/or backend engineering (or equivalent impact).
Strong experience with large data systems, pipelines, and analytical workflows.
Strong SQL proficiency and comfort across multiple database/storage paradigms.
Excellent engineering judgment and debugging ability in production systems.
Genuine data taste - you can look at a dataset and reason about whether it's complete, balanced, and trustworthy, not just whether the job ran.
Strong Signals
You've owned high-stakes customer deliveries with autonomy and trust.
You can translate ambiguous requirements into crisp dataset specs and execution plans.
You have strong product instincts and care about polish: "would I trust this dataset?"
You're comfortable working with unstructured, real-world data - especially video.
Nice to Have
Working literacy in video understanding, embeddings, and encoders - enough to reason about what a dataset teaches a model and where signal is missing.
Experience building data-quality, coverage, or diversity tooling.
Background adjacent to ML, computer vision, or robotics data.
Why This Role
Own the customer-facing delivery loop for world-class robotics datasets.
High autonomy, high trust, and direct impact on customer success and revenue.
Work across the full stack of the problem: data, pipelines, analysis, and delivery quality.
Sit at the exact point where raw, messy, real-world data becomes the thing that makes embodied-AI models work.