Gramian Consulting Group

AI Evaluation Engineer - Software Engineering Domain

🇳🇬 Remote, Nigeria Remote Contract Published May 27, 2026
Location: Remote, Nigeria
Work mode: Remote
Contract type: Contract
Language: English
Published: May 27, 2026
Last checked: May 29, 2026
Context: JobGrid

Job summary by JobGrid

AI Evaluation Engineer - Software Engineering Domain at Gramian Consulting Group: Remote, Nigeria; Contract. This listing is part of JobGrid's "Remote AI jobs from career pages" and "Remote software developer jobs from career pages" listings. JobGrid adds normalized role facts, source context, and a path to the employer application page so candidates can compare the listing before applying.

  • Location and workplace: Remote, Nigeria
  • Role classification: Contract
  • Source freshness: checked by JobGrid on 2026-05-29.
  • Application path: candidates continue to the employer application page with non-personal referral tags.

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

Role Overview

We are looking for highly analytical engineers and technical domain experts to contribute to advanced AI evaluation and benchmarking projects focused on realistic terminal-based and infrastructure-heavy workflows. In this role, you will design technically challenging tasks that evaluate how AI systems reason through debugging, operational failures, complex workflows, and multi-step problem-solving scenarios.

The ideal candidate has strong experience working with production systems, debugging, automation, or large-scale engineering workflows, and can design realistic technical challenges that simulate real-world engineering environments.

This role is particularly well suited for professionals with backgrounds in backend engineering, infrastructure, DevOps, data systems, MLOps, cybersecurity, or platform engineering.

CONTRACT: Contractor assignment (5 weeks)

COMMITMENT: Full-time (40h/week) or Part-time (20h/week) with minimum 4h PST overlap

LOCATION: Remote — Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Pakistan, Indonesia, Kenya, Nigeria, Turkey, Vietnam

PROCESS: One technical assessment/interview (~45 min)

Responsibilities:

  • Design realistic terminal-based benchmark tasks for AI evaluation systems
  • Create technically deep debugging and investigation scenarios
  • Develop task specifications involving infrastructure, workflows, pipelines, or operational failures
  • Write clear solution approaches and deterministic evaluation criteria
  • Identify realistic edge cases, failure modes, and system constraints
  • Design multi-step reasoning challenges across complex technical environments
  • Contribute expertise across one or more engineering or operational domains
  • Review and refine benchmark quality, difficulty, and validation logic
  • Collaborate with reviewers and researchers on AI evaluation workflows