Job Summary
We are seeking a Senior Infrastructure Engineer to serve as the long-term technical owner of virtualization, server, storage, backup, and network infrastructure supporting critical SCADA and engineering systems.
This is a hands-on senior individual contributor role responsible for operational reliability, high availability, cybersecurity alignment, and disaster recovery readiness across VMware and Hyper-V environments. The ideal candidate has experience supporting production OT/industrial environments and understands the accountability required for systems that directly impact operational uptime.
This role participates in an on-call rotation, supports patch windows during scheduled plant outages, and serves as a primary escalation point for infrastructure-related incidents.
Key Responsibilities
Infrastructure & Virtualization Ownership
- Own lifecycle management of VMware ESXi/vCenter and Microsoft Hyper-V environments.
- Design, maintain, and optimize HA clusters (VMware HA/DRS and Hyper-V Failover Clustering).
- Plan and perform hypervisor patching and version upgrades.
- Optimize virtual machine performance and resource allocation.
- Maintain standardized templates and configuration governance.
Backup & Disaster Recovery (Veeam)
- Administer and optimize Veeam Backup & Replication.
- Define and maintain RPO/RTO targets for critical systems.
- Perform periodic restore validation and disaster recovery testing.
- Maintain documented recovery procedures and DR runbooks.
- Manage multi-site backup strategies, including full and incremental backups.
Networking & Security
- Deploy and troubleshoot enterprise networks, including firewalls, routing, and switching/VLAN architecture.
- Maintain routing configurations and firewall rule governance.
- Maintain secure IT/OT segmentation design.
- Support patch compliance and vulnerability remediation initiatives.
- Participate in firewall reviews and security hardening efforts.
- Familiarity with industrial protocols (Modbus, DNP3, OPC UA) is preferred.
Operational Excellence & Incident Response
- Serve as a senior escalation point during infrastructure incidents.
- Lead root cause analysis (RCA) following outages or service degradation.
- Monitor infrastructure performance, storage health, and system capacity.
- Proactively identify risks and implement mitigation strategies.
- Develop and maintain infrastructure standards, SOPs, and documentation.
Availability & Support Expectations
- Participate in the on-call rotation.
- Support scheduled maintenance during plant outage windows.
- Support occasional off-hours deployments as required.
- Ensure minimal operational disruption during maintenance activities.