GenAI Systems Administrator needed for long-term contract

We are seeking an experienced Consultant with strong GenAI system administration and GPU operations expertise to support a large-scale AI and GPU data center environment. This role is designed as a long-term position, embedding you directly into daily operations where you will act as a trusted technical advisor and operational specialist.

The engagement will begin with an onsite phase of approximately four to six months in Norway, after which the role can transition to remote working.

As a GenAI Systems Administrator Resident, you will focus on the operational stability, performance, and observability of GPU-based AI platforms. You will work under the client's direction, aligning closely with evolving operational and business needs, while ensuring systems are healthy, performant, and ready to support production workloads.

You will bring hands-on experience with GPU hardware from NVIDIA and AMD, along with a strong background operating in large data center environments. Confidence working with Dell OpenManage, Red Hat Enterprise Linux, Ubuntu, NVIDIA Bright Cluster Manager, Omnia, Grafana, and Prometheus is essential. Your expertise must be practical and demonstrable, particularly in diagnosing and resolving issues in complex GPU-based systems.

In day-to-day operations, you will monitor, review, and manage infrastructure, respond to user and operational requests, and analyse system and application logs. You will produce regular operational and GPU utilisation reports, helping teams understand system health, performance trends, and potential risks before they impact workloads.

A key part of the role involves strong operational troubleshooting. You will be comfortable diagnosing systems that are not behaving optimally, with a deep understanding of GPU failure modes and how to detect early warning signs. You will help surface the right metrics, alerts, and conditions through monitoring platforms such as Grafana and Prometheus, ensuring system health is visible and actionable.

You will support change and problem management activities, evaluate proposed changes, and provide clear recommendations. Post-implementation, you will contribute to planning and continuous improvement while ensuring knowledge is shared effectively across teams. Issue tracking and escalation are also central to the role: you will work closely with engineering teams, raise and track issues, support investigations, and represent the client's operational perspective throughout the resolution process.

Collaboration is fundamental. As the onsite go-to technical resource, you will work closely with Designated Support Engineers and Onsite Field Service Engineers to resolve hardware and operational issues quickly. For major incidents or upgrades, you will coordinate with remote experts to minimise downtime and operational impact.

Fluent English is required.

Job Details

Company: DWI Consulting Ltd
Location: United Kingdom (Hybrid / Remote Options)
Employment Type: Contract
Salary: GBP Annual