200M+ annual transactions. Ensure full regulatory compliance by embedding data residency, business continuity, and disaster recovery requirements into all infrastructure designs and operations. Drive operational excellence through proactive monitoring, capacity planning, incident response, and site reliability engineering (SRE). Automate and optimize infrastructure using Infrastructure-as-Code (IaC), cost governance, and performance tuning to achieve efficiency and scalability. … Kubernetes, Docker, and orchestration of cloud-native workloads. Solid experience with monitoring, logging, and observability tools (Prometheus, Grafana, Stackdriver) to ensure reliability and performance. Experience managing cloud cost governance, capacity planning, and performance tuning for efficiency and scalability. Strong grounding in ISO 27001, SAMA Cybersecurity Framework, and other regulatory requirements. Proven ability in incident management, root cause analysis
rare opportunity to join a fast-scaling technology organisation at the forefront of AI and high-performance computing infrastructure. With a bold vision to deploy 3GW of data centre capacity by 2030, the company is building the physical backbone for a new generation of compute-intensive workloads. The business is an NVIDIA Cloud Partner and is a key organisation … schedules, and incident response protocols. Systems & Asset Management: Ensure the integrity and accuracy of data within the Data Centre Infrastructure Management (DCIM) system, including asset tracking, environmental monitoring, and capacity planning. Vendor & Stakeholder Management: Manage vendor and contractor relationships, including maintenance providers, equipment suppliers, and service partners. What you'll need to succeed: 3+ years' experience in managing … with facility systems. Demonstrable ability to build, mentor, and lead operational teams in high-uptime, SLA-driven, 24x7 environments. Proficiency in DCIM tools and asset management systems for monitoring, capacity planning, operational reporting and conducting asset audits. Hands-on experience with critical infrastructure systems and routine maintenance procedures. Expert knowledge of industry standards and best practices (ISO
London, South East, England, United Kingdom Hybrid/Remote Options
Hays Specialist Recruitment Limited
rare opportunity to join a fast-scaling technology organisation at the forefront of AI and high-performance computing infrastructure. With a bold vision to deploy 3GW of data centre capacity by 2030, the company is building the physical backbone for a new generation of compute-intensive workloads. The business is an NVIDIA Cloud Partner and is a key organisation … schedules, and incident response protocols. Systems & Asset Management: Ensure the integrity and accuracy of data within the Data Centre Infrastructure Management (DCIM) system, including asset tracking, environmental monitoring, and capacity planning. Vendor & Stakeholder Management: Manage vendor and contractor relationships, including maintenance providers, equipment suppliers, and service partners. What you'll need to succeed: 3+ years' experience in managing … with facility systems. Demonstrable ability to build, mentor, and lead operational teams in high-uptime, SLA-driven, 24x7 environments. Proficiency in DCIM tools and asset management systems for monitoring, capacity planning, operational reporting and conducting asset audits. Hands-on experience with critical infrastructure systems and routine maintenance procedures. Expert knowledge of industry standards and best practices (ISO
City of London, London, England, United Kingdom Hybrid/Remote Options
Lorien
improvement. Main responsibilities: Develop and maintain a global operations performance framework aligned to strategic objectives. Define and implement KPIs and reporting tools for operational health and accountability. Lead forecasting, capacity planning, and scenario modelling to optimise resources. Oversee performance governance, SLA reviews, and supplier scorecards. Drive root-cause analysis and improvement programmes across regions. Manage quality assurance frameworks … annual leave + bank holidays + option to buy and sell 10% employee pension contribution, Life Assurance, plus much more. Requirements: Proven experience in operations performance, analytics, or workforce planning in a global insurance environment. Experience designing and operating performance management frameworks, management reporting, forecasting and capacity planning at scale. Strong analytical skills with expertise in BI
London (Paddington), South East England, United Kingdom
Boldyn Networks
responsibilities include optimising operational technology infrastructure for outstanding Quality-of-Service (QoS), ensuring service level excellence, and driving cost-efficient, scalable network design. The RAN Architect also oversees system capacity planning and network expansion strategies, ensuring Boldyn's infrastructure remains robust, future-ready, and aligned with industry standards. What You'll Be Doing: Lead the architecture and evolution … select appropriate technologies (e.g., private leased lines, Ethernet services, IPsec VPNs), and ensure seamless integration with the overall network architecture. Specify and validate transmission topologies, including redundancy, failover, and capacity planning for high-availability service delivery. Oversee the implementation of traffic segregation and security controls across transmission links, in line with JOTS/NHIB standards and Boldyn's … functional teams to resolve issues and enhance service quality. What You'll Bring: Knowledge of 3GPP technologies (primarily 4G and 5G); RAN Architecture, with an emphasis on Interfaces and Protocols; RAN Capacity Management; Feature and Parameter trials; Performance and Configuration Management; defining and implementing processes for Performance reporting; Lab testing experience for First-Site Integration (not mandatory, but highly beneficial); Troubleshooting
new features, assessing their impact on existing systems and recommending adoption strategies • Identify and lead opportunities for process improvement and technological innovation within field service operations • Contribute to strategic planning and goal setting for the department or function • Demonstrate strong analytical, logical thinking, and problem-solving skills • Manage project workstreams, ensuring successful planning, budgeting, execution, and completion Your … a deep understanding of Oracle B2C Service Cloud architecture, data model, configuration, and customization capabilities. Should have proven expertise in Oracle B2C Service Cloud solutions to optimize scheduling, routing, capacity planning, work order management, and mobile workforce management. Should have rich experience in configuring RightNow application components: knowledge of customizing RightNow using the plug-in framework … meet specific business needs, including capturing time and expenses; knowledge of all Oracle Cloud security policies, standards, and procedures. Should be an excellent planner when it comes to release planning and other delivery planning. Should have excellent analytical, problem-solving, and troubleshooting skills with a keen eye for detail. Should be ready to be responsible for coaching and
operational tasks ("toil") to increase system efficiency, reduce manual effort, and free up engineering time. Drive continuous improvement in system reliability, performance, and recoverability (Disaster Recovery/Business Continuity Planning). Collaborate closely with development teams (DevOps) to improve the entire software lifecycle, focusing on service stability and release engineering. Establish and refine Service Level Indicators (SLIs), Service Level … Objectives (SLOs), and Service Level Agreements (SLAs) for critical services. Conduct capacity planning and performance testing to ensure the AWS environment can handle current and future load requirements. Who we're looking for: At Reapit, we prioritise hiring individuals who share our values and possess the right attitudes and behaviours for success. Whilst some of the listed requirements
scalable network solutions. Partnering with internal teams and external vendors to resolve complex network issues. Leading infrastructure projects involving data centres, telephony, and cloud connectivity. Performing proactive monitoring, maintenance, capacity planning, and performance tuning. Driving automation and process improvement initiatives. What You’ll Need: Minimum 5 years’ experience supporting enterprise-level data and voice networks. Solid experience in
training needs of the IT team and, where possible, deliver internal training to upskill staff. Lead development of a roadmap for potential future growth of the IT function, including capacity planning and team structure. Budget & Asset Management: Oversee the IT budgets, ensuring cost-effective procurement and contract management. Lead vendor relationships, ensuring clear service level agreements and accountable
Birmingham, Leeds, Liverpool, London (Canary Wharf), United Kingdom Hybrid/Remote Options
UK Health Security Agency
and consumption. Develop dashboards, reports, and Key Performance Indicators (KPIs) for senior leadership on cloud financial performance. Identify optimisation opportunities and steer engineering initiatives to drive efficiencies. Support agile planning and scoping of infrastructure financial controls and enhancements. Lead communication of FinOps practices across teams, building technical capability and confidence. This list is not exhaustive. About us: We pride … cloud cost governance approach for AWS and Azure environments, ensuring accurate distribution of service costs across the organisation. Collaborate with business and technical stakeholders to support consumption forecasts, budget planning, and financial reporting. Lead on implementation of tooling and practices that allow for near real-time cloud cost monitoring, anomaly detection, and forecasting. Drive the use of automation and … and operational workflows. Work with service owners to deliver meaningful insights on consumption efficiency and architectural improvement opportunities. Support procurement, licence, and subscription decisions with cost-benefit analysis and capacity planning data. Represent Platform Engineering in transformation and governance forums, feeding into operational readiness and service improvement initiatives. Lead development of training, guidance, and communities of practice related
with SLOs and diagnostic tools. Enforce security through access controls, secrets management, vulnerability scanning, and policy-as-code. Manage environment consistency and optimise cloud costs through performance monitoring and capacity planning. Create reusable automation tools, templates, and documentation for developer self-service. Support incident response and collaborate with agile teams on release coordination. Qualifications and Requirements: Hands-on
London, South East, England, United Kingdom Hybrid/Remote Options
Computappoint
with SLOs and diagnostic tools. Enforce security through access controls, secrets management, vulnerability scanning, and policy-as-code. Manage environment consistency and optimise cloud costs through performance monitoring and capacity planning. Create reusable automation tools, templates, and documentation for developer self-service. Support incident response and collaborate with agile teams on release coordination. Qualifications and Requirements: Hands-on
and categories. Understanding incident correlation in the Microsoft 365 Defender portal. MDI - Microsoft Defender for Identity: Connecting Defender for Identity to Active Directory. Running the sizing tool for resource capacity planning. Running the auditing tool to assess the compatibility of your domain controllers with the sensor. Deploying the sensor to capture and parse network traffic and Windows events directly
practices regarding security and scalability. Understand the current application infrastructure, suggesting changes to it. Define and document best practices and strategies regarding application deployment and infrastructure maintenance. Define service capacity planning strategies. Implement the application's CI/CD pipeline using the AWS CI/CD stack. Project tasks, deliverables and management: Deliver on project progress and ensure adherence to client expectations
Work closely with application, ML, and research teams to understand their needs and translate them into reusable infra building blocks. Provide guidance on “how to run this in production”: capacity planning, failure modes, and operational readiness reviews. You Might Be a Great Fit If You Have strong experience (5+ years) building and operating production infrastructure on a major
in Palo Alto, Charlotte, Belfast, Berlin, and Lisbon. What You Will Do: Manage, monitor, and optimize ClickHouse clusters in production environments, including schema design, query tuning, replication setup, and capacity planning. Operate and maintain Kafka, OpenSearch, and other distributed systems, ensuring high performance, scalability, and reliability. Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, following best
repeatable deployments. Automating with PowerShell, Python, or Bash to drive efficiency. Supporting Kubernetes and AKS environments in production. Leading incident response, postmortems, and continuous improvement processes. Driving cost optimisation, capacity planning, and load testing. Championing best practices in cloud security and resilience. Key Skills & Experience Required: Proven Site Reliability Engineering background. Strong Terraform skills with live environment deployment.
the performance of F5 services and application delivery solutions. · Provide technical support and troubleshooting for F5-related issues. · Implement security policies and best practices for F5 appliances. · Participate in capacity planning and scalability assessments. · Collaborate with network and system engineers to ensure seamless integration of F5 services. Qualifications: · Must possess active SC clearance. · Proven experience with F5 Technologies
initiatives for large-scale network systems. Collaborate on design and implementation of next-generation networking solutions, including hybrid and multi-cloud integrations. Ensure network reliability and scalability through proactive planning and implementation. 4. Network Monitoring and Management: Administer 6K+ network devices across 550+ circuits, maintaining high availability and performance. Implement automation and monitoring tools to reduce manual intervention … and optimize network efficiency. Perform capacity planning, monitoring, and regular audits of network infrastructure. 5. Incident Management and Troubleshooting: Proactively monitor network infrastructure and resolve incidents to ensure business continuity. Lead troubleshooting efforts for critical incidents across LAN/WAN, wireless, and hybrid environments. Collaborate with cross-functional teams for issue escalation, root-cause analysis, and resolution. What
and insights to Marketing and Leadership. Translate operational data into actions that improve efficiency, velocity and utilisation. Monitor channel ROI and resource allocation by region, segment and campaign. Own capacity planning and resourcing visibility for the marketing function. Commercial Interface and Cross-Functional Alignment: Act as the operational liaison between Marketing, Sales, and RevOps. Ensure consistent handover of
workforce information to managers and staff, including provision of self-service access to managers and staff on ESR. This will involve the provision of a high-quality workforce information and planning function, which will involve working with multidisciplinary stakeholders and also the triangulation of a range of data sources. The post holder will have overall responsibility for managing the Workforce … Information function and the production of timely, high-quality workforce and planning information to support strategic HR decision making and effective performance management across the Trust. The post holder will be the Registration Authority Manager (RAM) and Privacy Officer for the Trust, leading on all initiatives in regard to the Registration Authority (RA) and User Identity Management (UIM) - Hybrid … payroll and system audits. Registration Authority: Manage the Trust's Registration Authority (RA), responsible for ensuring adherence to policy and governance for the efficient day-to-day operation and capacity planning of the RA services. This will also include operational duties associated with RA. Ensure compliance of RA processes and Trust policy in line with any change in
West London, London, United Kingdom Hybrid/Remote Options
McGregor Boyall Associates Limited
TA workstreams. This role is focused on embedding consistent processes, strengthening reporting, and improving recruitment enablement across divisions. You will bring together multiple TA projects, ensuring standardised approaches to planning, tracking and delivery. With hands-on experience of Talent Acquisition systems, you will act as the bridge between technical teams and end users, solidifying SuccessFactors reporting and ensuring high … workstreams, track KPIs and produce insightful reporting. Partner with Senior Recruitment Business Partners to support pan-organisation collaboration. Drive continuous improvement, embedding tools, frameworks and education into BAU. Coordinate capacity planning, resource forecasting and cross-portfolio alignment. Identify risks, dependencies and opportunities for efficiency. Ensure clear, outcome-based deliverables and well-communicated plans. Essential Skills & Experience: Strong PMO
authentication, and role-based permissions in Splunk to ensure security and compliance. Integrate Splunk with third-party systems, APIs, and security tools for enriched event correlation. Conduct license management, capacity planning, and health monitoring of Splunk infrastructure. Collaborate with cross-functional teams to define KPIs and deliver actionable insights from log data.