200M+ annual transactions. Ensure full regulatory compliance by embedding data residency, business continuity, and disaster recovery requirements into all infrastructure designs and operations. Drive operational excellence through proactive monitoring, capacity planning, incident response, and service reliability engineering (SRE). Automate and optimize infrastructure using Infrastructure-as-Code (IaC), cost governance, and performance tuning to achieve efficiency and scalability. … Kubernetes, Docker, and orchestration of cloud-native workloads. Solid experience with monitoring, logging, and observability tools (Prometheus, Grafana, Stackdriver) to ensure reliability and performance. Experience managing cloud cost governance, capacity planning, and performance tuning for efficiency and scalability. Strong grounding in ISO 27001, SAMA Cybersecurity Framework, and other regulatory requirements. Proven ability in incident management, root cause analysis …
rare opportunity to join a fast-scaling technology organisation at the forefront of AI and high-performance computing infrastructure. With a bold vision to deploy 3GW of data centre capacity by 2030, the company is building the physical backbone for a new generation of compute-intensive workloads. The business is an NVIDIA Cloud Partner and is a key organisation … schedules, and incident response protocols. Systems & Asset Management: Ensure the integrity and accuracy of data within the Data Centre Infrastructure Management (DCIM) system, including asset tracking, environmental monitoring, and capacity planning. Vendor & Stakeholder Management: Manage vendor and contractor relationships, including maintenance providers, equipment suppliers, and service partners. What you'll need to succeed: 3+ years' experience in managing … with facility systems. Demonstrable ability to build, mentor, and lead operational teams in high-uptime, SLA-driven, 24x7 environments. Proficiency in DCIM tools and asset management systems for monitoring, capacity planning, operational reporting and conducting asset audits. Hands-on experience with critical infrastructure systems and routine maintenance procedures. Expert knowledge of industry standards and best practices (ISO …
London, South East, England, United Kingdom Hybrid/Remote Options
Hays Specialist Recruitment Limited
rare opportunity to join a fast-scaling technology organisation at the forefront of AI and high-performance computing infrastructure. With a bold vision to deploy 3GW of data centre capacity by 2030, the company is building the physical backbone for a new generation of compute-intensive workloads. The business is an NVIDIA Cloud Partner and is a key organisation … schedules, and incident response protocols. Systems & Asset Management: Ensure the integrity and accuracy of data within the Data Centre Infrastructure Management (DCIM) system, including asset tracking, environmental monitoring, and capacity planning. Vendor & Stakeholder Management: Manage vendor and contractor relationships, including maintenance providers, equipment suppliers, and service partners. What you'll need to succeed: 3+ years' experience in managing … with facility systems. Demonstrable ability to build, mentor, and lead operational teams in high-uptime, SLA-driven, 24x7 environments. Proficiency in DCIM tools and asset management systems for monitoring, capacity planning, operational reporting and conducting asset audits. Hands-on experience with critical infrastructure systems and routine maintenance procedures. Expert knowledge of industry standards and best practices (ISO …
City of London, London, England, United Kingdom Hybrid/Remote Options
Lorien
improvement. Main responsibilities: Develop and maintain a global operations performance framework aligned to strategic objectives. Define and implement KPIs and reporting tools for operational health and accountability. Lead forecasting, capacity planning, and scenario modelling to optimise resources. Oversee performance governance, SLA reviews, and supplier scorecards. Drive root-cause analysis and improvement programmes across regions. Manage quality assurance frameworks … annual leave + bank holidays + option to buy and sell. 10% employee pension contribution. Life Assurance, plus much more. Requirements: Proven experience in operations performance, analytics, or workforce planning in a global insurance environment. Experience designing and operating performance management frameworks, management reporting, forecasting and capacity planning at scale. Strong analytical skills with expertise in BI …
London (Paddington), South East England, United Kingdom
Boldyn Networks
responsibilities include optimising operational technology infrastructure for outstanding Quality-of-Service (QoS), ensuring service level excellence, and driving cost-efficient, scalable network design. The RAN Architect also oversees system capacity planning and network expansion strategies, ensuring Boldyn's infrastructure remains robust, future-ready, and aligned with industry standards. What You'll Be Doing: Lead the architecture and evolution … select appropriate technologies (e.g., private leased lines, Ethernet services, IPsec VPNs), and ensure seamless integration with the overall network architecture. Specify and validate transmission topologies, including redundancy, failover, and capacity planning for high-availability service delivery. Oversee the implementation of traffic segregation and security controls across transmission links, in line with JOTS/NHIB standards and Boldyn's … functional teams to resolve issues and enhance service quality. What You'll Bring: Knowledge of 3GPP technologies (primarily 4G and 5G); RAN Architecture, emphasis on Interfaces and Protocols; RAN Capacity Management, Feature and Parameter trial; Performance and Configuration Management; Defining and implementing processes for Performance reporting; Lab testing experience for First-Site Integration (not mandatory, but highly beneficial); Troubleshooting …
new features, assessing their impact on existing systems and recommending adoption strategies • Identify and lead opportunities for process improvement and technological innovation within field service operations • Contribute to strategic planning and goal setting for the department or function • Demonstrate strong analytical, logical thinking, and problem-solving skills • Manage project workstreams, ensuring successful planning, budgeting, execution, and completion Your … a deep understanding of Oracle B2C Service Cloud architecture, data model, configuration, and customization capabilities. Should have proven expertise in Oracle B2C Service Cloud solutions to optimize scheduling, routing, capacity planning, work order management, and mobile workforce management. Should have rich experience in configuring RIGHT NOW application components: knowledge of customizing RIGHT NOW using the plug-in framework … meet specific business needs, including capturing time and expenses; knowledge of all Oracle Cloud security policies, standards, and procedures. Should be an excellent planner when it comes to release planning and other delivery planning. Should have excellent analytical, problem-solving, and troubleshooting skills with a keen eye for detail. Should be ready to be responsible for Coaching and …
operational tasks ("toil") to increase system efficiency, reduce manual effort, and free up engineering time. Drive continuous improvement in system reliability, performance, and recoverability (Disaster Recovery/Business Continuity Planning). Collaborate closely with development teams (DevOps) to improve the entire software lifecycle, focusing on service stability and release engineering. Establish and refine Service Level Indicators (SLIs), Service Level … Objectives (SLOs), and Service Level Agreements (SLAs) for critical services. Conduct capacity planning and performance testing to ensure the AWS environment can handle current and future load requirements. Who we're looking for: At Reapit, we prioritise hiring individuals who share our values and possess the right attitudes and behaviours for success. Whilst some of the listed requirements …
scalable network solutions. Partnering with internal teams and external vendors to resolve complex network issues. Leading infrastructure projects involving data centres, telephony, and cloud connectivity. Performing proactive monitoring, maintenance, capacity planning, and performance tuning. Driving automation and process improvement initiatives. What You’ll Need: Minimum 5 years’ experience supporting enterprise-level data and voice networks. Solid experience in …
training needs of the IT team and where possible deliver internal training to upskill staff. Lead development of a roadmap for potential future growth of the IT function, including capacity planning and team structure. Budget & Asset Management: Oversee the IT budgets, ensuring cost-effective procurement and contract management. Lead vendor relationships, ensuring clear service level agreements and accountable …
Birmingham, Leeds, Liverpool, London (Canary Wharf), United Kingdom Hybrid/Remote Options
UK Health Security Agency
and consumption. Develop dashboards, reports, and Key Performance Indicators (KPIs) for senior leadership on cloud financial performance. Identify optimisation opportunities and steer engineering initiatives to drive efficiencies. Support agile planning and scoping of infrastructure financial controls and enhancements. Lead communication of FinOps practices across teams, building technical capability and confidence. This list is not exhaustive. About us: We pride … cloud cost governance approach for AWS and Azure environments, ensuring accurate distribution of service costs across the organisation. Collaborate with business and technical stakeholders to support consumption forecasts, budget planning, and financial reporting. Lead on implementation of tooling and practices that allow for near real-time cloud cost monitoring, anomaly detection, and forecasting. Drive the use of automation and … and operational workflows. Work with service owners to deliver meaningful insights on consumption efficiency and architectural improvement opportunities. Support procurement, licence, and subscription decisions with cost-benefit analysis and capacity planning data. Represent Platform Engineering in transformation and governance forums, feeding into operational readiness and service improvement initiatives. Lead development of training, guidance, and communities of practice related …
with SLOs and diagnostic tools. Enforce security through access controls, secrets management, vulnerability scanning, and policy-as-code. Manage environment consistency and optimise cloud costs through performance monitoring and capacity planning. Create reusable automation tools, templates, and documentation for developer self-service. Support incident response and collaborate with agile teams on release coordination. Qualifications and Requirements: Hands-on …
London, South East, England, United Kingdom Hybrid/Remote Options
Computappoint
with SLOs and diagnostic tools. Enforce security through access controls, secrets management, vulnerability scanning, and policy-as-code. Manage environment consistency and optimise cloud costs through performance monitoring and capacity planning. Create reusable automation tools, templates, and documentation for developer self-service. Support incident response and collaborate with agile teams on release coordination. Qualifications and Requirements: Hands-on …
practices regarding security and scalability. Understand the current application infrastructure, suggesting changes to it. Define and document best practices and strategies regarding application deployment and infrastructure maintenance. Define service capacity planning strategies. Implement the application's CI/CD pipeline using the AWS CI/CD stack. Project tasks, deliverables and management: Deliver on project progress and ensure adherence to client expectation …
Work closely with application, ML, and research teams to understand their needs and translate them into reusable infra building blocks. Provide guidance on “how to run this in production”: capacity planning, failure modes, and operational readiness reviews. You Might Be a Great Fit If You: Have strong experience (5+ years) building and operating production infrastructure on a major …
in Palo Alto, Charlotte, Belfast, Berlin, and Lisbon. What You Will Do: Manage, monitor, and optimize ClickHouse clusters in production environments, including schema design, query tuning, replication setup, and capacity planning. Operate and maintain Kafka, OpenSearch, and other distributed systems, ensuring high performance, scalability, and reliability. Deploy, configure, and manage containerized applications and stateful workloads on Kubernetes, following best …
repeatable deployments. Automating with PowerShell, Python, or Bash to drive efficiency. Supporting Kubernetes and AKS environments in production. Leading incident response, postmortems, and continuous improvement processes. Driving cost optimisation, capacity planning, and load testing. Championing best practices in cloud security and resilience. Key Skills & Experience Required: Proven Site Reliability Engineering background. Strong Terraform skills with live environment deployment.
the performance of F5 services and application delivery solutions. · Provide technical support and troubleshooting for F5-related issues. · Implement security policies and best practices for F5 appliances. · Participate in capacity planning and scalability assessments. · Collaborate with network and system engineers to ensure seamless integration of F5 services. Qualifications: · Must possess active SC clearance. · Proven experience with F5 Technologies …
initiatives for large-scale network systems. Collaborate on design and implementation of next-generation networking solutions, including hybrid and multi-cloud integrations. Ensure network reliability and scalability through proactive planning and implementation. 4. Network Monitoring and Management: Administer 6K+ network devices across 550+ circuits, maintaining high availability and performance. Implement automation and monitoring tools to reduce manual intervention … and optimize network efficiency. Perform capacity planning, monitoring, and regular audits of network infrastructure. 5. Incident Management and Troubleshooting: Proactively monitor network infrastructure and resolve incidents to ensure business continuity. Lead troubleshooting efforts for critical incidents across LAN/WAN, wireless, and hybrid environments. Collaborate with cross-functional teams for issue escalation, root-cause analysis, and resolution. What …
and insights to Marketing and Leadership. Translate operational data into actions that improve efficiency, velocity and utilisation. Monitor channel ROI and resource allocation by region, segment and campaign. Own capacity planning and resourcing visibility for the marketing function. Commercial Interface and Cross-Functional Alignment: Act as the operational liaison between Marketing, Sales, and RevOps. Ensure consistent handover of …
authentication, and role-based permissions in Splunk to ensure security and compliance. Integrate Splunk with third-party systems, APIs, and security tools for enriched event correlation. Conduct license management, capacity planning, and health monitoring of Splunk infrastructure. Collaborate with cross-functional teams to define KPIs and deliver actionable insights from log data.
City of London, London, United Kingdom Hybrid/Remote Options
ARC IT Recruitment Ltd
and robust RCAs/post-mortems. Safer, faster releases (blue/green, canary, feature flags) in partnership with Trading, Quant, and Engineering. Mature observability (logs/metrics/traces), capacity planning, and performance tuning for low-latency flows. Strong production hygiene and controls aligned to MiFID II/MAR/best-ex. Leadership of the London on-call …
London, South East, England, United Kingdom Hybrid/Remote Options
Robert Half
for complex Amazon Connect implementations. Develop sophisticated contact flow solutions for advanced customer requirements including 3rd party integrations and SSO. Design and implement advanced Amazon integration. Create optimised forecasting, capacity planning and scheduling solutions. Lead conversational design strategy and implementation. Experience with Amazon Connect Customer Profiles, Cases and Step by Step Guides. Guide customers through migration strategies from …