Monitor, troubleshoot, and optimize systems, networks, and application performance across hybrid environments. Collaborate with security, development, and operations teams to enforce DevSecOps best practices. Participate in incident response and root cause analysis, and implement long-term fixes. Maintain and document configurations, processes, and network topologies. Required Qualifications Extensive hands-on experience with F5 load balancers. Strong
new and updated system changes Developing, executing, and improving documentation for installation, configuration, hardening, and operations and maintenance tasks Ensuring compliance with IT infrastructure standards, policies, and procedures Conducting root cause analysis and resolving system and application faults and errors Ensuring operating systems and applications comply with Department of Defense (DoD) guidelines, including DISA Security Technical Implementation
operations and maintenance tasks Document activities, status, and issues worked on Provide input to and follow Configuration Management processes Ensure adherence to IT infrastructure standards, policies, and procedures Perform root cause analysis and resolve system and application faults and errors Maintain working knowledge of Microsoft Active Directory, Group Policy Objects (GPOs), DHCP, DNS, and PowerShell General understanding
operational performance, and security compliance. Facilitate effective communication between IT teams and business units. Problem Solving and Incident Management: Manage and resolve high-priority incidents and critical issues. Conduct root cause analysis and implement corrective actions to prevent recurrence. Develop and maintain incident response plans and procedures. Requirements: Proven experience as a Digital Operations Manager, IT Manager
Sunderland, Tyne and Wear, England, United Kingdom
Nigel Wright Group
of public cloud infrastructure. Monitoring performance and implementing optimisations to enhance user experience. Ensuring system availability and reliability through proactive monitoring, backups, and disaster recovery planning. Incident management and root cause analysis with preventive measures. Implementation of security best practices and compliance monitoring. Design and execution of disaster recovery and business continuity plans. Automation and orchestration using
and test network engineering/administration activities. Create and maintain Standard Operating Procedures (SOPs) and technical documentation. Provide follow-up reports (technical findings, feedback, and resolution steps taken) for Root Cause Analysis and process improvement initiatives. Required Qualifications Top Secret Clearance Minimum of a Bachelor's degree in Science, with 12-15 years' experience or Master's
activities with the NMCI Operations Manager, NOC Lead, Release Management team and other key stakeholders. •Tier III escalation support and vendor engagement supporting Incident Management activities. •Active participation in Root Cause Analysis for Problem Management activities. You'll Bring These Qualifications: •Requires B.S. Degree and 8-12 years of prior relevant experience or Masters with
Shefford, Bedfordshire, South East, United Kingdom
Intercity Technology Limited
Monday - Sunday - 4 on 4 off - 7pm - 7am. Key Responsibilities as a Cloud Operations Engineer: Maintain and troubleshoot Azure and hybrid cloud environments. Perform proactive monitoring, incident response, and root cause analysis of mission-critical systems. Configure, optimise, and secure servers, virtual machines, networking, and storage solutions. Create and maintain scripts (e.g., PowerShell) to automate operational tasks.
Stockport, Greater Manchester, North West, United Kingdom
Nexperia
Systems Support team (CIM), Operational Technology Engineers, Data Engineers, and Web Developer Monitoring and reporting on system performance, availability, and incident response metrics Providing leadership in incident management and root cause analysis for system-related issues, while also ensuring effective change control procedures for all changes introduced to the factory (ITIL) Managing and leading a team of
assigned trouble tickets, including incident and deployment tickets, providing timely updates in the ticketing system that are informative of the work performed. Also provide correction of discrepancies identified and root cause analysis to prevent future discrepancies. Distribute peripheral IT equipment supplies (e.g., network printer toner) Support the maintenance of local area network (LAN) and wide area
Doing Providing advanced (Tier 4) support for complex technical issues escalated by Tier 2 and 3 teams Troubleshooting production and test system issues via logs, traces, telemetry, and SQL analysis Scripting solutions and automating tasks using PowerShell and similar tools Simulating customer issues in local/test environments for detailed root cause analysis Collaborating with the … times through tooling and knowledge-sharing Supporting the UK customer base with occasional flexibility to liaise with US counterparts What We're Looking For Strong SQL skills for data analysis and report creation Cloud support experience in a technical role Working knowledge of PowerShell scripting (report automation, system tasks) Understanding of C# (not necessarily coding, but enough to troubleshoot
Doing Providing advanced (Tier 4) support for complex technical issues escalated by Tier 2 and 3 teams Troubleshooting production and test system issues via logs, traces, telemetry, and SQL analysis Scripting solutions and automating tasks using PowerShell and similar tools Simulating customer issues in local/test environments for detailed root cause analysis Collaborating with the … base with occasional flexibility to liaise with US counterparts What We're Looking For 5+ years of cloud support experience in a technical role Strong SQL skills for data analysis and report creation Working knowledge of PowerShell scripting (report automation, system tasks) Understanding of C# (not necessarily coding, but enough to troubleshoot and advise) Hands-on experience with Microsoft
Helm charts, and pod definition. • Kubernetes Administration: Manage and configure Kubernetes clusters for high availability, scalability, and security. • Debugging and Defect Correction: Troubleshoot and resolve software defects with effective root cause analysis and debugging techniques. • GPU Configuration and Support: Configure and optimize GPU resources using CUDA or other technologies for compute-intensive workloads. • Automated Testing and Deployment … Strong analytical and problem-solving mindset • Excellent verbal and written communication skills • Adaptability and a drive for continuous learning and improvement Bonus If You Have: • Understanding of RF signal analysis or satellite communications systems Engineer smarter, Build bolder. Apply now and learn more about our extensive benefits and customizable compensation packages
problem-solving and troubleshooting abilities. Attention to detail and commitment to data accuracy. Experience with cloud-based data migration. Strong analytical and problem-solving skills, with ability to conduct root cause analysis on system, process or production problems and ability to provide viable solutions. Experience working in an Agile environment with Scrum Master/Product owner and
CDN, DDoS protection, DNS, Zero Trust, and Anycast networks. Implement Infrastructure as Code (Terraform) and maintain CI/CD pipelines, automation, and monitoring workflows. Support BAU operations, incident response, root cause analysis, and on-call support. Collaborate with cross-functional teams and clients to translate business requirements into technical solutions. Mentor junior engineers and drive continuous improvement
IPSEC tunnels, and certificate-based authentication Contribute to AD design and secure environment management Mentor junior staff and act as a key escalation point Participate in incident response and root cause analysis Required Skills & Experience: 5+ years in a Network Engineer or Infrastructure Engineer role Strong knowledge of TCP/IP, VLAN, VXLAN, EVPN, VPC, MLAG Deep
Birmingham, West Midlands, Marston Green, West Midlands (County), United Kingdom
Applause IT Recruitment Ltd
IPSEC tunnels, and certificate-based authentication Contribute to AD design and secure environment management Mentor junior staff and act as a key escalation point Participate in incident response and root cause analysis Required Skills & Experience: 5+ years in a Network Engineer or Infrastructure Engineer role Strong knowledge of TCP/IP, VLAN, VXLAN, EVPN, VPC, MLAG Deep
Cheltenham, Gloucestershire, England, United Kingdom
Sanderson
documentation Support compliance with internal processes and security policies What You'll Bring Strong organisational and time management skills Familiarity with ITIL best practices A proactive mindset focused on root cause analysis and continuous improvement Collaborative attitude - eager to share knowledge and learn from others Risk-aware approach to identifying and resolving potential issues MUST BE ELIGIBLE
to £70,000 (dependent on experience) Working Arrangement: Hybrid (~2 days on-site per week) Office Location: Central London Responsibilities: Problem Management: Support and facilitate Problem Management activities, driving root cause analysis and producing insightful reports to enhance service performance. Service Transition/Service Introduction: Collaborate with Transformation and Technical Teams to ensure robust service transition practices
Azure, and endpoint infrastructure challenges Collaborate with engineering teams on automation, scripting, and tooling improvements Oversee vendor relationships and guide technology strategy to align with business needs Drive root cause analysis, proactive monitoring, and resilience planning to minimise disruption What You’ll Bring Proven experience in leading or mentoring technical support teams in high-pressure environments Expert
reports using tools such as SQL, DataBricks, Excel, Power BI, or Tableau. Collaborate with IT, finance, and loan operations teams to resolve data quality issues and streamline processes. Perform root cause analysis on data discrepancies and recommend corrective actions. Assist in audits and regulatory reporting by providing supporting data and documentation. Participate in system conversions, upgrades, and
Strong understanding of distributed systems, fault tolerant design, and high availability architectures. Knowledge of CI/CD pipelines and infrastructure as code tools (Terraform, Ansible, CloudFormation). Experience in root cause analysis and implementing systemic improvements. Preferred: Significant experience with UX/UI writing or design Knowledge of regulatory standards and compliance (e.g., PCI DSS, HIPAA).
Reliability Engineering: Define SLIs/SLOs and manage error budgets; use data-driven insights to balance reliability and feature velocity. Lead on-call rotations, incident response, and conduct blameless root cause analysis to drive continuous improvement. Performance & Capacity: Forecast and right-size resource usage across clusters and middleware Profile and tune application performance (CPU, memory, GC, threading
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Southern Communications Ltd
Play a critical role in the governance of the MVC and ASP.NET framework estate, aligning with compliance, security, and change control processes. Take ownership of incidents and issues, ensuring root cause analysis and robust long-term solutions. Maintain a backlog of enhancement requests, prioritising changes in line with stakeholder needs and product direction. Help establish and grow
on continuous innovation. Service Assurance: Lead the Operations Control Center (OCC), running modern service management processes for monitoring, incident, problem, and change management. Ensure proactive detection, rapid response, and root-cause analysis to keep mission critical systems always on. Global Support Services: Direct global customer and employee support, desk-side services across bureaus, and on-site technical