application support strategies Key Responsibilities: Own Application Support Lifecycle: Ensure end-to-end support for critical business applications, meeting SLAs and availability targets. Incident & Problem Management: Lead resolution and root cause analysis for all Retail application incidents, including major (P1/P2) issues. Escalation & Crisis Leadership: Act as the escalation point for major incidents and provide direction … containerization experience with Azure, Docker, and AKS. Familiarity with modern web technologies, including React, REST APIs, and SOAP architectures. Skilled in managing P1/P2 incidents, business impact analysis, root cause investigations, and change coordination. Strong grasp of IT service management practices; ITIL v4 certification or equivalent preferred. Proactive Monitoring: Hands-on experience with tools like …
to the overall success of the FX desk's technology platform. * Respond rapidly to production incidents using data-driven decision making to minimise downtime and financial impact while leading root cause analysis and conducting blameless post-mortems. * Enhance application health monitoring by implementing robust observability solutions and automating manual processes to improve system resilience. * Drive cost optimisation …
best practices, cloud strategies, and platform engineering. Team Leadership: Guide and coach a team of engineers, technical specialists, and architects, encouraging the adoption of innovative technologies and practices. Technical Analysis: Lead technical analysis and estimation efforts for custom-built applications. Best Practices: Drive the adoption of release management and automation best practices. Incident Management: Ensure thorough root cause analysis and prompt remediation during any incidents or outages. Vendor Coordination: Work with external vendors to supplement team capacity and expertise when necessary. YOU'RE GOOD AT You bring solid development and program leadership experience to drive technical governance, innovation, integrations, and cloud strategies using emerging technologies like Gen AI. You thrive in environments that demand …
base articles. Monitor application health using tools and custom dashboards. Support integration and communication between cloud platforms (Azure, Entra ID, Microsoft 365). Contribute to service improvement initiatives, including root cause analysis and automation opportunities. Participate in on-call rotations or after-hours incidents during peak retail periods. Work within established security frameworks and governance. Hybrid working …
with innovative approaches, and proactively identify opportunities for process and system improvements. Keep abreast of emerging technologies and industry trends. Oversee change management and incident response activities, including performing root-cause analysis investigations and bug fixes as required. Lead and mentor team members by providing coaching, training, performance evaluations, and fostering a culture of accountability, responsibility …
Knowledge Management: Maintain up-to-date technical documentation, including API/interface catalogues, data flow diagrams, environment runbooks, and integration design patterns. Incident and Service Request Administration: Assist in root cause analysis for integration-related issues, serving as the primary point of contact for documenting, triaging, and coordinating the resolution of incidents and service requests. Change Coordination … a conduit between the development team and project teams to ensure consistent, transparent, and professional communication. Education and Experience: Bachelor's degree in computer science, information technology, engineering, systems analysis or a related field, or equivalent experience. A minimum of three years in a technology-related capacity with direct exposure to software development or IT project environments. At least …
IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across …
hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes … Prometheus, Grafana, Splunk, and PromQL - Proficient in scripting (Python, Go, Bash, SQL) - Skilled in GitHub, CI/CD, and Kubernetes operations Desirable: - Experience with Terraform or CloudFormation - Advanced log analysis with Splunk - Strong problem-solving and analytical thinking …
and testing efforts to maintain software quality and performance. Support CI/CD pipelines using Jenkins and contribute to automated testing and deployment. Troubleshoot and resolve production issues, performing root cause analysis and providing timely solutions. Mentor junior engineers and share knowledge across the team to foster a collaborative working environment. Basic Qualifications: Bachelor's degree in …
operational procedures. Mentor team members and contribute to a culture of learning and inclusion. Continuously improve infrastructure reliability and reduce manual work (toil). Participate in incident response and root cause analysis. Why Join Us? Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make …
City of London, London, United Kingdom Hybrid / WFH Options
REC SOLUTIONS LIMITED
with development, networks, ops and product teams on strategic IT initiatives. Assist with planning, management and resource allocation of inter-departmental projects alongside the PM team. Oversee incident management, root cause analysis, and rapid resolution of system outages or performance degradation. Ensure compliance with procedures such as change management, patch management and security and audit processes. Assist … understanding of cybersecurity principles and experience implementing security measures in a regulated environment. Ability to coach, mentor, and upskill staff; develop career paths and ensure team resilience. Experience undertaking root cause analysis, including prevention-orientated solution reporting. Working experience with deployment tools (e.g. GitLab pipelines) and rollback strategies. Proficiency in managing bare-metal servers, virtualization platforms such …
Preparation to Identification, Containment, Eradication, Recovery, and Lessons Learned - collaborating with a global team of incident responders. You will apply your comprehensive skills in cyber defense, digital forensics, log analysis, and intrusion analysis to address security incidents across our endpoints, network, and cloud infrastructure. In this role, you will be responsible for prevention, detection, response, and remediation activities … process is working smoothly - Develop incident response runbooks, playbooks and SOPs with reference to different regulatory requirements - Evaluate the incident response readiness of different layers - people, process, technology Detection & Analysis: - Respond to the cyber security incidents escalated from various channels including the 24/7 SOC team. - Respond to cyber security incidents in compliance with the local authority/… regulatory requirements. - Assess the risk, impact and scope of the identified security threats - Perform deep-dive incident analysis of various data sources by analysing and investigating security-related logs against medium-term threats and IOCs Containment, Eradication and Recovery: - Communicate with the stakeholders and provide guidance and recommendations to contain and eradicate the security incident - Participate in root cause …
whose approach is getting it "right" in tight timescales can make a real difference". As this role includes support, you may have problem tickets to resolve, including detailed root cause analysis. A typical day would start with the team's stand-up meeting for the current sprint where you'll discuss your workload and any blockers, or … you may attend a major incident management meeting where, as the senior engineer on call, you have worked on problem root cause and resolution. Next, you may have development coding, which could be a new function, problem fix or project-related activity. As this role includes support, you may have problem tickets to resolve requiring detailed knowledge on the … RTGS environment is critical to the UK Payments Systems, which requires a methodical approach and the flexibility to work outside core hours as required. Work well under pressure and carry out problem root cause analysis to fix issues. Minimum Criteria We're looking for someone who has the following key skills and experience: Experience of building effective working relationships with others …
integration applications. Perform functional, integration, regression, and user acceptance testing. Validate system changes through ServiceNow Change Requests and ensure updates align with CMDB standards. Log and track defects, perform root cause analysis, and work closely with development teams for resolution. Ensure QA processes align with ITIL framework and banking governance standards. 2. Business Analysis: Gather, document, and … business needs into clear specifications, user stories, and process flows. Collaborate with project managers, developers, and QA teams to ensure delivery aligns with regulatory and operational expectations. Support gap analysis, impact assessments, and end-to-end process mapping for SAP-ServiceNow related changes. Ensure traceability of requirements through testing and implementation. IMPLEMENTATION ARRANGEMENTS The Quality Assurance (QA) Analyst will …
from requirements gathering to deployment. Lead business and stakeholder teams to effectively translate business requirements into technical solutions, keeping in mind best practices and industry standards. Perform fit-gap analysis to identify opportunities to automate and make existing processes more efficient. Collaborate with various teams, including third-party vendors, Enterprise Applications and Infrastructure teams, on various projects and day … projects. Build and foster client & peer relationships, and partner with other teams to deliver mission-critical applications. Lead support teams and other team members to troubleshoot critical incidents by conducting root cause analysis and identifying solutions. Contribute to impact analysis during various application Release Cycles. Own comprehensive technical documentation of integrations and other applications for document versions …
checks to identify process defects. Reporting: Support the creation of routine reporting packs and dashboards for internal stakeholders, utilising and defining performance metrics such as Service Level Agreements (SLAs). Conduct analysis using tools such as Excel or Power BI to identify trends and opportunities for both system optimisation and improvement in operational performance. Continuous Improvement - Operations process optimisation: Proactively identify opportunities … generating and maintaining a knowledgeable Problem Solving: Critically assess and collaboratively work alongside the function's operations team, managed service vendors and enterprise IT team to identify/support root cause analysis and remediation of issues, incidents and escalations. Bridge the gap by translating business requirements to the Tech team and vice versa. Vendor Management: Maintain a …
London, South East, England, United Kingdom Hybrid / WFH Options
Tate Professional
storage, backups, and Linux systems using tools such as Ansible, Terraform, and GitHub. Collaborate with cross-functional teams to align infrastructure delivery with DevOps best practices. Lead incident response, root cause analysis, and ongoing support for critical infrastructure services. Define and implement infrastructure administration standards and procedures. Champion Infrastructure as Code and continuous improvement across the hosting …
Be proficient in Linux server and system administration (e.g., package management, kernel updates, filesystems, volume management). Have experience managing containerized workloads using Docker or Kubernetes. Be an expert in Root Cause Analysis. Have a strong desire to learn new skills and technologies, with proven research capabilities and adaptability. Possess at least two years of experience training and …
operational performance, and security compliance. Facilitate effective communication between IT teams and business units. Problem Solving and Incident Management: Manage and resolve high-priority incidents and critical issues. Conduct root cause analysis and implement corrective actions to prevent recurrence. Develop and maintain incident response plans and procedures. Requirements: Proven experience as a Digital Operations Manager, IT Manager …
customer disruption. Act as a key escalation point, ensuring clear, respectful communication with stakeholders throughout the incident lifecycle. Manage the full support process - from issue detection to resolution and root cause analysis - and drive improvements to prevent recurrence. Collaborate with product, engineering, and service teams to deliver seamless support. Communicate clearly and empathetically across all levels of …
and peripheral equipment for executives. Mobile device support and advanced troubleshooting skills (Apple & Android technologies). Proactively identify potential technical issues and implement preventive solutions through advanced troubleshooting and root cause analysis. Liaising with and delegating tasks to relevant teams for escalation. Supporting the Exec Support Specialist and escalating support issues to the Head of IT where necessary. …