Sheffield, South Yorkshire, United Kingdom Hybrid / WFH Options
Experis
and GCP, ensuring resilience, cost-efficiency, and data security. Collaborate closely with infrastructure, architecture, and cybersecurity teams to meet internal risk, compliance, and governance requirements. Support live systems, perform root cause analysis, and implement solutions for incidents and performance bottlenecks. Qualifications and experience The ideal candidate for this role will have the below experience and qualifications: Bachelor
partners. Ensure Code Quality: Uphold best practices in version control, documentation, and peer review to maintain high standards. Troubleshoot and Improve Data Quality: Resolve complex data issues by identifying root causes and implementing lasting improvements. Develop Integrations and Applications: Create and maintain back-end APIs, automate data ingestion, and build front-end tools (Astro, React, or low-code) for … reviews, providing constructive feedback to colleagues and championing improvements in style, performance, and security Investigate and Improve Data Quality Proactively identify data anomalies, inconsistencies, and integrity issues by: Conducting root-cause analysis on recurring data errors or mismatches Reviewing source data feeds and transformation logic to pinpoint upstream issues Recommend and implement fixes and improvements to existing … as a Technical Escalation Point Act as tier 3 support when the service desk or level-2 teams cannot resolve incidents due to complexity: Triage incoming support tickets, identify root causes, and propose corrective actions Provide on-call availability for critical outages affecting data pipelines, integrations, or production environments Document incident resolutions to streamline future troubleshooting efforts Plan, Specify
cloud subject matter expert, providing AWS best practice guidance to internal teams and project stakeholders. Investigate and resolve AWS infrastructure-related incidents, ensuring minimal downtime and impact. Participate in root cause analysis and implement preventative measures. Maintain clear, detailed documentation for AWS environments, architecture diagrams, SOPs, and runbooks. Continuously look for opportunities to improve cloud architecture, security
IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across
Description We are seeking a knowledgeable Application Support Analyst to liaise with vendors, business users and product teams to perform installations, identify root causes and deliver fixes/enhancements. The candidate would ideally also have knowledge of commodity trading and will be delivery focused. Experience of working with Agile (Scrum) development and delivery teams is advantageous. The ideal candidate disposes of … ability to develop innovative solutions to technical problems whilst working within the company's governance framework. Incident management skills, leading and owning issues from start to resolution, leading to root cause analysis. Problem management skills in regular checks and follow-ups on defects and bugs arising from incidents. An adaptable attitude, with the ability to multi-task and respond quickly
hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes … Prometheus, Grafana, Splunk, and PromQL - Proficient in scripting (Python, Go, Bash, SQL) - Skilled in GitHub, CI/CD, and Kubernetes operations Desirable: - Experience with Terraform or CloudFormation - Advanced log analysis with Splunk - Strong problem-solving and analytical thinking
City of London, London, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Cleared or Eligible for SC Clearance Your responsibilities: Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using … configuration and deployment management experience with CI/CD Desirable skills Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation. Strong knowledge of Splunk for log analysis and troubleshooting. Strong problem-solving skills and analytical thinking.
South East London, England, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Cleared or Eligible for SC Clearance Your responsibilities: Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using … configuration and deployment management experience with CI/CD Desirable skills Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation. Strong knowledge of Splunk for log analysis and troubleshooting. Strong problem-solving skills and analytical thinking.
to-end tests on code commits and pull-requests. • Monitor pipeline health and test results; collaborate with DevOps to optimize build times, parallelize tests, and reduce pipeline flakiness. Result Analysis & Root Cause • Analyze test outputs, system logs, and metrics (e.g., via ELK Stack or Prometheus/Grafana) to pinpoint failures and performance regressions. • Lead root-cause … testing activity efficiently. An ISTQB Foundation Certification is a strong asset and shows your commitment to professional testing standards. A key part of this role involves problem investigation and root cause analysis, so strong analytical and communication skills are a must. You'll enjoy working as part of a collaborative team, contributing your insights to improve outcomes
to define, implement, and improve business performance SLOs. 2+ years of experience with Production operations including 24x7 on-call support, escalation/paging with OpsGenie, incident management, RCA (Root Cause Analysis) and retrospective analysis. 2+ years in hands-on technical roles (such as site reliability engineer, software engineer, DevOps engineer, infrastructure engineer). Experience … management. 24x7 Support: Perform deep dives into systemic and latent reliability issues, incident management, problem management. Participate in all aspects of incident management including awareness, communication, remediation, retrospective/root cause analysis. Identify and implement process improvements of MTTA (Mean Time to Acknowledge) and MTTR (Mean Time to Resolve). Support operations & engineering teams on Azure, AWS and … can talk about complex software systems and have ideas on how to build quality, performant, and easily supportable software most effectively You exhibit dogged determination to get to the root of problems You care about best practices and evangelizing them with the team You like to research and propose new techniques and methodologies to improve quality and efficiency of
to agreed SLAs, maintaining high levels of customer satisfaction. Technical Expertise: Implement selected ITIL best practices to ensure that we develop a customer focussed service oriented. Problem Management: Conduct root cause analysis for recurring issues and implement solutions to prevent future occurrences. Stakeholder Management: Develop strong relationships through effective communication and engagement with internal and external stakeholders … for improvement. Compliance: Ensure all processes and procedures comply with regulatory requirements and company policies. Quality Assurance: Ensure that team deliverables are quality reviewed and work with Testing and Analysis functions to ensure releases are tested and fit for implementation Technical Proficiency: Be able to offer technical input into complex solutions while supported by team technical leads Disaster Recovery
ITSM) processes including asset, change, incident, request, problem, and project management to meet service levels. Provide on-site IT support and assist in resolving broader technical issues. Contribute to root cause analysis and long-term problem management. Act as a key point of contact between IT and users, promoting standards, improving user satisfaction, and sharing best practices.
Liverpool, Merseyside, North West, United Kingdom Hybrid / WFH Options
In Technology Group Limited
with IT and development teams to ensure secure system architecture and application development. Maintain and enhance incident response procedures and disaster recovery plans. Investigate and document security breaches, providing root cause analysis and remediation plans. Conduct security awareness training for staff and ensure compliance with internal policies and regulatory requirements (e.g., FCA, GDPR, ISO 27001). Stay
Bletchley, Buckinghamshire, United Kingdom Hybrid / WFH Options
In Technology Group
with IT and development teams to ensure secure system architecture and application development. Maintain and enhance incident response procedures and disaster recovery plans. Investigate and document security breaches, providing root cause analysis and remediation plans. Conduct security awareness training for staff and ensure compliance with internal policies and regulatory requirements (e.g., FCA, GDPR, ISO 27001). Stay
Milton Keynes, Buckinghamshire, South East, United Kingdom Hybrid / WFH Options
In Technology Group Limited
with IT and development teams to ensure secure system architecture and application development. Maintain and enhance incident response procedures and disaster recovery plans. Investigate and document security breaches, providing root cause analysis and remediation plans. Conduct security awareness training for staff and ensure compliance with internal policies and regulatory requirements (e.g., FCA, GDPR, ISO 27001). Stay
to-date with the latest advancements in identity management protocols and best practices. Contribute to the development and documentation of technical specifications and design decisions. Troubleshoot technical issues, conduct root cause analysis, and implement timely resolutions to minimize downtime. Qualifications: Bachelor's or Master's degree in Computer Science, Engineering, or related field. Minimum 5+ years of
New Milton, Hampshire, United Kingdom Hybrid / WFH Options
Appello
infrastructure and cloud services. Deep understanding of SIP, VoIP, VoLTE, STUN, and firewall bridging. Proficiency in Node.js application support and server diagnostics. Hands-on experience using tools for SIP analysis, such as Wireshark, SIP Traces, or packet analysers. Excellent problem-solving and communication skills. Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience … Azure Solutions Architect, or AWS equivalent. ITIL Foundation certification THE ROLE Key Responsibilities Advanced Technical Support Resolve complex hardware, software, and network issues escalated from lower-tier support. Conduct root cause analysis and implement long-term solutions. Manage high-impact incidents to ensure minimal business disruption. Server & Application Support Troubleshoot server issues across cloud (AWS), on-premise
Talent application support services, ensuring timely and accurate issue resolution for SuccessFactors and associated Talent systems. Act as escalation point for complex system issues and ensure appropriate follow-up, root cause analysis, and long-term resolution. Maintain high standards of system reliability and data integrity in the live production environment. Define and monitor service level objectives (SLOs … changes go live. Strategic Planning and Roadmap Execution Contribute to the definition and delivery of the Talent technology roadmap in partnership with the Global Senior Manager, Talent Systems. Support analysis and prioritization of system enhancements, configuration changes, and new module adoption. Stay current with SAP SuccessFactors roadmap updates and industry trends. Data Quality and Reporting Support data governance initiatives
checks to identify process defects Reporting Support the creation of routine reporting packs and dashboards for internal stakeholders, utilising and defining performance metrics - Service Level Agreements (SLAs) etc Conduct Analysis utilising tools such as Excel or PowerBI, to identify trends and opportunities for both system optimisation and improvement in operational performance Continuous Improvement - Operations process optimisation Proactively identify opportunities … generating and maintaining a knowledgeable Problem Solving Critically assess and collaboratively work alongside the function's operations team, managed service vendors and enterprise IT team to identify/support root cause analysis and remediation of issues, incidents and escalation. Bridge the gap by translating business requirements to the Tech team and vice versa Vendor Management Maintain a
of infrastructure components. 2. Monitoring and Incident Management: - Develop and maintain monitoring solutions to proactively identify performance bottlenecks, system outages, and other potential issues. - Participate in incident response and root cause analysis efforts to drive continuous improvement and prevent future incidents. 3. Reliability and Performance Optimization: - Optimise system performance, reliability, and cost efficiency through continuous monitoring, performance
AZ-104/AWS SOA-C02 or equivalent experience (1-3 years in cloud-based environments) Good understanding of networking concepts (comprehensive understanding of OSI Layers 2-7) Troubleshooting, root cause analysis and communication skills Understanding of system performance metrics and capacity planning Python or PowerShell scripting skills Previous experience in some of the following concepts & technologies
and compliance requirements. • Act as the primary point of contact for internal business units (including Operations, Compliance & Transactional Banking), IT and external vendors, regarding service performance and enhancements. • Lead root cause analysis and resolution of major incidents. Drive problem management to reduce recurring issues and improve service stability. • Manage projects involving any future enhancements or regulatory changes