South East London, England, United Kingdom Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Cleared or Eligible for SC Clearance Your responsibilities: Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using … configuration and deployment management experience with CI/CD Desirable skills Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation. Strong knowledge of Splunk for log analysis and troubleshooting. Strong problem-solving skills and analytical thinking.
and compliance requirements. • Act as the primary point of contact for internal business units (including Operations, Compliance & Transactional Banking), IT and external vendors, regarding service performance and enhancements. • Lead root cause analysis and resolution of major incidents. Drive problem management to reduce recurring issues and improve service stability. • Manage projects involving any future enhancements or regulatory changes …
South East London, England, United Kingdom Hybrid / WFH Options
Explore Group
and scale Kubernetes clusters hosting critical microservices Design and enhance observability, alerting, and incident response processes Collaborate closely with engineers to ensure systems are reliable, secure, and performant Lead root cause analysis for production incidents and help prevent recurrence Build tooling to automate repetitive tasks and improve deployment pipelines (CI/CD) Participate in on-call rotation …
on-prem environments. What You'll Be Doing: Managing and supporting Solace PubSub+ appliances and software brokers across cloud and on-prem platforms Responding to production incidents and working on root cause analysis and long-term fixes Monitoring system health and performance with Prometheus, Grafana, and custom dashboards Optimising Solace across WAN environments for secure, low-latency message …
operational performance, and security compliance. Facilitate effective communication between IT teams and business units. Problem Solving and Incident Management: Manage and resolve high-priority incidents and critical issues. Conduct root cause analysis and implement corrective actions to prevent recurrence. Develop and maintain incident response plans and procedures. Requirements: Proven experience as a Digital Operations Manager, IT Manager …
a hands-on leadership role - you won’t just guide others, you’ll be the go-to expert when systems are under pressure. You’ll lead incident response, own root cause analysis, and solve performance issues like memory leaks, outages, and flaky services. Your focus will include: Leading incident management, post-mortems, and blameless RCAs Building scalable …
East London, London, United Kingdom Hybrid / WFH Options
Owen Thomas | Pending B Corp™
and efficiency. Automate configuration, provisioning, and deployment to reduce manual effort and streamline operations. Implement and uphold security standards, including encryption, access control, and compliance. Lead incident response and root cause analysis, applying preventive measures to avoid recurrence. Collaborate across teams (QA, DevOps, IT) to troubleshoot and enhance system performance. Maintain clear documentation for configurations, procedures, and … with a focus on Python. Skilled in TDD and BDD, primarily using Python. Deep understanding of distributed systems, networking, storage, and compute management. Strong troubleshooting skills, with experience in root cause analysis and timely resolution. Knowledge of security standards (ISO27001, NIST, GDPR) and infrastructure security best practices. Experienced with monitoring/logging tools like Splunk, Grafana, and …
South East London, England, United Kingdom Hybrid / WFH Options
Owen Thomas | Pending B Corp™
and efficiency. Automate configuration, provisioning, and deployment to reduce manual effort and streamline operations. Implement and uphold security standards, including encryption, access control, and compliance. Lead incident response and root cause analysis, applying preventive measures to avoid recurrence. Collaborate across teams (QA, DevOps, IT) to troubleshoot and enhance system performance. Maintain clear documentation for configurations, procedures, and … with a focus on Python. Skilled in TDD and BDD, primarily using Python. Deep understanding of distributed systems, networking, storage, and compute management. Strong troubleshooting skills, with experience in root cause analysis and timely resolution. Knowledge of security standards (ISO27001, NIST, GDPR) and infrastructure security best practices. Experienced with monitoring/logging tools like Splunk, Grafana, and …
join a small, agile team responsible for both functional and technical aspects of application support, working across two instances of Murex (v2.11 and v3.1). This includes incident resolution, root cause analysis, server-side troubleshooting, and engagement with global teams on system enhancements and migrations, including the move toward Murex Cloud (MX.3 SaaS). Key Responsibilities: Provide … functional, and infrastructure-related issues Support the full trade lifecycle, including pricing, PnL, and risk modules Collaborate directly with front-office traders and business stakeholders Perform application restarts, log analysis, and server-level diagnostics Participate in weekend and on-call support (ad hoc, with future move to rota) Contribute to automation and tooling improvements Required Skills & Experience: Must have …
Ansible, AWS RDS/Aurora tools, Azure SQL automation). Monitoring & Health Checks: Utilize tools such as CloudWatch, Azure Monitor, OEM, or Prometheus to monitor performance and availability. Troubleshooting & Root Cause Analysis: Diagnose and resolve database incidents; conduct RCAs for critical incidents and outages. Collaboration: Work closely with DevOps, Application, and Security teams for seamless integration, monitoring …
handling use cases. Collaborative Problem Solving & Decision-Making: Comfortable navigating ambiguity, managing risk, and facilitating decision-making with technical and non-technical stakeholders. Applies structured approaches (e.g., A3 thinking, root cause analysis, DACI/RAPID) to resolve blockers and drive clarity. Executive Communication & Stakeholder Alignment: Skilled communicator who can tailor narrative and influence across senior leadership, technical …
and deploy automated hardware test setups for board-level and silicon validation, integrating scripting and measurement tools (e.g., Python, LabVIEW, or similar). Lead signal integrity and power integrity analysis, including simulation and measurement (e.g., eye diagrams, S-parameters, crosstalk analysis). Collaborate with IC design and layout teams to define I/O requirements, package constraints, and … review layouts for impedance control, routing constraints, and noise mitigation. Work closely with FPGA and embedded software teams to support interface validation and debugging at the system level. Drive root cause analysis for signal failures, timing errors, and system-level integration issues using advanced lab instrumentation (oscilloscopes, logic analysers, BERTs, VNAs). Contribute to test plan development … validation coverage analysis, and reporting for product verification and manufacturing readiness. Document technical designs, validation results, and bring-up procedures for internal stakeholders and product teams. Skills & Experience 7+ years of experience in high-speed digital electronics design, including interfaces such as PCIe, DDR4/5, Ethernet, and SERDES. Strong understanding of signal integrity (SI) and power integrity (PI …
services are running smoothly. Address any overnight alerts or incidents reported by monitoring tools. Incident Management: Respond to and troubleshoot issues reported by users or automated monitoring systems. Perform root cause analysis and implement fixes to resolve incidents promptly. Document incidents and resolutions for future reference. System Maintenance: Apply patches and updates to applications and databases to …
South East London, England, United Kingdom Hybrid / WFH Options
Compass Associates
fast-paced, growing organisations Excellent communication, influencing, and stakeholder management skills Data-driven at heart, with strong quality improvement and analytical capability Skilled in policy writing, audit methodologies, and root-cause analysis Strong project-management skills and organised approach UK work authorisation Desirable: Experience in digital health/health technology Familiarity with ISO 13485 or ISO …
problems, and change tracking. Understanding of airline schedule disruptions and system impacts during IROPs (irregular ops). Experience supporting international airport environments or multi-airline terminals. Ability to perform root cause analysis and contribute to problem management. Basic scripting or automation (e.g. PowerShell, batch scripts) for system checks/log extraction. Awareness of aviation security protocols and …
next-gen media infrastructure. Key Responsibilities: Provide technical support for proprietary media software systems used by a major broadcast customer Respond quickly to platform incidents, update stakeholders, and support root-cause analysis Support the installation and configuration of video delivery components in both lab and production environments Participate in new deployments and contribute to knowledge sharing across …
including training and development Oversee PPM delivery, asset management systems, and emergency procedures Act as a primary point of contact for client liaison and technical reporting Support incident response, root cause analysis, and continuous improvement initiatives Candidate Profile: Minimum of five years' experience in a critical environment or data centre operations High-voltage authorised person (HVAP) status …
South East London, England, United Kingdom Hybrid / WFH Options
La Fosse
the design, development, and maintenance of the test automation framework. Write, review, and maintain high-quality automated test scripts. Promote and implement automation best practices and industry standards. Drive root cause analysis processes to minimise time-to-resolution for defects. Requirements: Strong and proven experience in software quality assurance, test automation, and QA leadership. Proficiency in C# …
AVS, JVS, OpenComponents, and .NET/Java-based extensions. Develop scripts, reports, and workflows across the trade lifecycle—from trade capture to settlement and accounting. Optimize system performance, conduct root cause analysis, and resolve production issues. Participate in system upgrades, patch management, and regression testing activities. Ensure coding standards, documentation, and best practices are followed throughout the …
annual external CASS audit. Develop and maintain appropriate and effective MI and KPIs for CASS Compliance. Ensure CASS breaches are appropriately recorded, reported and escalated to senior management and root cause analysis is conducted to prevent future recurrence. Advise on CASS risk and assist in Risk and Control Self-Assessments (RCSAs) and Internal Capital Adequacy and Risk …
detailed internal documentation relating to Workday processes, configurations, and automated workflows to support transparency and future scalability. Lead resolution efforts for complex system issues, applying advanced troubleshooting methods and root cause analysis to ensure lasting solutions and service reliability. Key Requirements: Proven experience aligning Workday functionality with broader organisational and IT strategies by gathering business requirements and … and maintaining comprehensive technical documentation, process maps, and support materials to guide internal teams and end users. Capable of diagnosing and resolving complex Workday-related issues, performing in-depth root cause analysis, and driving continuous service enhancement. This position offers a unique opportunity to play a critical role in the organisation’s ERP system strategy. If you …
South East London, England, United Kingdom Hybrid / WFH Options
Travelex
as well as upskilling teams to self-serve future or custom reports. The role requires a strong understanding of reporting tools including OTBI, BI Publisher and Smart View, along with data analysis, report optimization and data governance. The role will need to understand the fundamentals of all aspects of complex multi-currency, multi-company accounting systems and maintain a high level … getting better data insights Engage and guide a number of Finance stakeholders through the process to ensure the appropriate strategic direction is well understood and documented. Provides expert financial analysis and recommendations to senior management on business decisions related to REPORTING, and the strategic direction as required. Provide training and support to help build out a new Shared Services … build out of processes, procedures, and that appropriate resources are in place and tested to ensure services are delivered to agreed service levels. Ensure the team undertakes proactive troubleshooting and root cause analysis of issues, implementing preventative measures as appropriate. Reviews and approves Change Requests with impact in the REPORTING space. Provide leadership and support to testing activities …
South East London, England, United Kingdom Hybrid / WFH Options
ECS Resource Group
Leading solution presentations and technical discussions with stakeholders Delivering seamless integration across the full tech stack, from network to contact centre Supporting 3rd/4th line incidents and driving root cause analysis Shaping and enforcing network standards and best practices Why Join My Client? Work for a forward-thinking, vendor-neutral tech leader Own projects from design …