and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low …
london (city of london), south east england, united kingdom
BGC Group
and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low …
be able to multitask in a dynamic team-based environment demonstrating strong problem-solving and decision-making abilities and the highest degree of professionalism. Responsibilities: Lead incident response efforts, root cause analysis, and post-incident reviews for critical system issues to help ensure production reliability. Establish escalation procedures and ensure timely resolution of system outages or performance …
Leeds, England, United Kingdom Hybrid / WFH Options
Babcock
is delivered it is at the highest possible standard. Responsible for ensuring that all relevant process is effectively documented and regularly reviewed. Responsible for providing well-reasoned and sound analysis, context and predictions into relevant deliverables. Responsible for assessing the maturity of the function within the client and identifying areas for improvement, productising those improvements and delivering them. Be … a point of contact for intrusion analysis, forensics and Incident Response queries. Able to provide root cause analysis of non-standard analytic findings and anomaly detections for which a playbook does not yet exist. Responsible for ensuring that during times of reduced capacity all ad hoc and regular products are completed and are at a sufficient …
structured and unstructured data from banking systems and external sources. Support the development of machine learning models for fraud detection and AML monitoring. Conduct data profiling, data mapping, and root cause analysis to ensure data accuracy and integrity in FinCrime investigations. Work closely with regulatory teams to ensure compliance with AML, KYC, and other financial crime regulations.
London, England, United Kingdom Hybrid / WFH Options
First State Bank - Michigan
must have) and other languages, e.g. Java, JavaScript, Python, etc. Solid understanding of web technologies, protocols, and microservices. Experience with cloud technologies like AWS or Azure. Expertise in root cause analysis and integrating testing within development pipelines. Knowledge of version control systems and Agile methodologies. Experience in Azure DevOps for test and defect management.
monitor systems, using advanced toolsets, to prevent security breaches and to respond to incidents as they arise. Day to day your role will involve: Performing advanced real-time SIEM analysis and correlation of logs/alerts from a multitude of client devices. Determining if events escalated by the SOC analyst team constitute security incidents, and if they do, you … and tune SIEM rules. You will identify and implement parsing configuration as required to optimise log source configuration, aiding investigation efforts. Analysing and assessing security incidents, performing in-depth root cause analyses and advancing to client resources or collaborating with internal teams for additional assistance. Acting as subject matter expert, investigating security events forwarded from clients. Act as … environment Working knowledge in the following areas: Unix, Linux, Windows, etc. operating systems; MITRE ATT&CK Framework; exploits, vulnerabilities, network attacks; networking concepts/understanding of networking protocols; packet analysis tools (tcpdump, Wireshark, ngrep, etc.). Keen problem solving/troubleshooting skills. Strong analytical skills and a logical approach to resolving issues. Act as an escalation point to junior members …
Gloucester, England, United Kingdom Hybrid / WFH Options
LM RECRUITMENT SOLUTIONS LTD
with business objectives. Incident and Problem Management: Serve as the primary escalation point for complex technical issues and major incidents. Coordinate rapid resolution of critical infrastructure failures and ensure root cause analysis is performed to prevent future occurrences. Security and Compliance: Ensure that all IT operations adhere to industry standards and regulatory requirements, including ISO 27001, GDPR …
customer review meetings via Microsoft Teams to review open customer tickets, assist with resolution of the issues raised and continuously improve our communications and engagement with our customers. Regular analysis of customer issues to identify and highlight common trends and underlying problems so this can be fed back to the Production and Management Team as Root Cause Analysis intelligence. Support our Business Development Team and Account Managers in identifying and highlighting changes requested by our customers and therefore possible sales opportunities. Training, guidance, coaching and mentoring for Enterprise Systems colleagues. Other: Contribute to the improvement of existing systems and processes used by CACI, in conjunction with the team, Department Head and Director. Essential Key Attributes …
or Milton Keynes). The successful candidate will join our shared delivery team supporting multiple customers, providing proactive and reactive support of the estate management, including resolution of incidents, root cause analysis, and completion of change requests. The roles will focus on incidents, increasing the first-time fix rates, ensuring SLAs are met, and wider call …
compliance requirements. Solid understanding of system security relating to system-based infrastructure and configuration. Demonstrable ability to plan, organize, and coordinate resources, timelines and deliverables. Demonstrated ability to perform root cause analysis and propose effective solutions for continuous improvement. Validated ability to bridge the gap between technical and non-technical stakeholders to influence decisions. Ability to work …
partners. Ensure Code Quality: Uphold best practices in version control, documentation, and peer review to maintain high standards. Troubleshoot and Improve Data Quality: Resolve complex data issues by identifying root causes and implementing lasting improvements. Develop Integrations and Applications: Create and maintain back-end APIs, automate data ingestion, and build front-end tools (Astro, React, or low-code) for … reviews, providing constructive feedback to colleagues and championing improvements in style, performance, and security. Investigate and Improve Data Quality: Proactively identify data anomalies, inconsistencies, and integrity issues by conducting root-cause analysis on recurring data errors or mismatches and reviewing source data feeds and transformation logic to pinpoint upstream issues. Recommend and implement fixes and improvements to existing … as a Technical Escalation Point: Act as tier 3 support when the service desk or level-2 teams cannot resolve incidents due to complexity. Triage incoming support tickets, identify root causes, and propose corrective actions. Provide on-call availability for critical outages affecting data pipelines, integrations, or production environments. Document incident resolutions to streamline future troubleshooting efforts. Plan, Specify …
cloud subject matter expert, providing AWS best practice guidance to internal teams and project stakeholders. Investigate and resolve AWS infrastructure-related incidents, ensuring minimal downtime and impact. Participate in root cause analysis and implement preventative measures. Maintain clear, detailed documentation for AWS environments, architecture diagrams, SOPs, and runbooks. Continuously look for opportunities to improve cloud architecture, security …
the ability to convey complex technical issues to non-technical individuals in a clear and understandable manner. Strong analytical skills: The candidate should have strong analytical abilities to perform root-cause analysis and structured problem-solving. Experience in a fast-paced or stressful environment: Previous experience working in a fast-paced or stressful environment can be advantageous.
solutions on AWS, ensuring scalability, reliability, and security. Collaborate with cross-functional teams to understand requirements, develop solutions, and deliver high-quality software solutions. Troubleshoot and debug issues, perform root cause analysis, and implement effective solutions. Write clean, efficient, and maintainable code in production following best practices and coding standards, such as Test Driven Development and implementing …
usage to ensure critical jobs are completed on time and SLAs are met. Maintain and upgrade the firm's business-critical applications and infrastructure. Perform incident management and root cause analysis, owning incidents from when they are reported to resolution. Oversee the transition of new applications into application support, ensuring complete user documentation, support tools and …
and maintain knowledge base articles to improve service delivery. • Use knowledge management tools to share resolutions and prevent recurrence of known issues. • Identify trends in incidents and assist in root-cause analysis investigations in line with Problem Management processes. • Support the Major Incident Management group during high priority incidents via effective triage, troubleshooting whilst ensuring minimal service …
London, England, United Kingdom Hybrid / WFH Options
Aker Systems Limited
and guide logical and physical database design, data modelling, and performance tuning. Collaborate closely with cross-functional teams including data scientists, analysts, DevOps, and software engineers. Troubleshoot and perform root cause analysis of data issues and production bottlenecks. Drive best practices in code quality, testing (TDD), version control, and CI/CD across data engineering processes. Contribute …
IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across …
Manchester, England, United Kingdom Hybrid / WFH Options
Infoplus Technologies UK Limited
Design and implement automated response workflows using Sentinel playbooks (Logic Apps). - Enhance response efficiency by developing SOAR integrations across security tooling. Documentation & Reporting - Produce comprehensive incident reports and root cause analyses. - Maintain technical documentation for use cases, configurations, response procedures, and data source onboarding. - Generate regular dashboards and reports for SOC leadership and compliance stakeholders. Essential Skills …
Measure and report on the success and effectiveness of the IT Problem Management practice, utilising ITIL best practice KPIs and PSFs and driving continual improvement. Provide detailed reporting and analysis of problems throughout the problem lifecycle (Identification, Problem Control, Error Control). Ensure training, awareness and adherence of the IT Problem Management practice across IT. Provide training on various … root cause analysis techniques, including how and when they should be applied. Chair Problem Review meetings with stakeholders, and act as a key participant in post major incident review (PMIR) and post implementation review (PIR) for change and incident management respectively. Support core IT Service Management practices, such as incident, change, service transition, and service request. Required … ITIL best practice guidelines. Extensive experience working with ServiceNow, specifically for IT Problem Management. Knowledge of core ITSM practices, including incident, change, service request. Excellent understanding of different Root Cause Analysis techniques, and how/when to apply them. Strong analytical skills with excellent attention to detail and ability to think critically and make informed decisions. Excellent …