Dashboards. Contribute to process and technical capabilities (e.g., Data Modeling, Data Visualizations, Artificial Intelligence (AI), Machine Learning (ML)) to enhance identification of service improvement opportunities. Complex data mining, trend analysis, metric and report production will be required. Identify and review service improvement opportunities with stakeholders based on TR enterprise-wide performance metrics. Proactive collaboration with stakeholders to create and … improvement initiatives. Be responsive to internal stakeholder needs and engage with stakeholders across multiple functions. Typical daily work may include but is not limited to complex data mining, trend analysis, metric and report production, process flow charting, and iterative service improvement activities (e.g. daily standups, data quality checks, change reviews, tool enhancement design and review). Contribute to proactive … enhance service reliability and availability. Support for Service Management activities to ensure a consistent standard of incident, problem, change and other practice areas for enhanced accuracy of data quality, root cause analysis and identification of preventative measures. Support the recurring service performance reporting cycle (e.g., weekly, monthly, quarterly). About You: Experience in enterprise problem management, application …
maintain systems according to approved design. Service Delivery & Operations: Lead key service management processes (Continuity, Capacity, Availability). Attend incident/problem bridges as the subject matter expert. Review root cause analyses (RCAs) and oversee corrective actions. Provide accurate monthly service performance reports across IT and OT. Supplier & Financial Management: Lead and manage suppliers to meet agreed SLAs … change management experience. Ability to simplify complex network architecture for non-technical audiences. Desirable Technical Skills & Qualifications: Knowledge of network security technologies and strategic supplier management. Experience in stakeholder analysis and business case development. Familiarity with cloud integration (Azure and AWS). What's in it for you? Competitive salary up to £75,000 per annum, depending on experience …
application support strategies Key Responsibilities: Own Application Support Lifecycle: Ensure end-to-end support for critical business applications, meeting SLAs and availability targets. Incident & Problem Management: Lead resolution and root cause analysis for all Retail application incidents, including major (P1/P2) issues. Escalation & Crisis Leadership: Act as the escalation point for major incidents and provide direction … containerization experience with Azure, Docker, and AKS. Familiarity with modern web technologies, including React, REST APIs, and SOAP architectures. Skilled in managing P1/P2 incidents, business impact analysis, root cause investigations, and change coordination. Strong grasp of IT service management practices; ITIL v4 certification or equivalent preferred. Proactive Monitoring: Hands-on experience with tools like …
within internal and external service operations. Requirements Key Responsibilities Incident and Service Management Act as the escalation point for complex incidents and service requests, ensuring timely resolution in accordance with agreed SLAs. Perform root cause analysis and drive resolution of recurring problems. Monitor service delivery performance, proactively identifying potential disruptions and coordinating corrective actions. Technical Support and Analysis Provide advanced technical support … Incident, Problem, Change, and Service Level Management. Demonstrable experience using ITSM tools (e.g., ServiceNow, Zendesk, Jira Service Management, Freshdesk). Excellent analytical and problem-solving skills, with experience conducting root cause analyses and recommending effective solutions. Strong technical background with proficiency in common enterprise technologies (Microsoft 365, Azure/AWS, network troubleshooting, databases, application support). Outstanding communication and interpersonal skills …
systems to detect data anomalies, system failures, and performance issues and leverage advanced scripting and orchestration tools (e.g., Python, Bash, Apache Airflow) to automate workflows and reduce operational overhead. Root Cause Analysis & Incident Management: Lead post-incident reviews, perform root cause analysis for data disruptions, and implement corrective actions, while creating detailed reports and …
availability Develop common framework components (to be leveraged by enterprise applications), define standards for configuration, monitoring, reliability, and performance engineering Work with Technology teams to resolve major incidents Conduct root cause analysis (RCA) for incidents and implement preventive measures. Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets. Continuously improve automated remediation … Managers (GTMs), Local Traffic Managers (LTMs) Hands-on experience configuring Splunk, Grafana dashboards, Kibana, Elasta alerts etc. Working experience with network rule creation, load balancer configuration, and network packet analysis Analytical knowledge of and exposure to root cause identification using analyzer tools like IBM Support Assistant, Splunk etc. Good understanding of Linux OS internals, performance tools, Core commands …
Knowledge Management: Maintain up-to-date technical documentation, including API/interface catalogues, data flow diagrams, environment runbooks, and integration design patterns Incident and Service Request Administration: Assist in root cause analysis for integration-related issues, serving as the primary point of contact for documenting, triaging, and coordinating the resolution of incidents and service requests. Change Coordination … a conduit between the development team and project teams to ensure consistent, transparent, and professional communication Education and Experience: Bachelor's degree in computer science, information technology, engineering, system analysis or a related study, or equivalent experience A minimum of three years in a technology-related capacity with direct exposure to software development or IT project environments. At least …
best practices, cloud strategies, and platform engineering. Team Leadership: Guide and coach a team of engineers, technical specialists, and architects, encouraging the adoption of innovative technologies and practices. Technical Analysis: Lead technical analysis and estimation efforts for custom-built applications. Best Practices: Drive the adoption of release management and automation best practices. Incident Management: Ensure thorough root cause analysis and prompt remediation during any incidents or outages. Vendor Coordination: Work with external vendors to supplement team capacity and expertise when necessary. YOU'RE GOOD AT You bring solid development and program leadership experience to drive technical governance, innovation, integrations, and cloud strategies using emerging technologies like Gen AI. You thrive in environments that demand …
base articles. Monitor application health using tools and custom dashboards. Support integration and communication between cloud platforms (Azure, Entra ID, Microsoft 365). Contribute to service improvement initiatives, including root cause analysis and automation opportunities. Participate in on-call rotations or after-hours incidents during peak retail periods. Work within established security frameworks and governance. Hybrid working …
business - Build datasets, metrics, and KPIs supporting business - Design and develop highly available dashboards and metrics using SQL and Excel/Quicksight or other BI reporting tools - Perform business analysis and data queries using scripting languages like R, Python etc - Design, implement and support end-to-end analytical solutions that are highly available, reliable, secure, and scale economically - Collaborate … cross-functionally to recognize and help adopt best practices in reporting and analysis, data integrity, test design, analysis, validation, and documentation - Proactively identify problems and opportunities and perform root cause analysis/diagnosis leading to significant business impact - Work closely with internal stakeholders such as Operations, Program Managers, Workforce, Capacity planning, machine learning, finance teams … Excel - 5+ years using data visualization tools like Tableau, Quicksight or similar tools - Experience with R, Python or other statistical/machine learning tools - Experience demonstrating problem solving and root cause analysis - Experience using databases with a large-scale data set - Bachelor's degree in engineering, analytics, mathematics, statistics or a related technical or quantitative field - Detail …
City of London, Greater London, UK Hybrid / WFH Options
Infinigate Group
configuring, updating, and monitoring security tools and software, such as antivirus, encryption, authentication, SIEM etc. Evaluate, research and manage emerging cyber security threats. Support the incident management process through Root Cause Analysis. Responding to and resolving security incidents and events, such as malware infections, phishing attempts, denial-of-service attacks, data breaches, etc. Liaise with stakeholders in relation … Exposure to security monitoring technologies Understanding of Incident Response, Cyber Kill Chain, ATT&CK Knowledge & experience of common programming languages, e.g., Python, C++, PowerShell, JavaScript Being able to perform Root Cause Analysis Experience with vulnerability assessments Ability to discover, design and document security implementations. Strong networking skills. Good understanding of securing Cloud technologies through native and multi …
Modeling Develop and implement sophisticated statistical models and machine learning algorithms to forecast trends, predict outcomes, and identify opportunities for performance enhancement. Utilize advanced analytics techniques such as regression analysis, time series forecasting, and clustering to extract deeper insights from multifaceted datasets. Design and execute A/B tests to optimize strategies and validate hypotheses. Strategic Performance Analysis and Optimization Conduct in-depth analysis of KPIs, benchmarking against industry standards and historical performance. Perform multi-dimensional analysis to uncover hidden patterns and correlations in client data. Develop and maintain a comprehensive performance measurement framework, aligning metrics with client's strategic objectives. Lead root cause analyses for complex performance issues, proposing data-driven solutions. …
and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low …
IT Service Management (ITSM) processes across all teams, ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across …
Description We are seeking a knowledgeable Application Support Analyst to liaise with vendors, business users and product teams to perform installations, identify root cause and deliver fixes/enhancements. The candidate would ideally also have knowledge of commodity trading and will be delivery focused. Experience working with Agile (SCRUM) development and delivery teams is advantageous. The ideal candidate disposes of … ability to develop innovative solutions to technical problems whilst working within the company's governance framework. Incident management skills, leading & owning issues from start to resolution, leading to root cause analysis. Problem management skills in regular checks & follow-ups on defects & bugs arising from incidents. An adaptable attitude, with the ability to multi-task and respond quickly …
hands-on role supporting high-availability systems, rapid deployments, and production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes … Prometheus, Grafana, Splunk, and PromQL - Proficient in scripting (Python, Go, Bash, SQL) - Skilled in GitHub, CI/CD, and Kubernetes operations Desirable: - Experience with Terraform or CloudFormation - Advanced log analysis with Splunk - Strong problem-solving and analytical thinking …
City of London, Greater London, UK Hybrid / WFH Options
Cognitive Group | Part of the Focus Cloud Group
Cleared or Eligible for SC Clearance Your responsibilities: Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability. Implement and maintain observability solutions using … configuration and deployment management experience with CI/CD Desirable skills Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation. Strong knowledge of Splunk for log analysis and troubleshooting. Strong problem-solving skills and analytical thinking. …
to-date with the latest advancements in identity management protocols and best practices. Contribute to the development and documentation of technical specifications and design decisions. Troubleshoot technical issues, conduct root cause analysis, and implement timely resolutions to minimize downtime. Qualifications: Bachelor's or Master's degree in Computer Science, Engineering, or related field. Minimum 5+ years of …
checks to identify process defects Reporting Support the creation of routine reporting packs and dashboards for internal stakeholders, utilising and defining performance metrics, e.g. Service Level Agreements (SLAs) Conduct analysis utilising tools such as Excel or PowerBI to identify trends and opportunities for both system optimisation and improvement in operational performance Continuous Improvement - Operations process optimisation Proactively identify opportunities … generating and maintaining a knowledgeable Problem Solving Critically assess and collaboratively work alongside the function's operations team, managed service vendors and enterprise IT team to identify/support root cause analysis and remediation of issues, incidents and escalations. Bridge the gap by translating business requirements to the Tech team and vice versa Vendor Management Maintain a …
and compliance requirements. • Act as the primary point of contact for internal business units (including Operations, Compliance & Transactional Banking), IT and external vendors regarding service performance and enhancements. • Lead root cause analysis and resolution of major incidents. Drive problem management to reduce recurring issues and improve service stability. • Manage projects involving any future enhancements or regulatory changes …