Product Reliability and Support Strategist, Alerting and Incident Management About The Position Coralogix is a modern, full-stack observability platform transforming how businesses process and understand their data. Our unique architecture powers in-stream analytics without reliance on expensive indexing or hot storage. We specialize in comprehensive monitoring of logs, metrics, traces, and security events with features such … and more, enhancing operational efficiency and reducing observability spending by up to 70%. We seek a Quality and Support Strategist professional who ensures that the Coralogix Alerting and Incident Management Platform and Process exceed the quality and reliability standards, establish a competitive edge, and prevent failures, profit loss, or work stoppages. You will be responsible for enhancing … customer experience by ensuring efficient and effective alert management resolution, reducing engineering interruptions, and boosting product awareness. This role involves developing a robust knowledge base, identifying common usage issues, and creating solutions that establish the Alerting and Incident Management Platform's capabilities in terms of performance, pains, and business use cases we deliver. Key Responsibilities: Improve Customer … Satisfaction Improve turnaround time to resolve customer satisfaction. Work closely with engineering and technical account managers to ensure customers can achieve their ambitions using the Coralogix Alerting and Incident Management Platform. Sometimes, these solutions involve impromptu solutions by keeping one eye on the product roadmap. Reduce Engineering Interruptions Identify common problems and work with Technical Product Management
a Great Place to Work for the last five consecutive years. Sound good? Read on to find out more about joining our team. KEY RESPONSIBILITIES: Manage end-to-end incident resolution: triage, classification, escalation, risk assessment, and timely closure of actions. During incident resolution, place special emphasis on assessing potential or actual harm caused to customers - including those … with vulnerabilities - and ensure remediation actions align with Consumer Duty requirements. Lead and coordinate incident response calls, including chairing the Customer Incident Management Process committee and acting as the key escalation point. Collaborate with Risk & Compliance stakeholders on incident identification, categorisation, scoring, and mitigation strategies. Communicate the status and impact of incidents clearly to all relevant … stakeholders, working in collaboration with them to resolve and close down actions. Conduct thorough root cause analysis, host post-incident reviews, and ensure implementation and tracking of post-incident remedial actions for accountable stakeholders. Develop, maintain, and continuously improve the effectiveness of the customer incident management framework, ensuring business-wide compliance with internal policies and regulatory
Deskside Engineer and Service Management - Central London. Locations: GBR - ENG - LONDON. Time type: Full time. Posted 2 Days Ago. Job Description: Deskside Engineer and Service Management - Job Roles, Specification and Responsibilities. This role provides a User Provisioned Desktop based Operational Service & Support … will have first/second line support background, and have good customer service and communication skills along with methodical troubleshooting techniques. The role will also involve areas of Service Management so experience in any of Risk Management, Monthly Reporting, Incident Management and Knowledge Management is desirable. Please note, due to the customer requirements … Business, or related field or equivalent work experience. May hold entry-level or intermediate-level certification(s) in work field. Typically 2-5 years of relevant experience. Desirable Service Management Skills: Monthly Reporting, Risk Management, Incident Management, Knowledge Management. Essential Knowledge and Skills Required: Windows 10/11, Active Directory, DNS, DHCP, Windows Server
Barclays Resilience are hiring a new Head of Resilience Board and Self-Assessment Reporting. This role is critical to ensuring the Board and senior management are informed on the firm's resilience risk position and remediation plans. The role holder will own the development and delivery of key reporting such as the board approved resilience self-assessment, a … internal reporting Design of controls, KIs and metrics Knowledge of Resilience Purpose of the role To develop, implement, and maintain an effective resilience strategy and Operational Recovery Planning and Incident Management framework aligned to industry leading standards and controls and regulatory expectations, to enable the bank to respond and recover important business services to severe but plausible scenarios … firm (e.g. SMRs and delegates, business/function resilience role owners and all colleagues), ensuring compliance with the standard and controls. Development and enhancement of the resilience, crisis and incident management framework to uplift recovery and response capabilities to ensure Important Business Services can continue to operate during disruption events to minimise the impact on customers, clients and
compliance with industry standards. This position is ideal for someone who thrives in a collaborative environment and is passionate about optimising cloud infrastructure and automation processes. Key Responsibilities Infrastructure Management: Design, implement, and manage infrastructure using Terraform for scalable and secure provisioning. Compliance & Security: Ensure platform provisioning aligns with ISO and SOC compliance standards while collaborating with the InfoSec … manage containerised applications using Docker and other orchestration tools. Observability & Monitoring: Provision and maintain observability platforms such as DataDog, Splunk, or New Relic to gain monitoring and performance insights. Incident Management: Establish and oversee monitoring and incident management processes to ensure system reliability. Workflow Development: Develop and implement workflows that align with an incident management plan. CI/CD Pipeline Management: Manage and optimise CI/CD pipelines using tools like GitHub Actions, Travis, and other automation frameworks. Site Reliability Engineering (SRE): Perform SRE duties to ensure system availability, performance, and scalability. Application Support: Work closely with application teams to support application deployment and performance monitoring. Cloud Administration: Administer and optimise AWS
workloads. You will be responsible for building and executing enablement plans in conjunction with the Cloud Operations Architect (a.k.a. Technical Account Manager, COA/TAM) for services such as Incident Detection and Recovery (IDR), Countdown Premium (CDP) and AWS Managed Services. You will orchestrate resources from across Global Services and manage the delivery of multiple services and resources to … outcomes Orchestrate collaboration between Global Services resources and customer stakeholders to drive best practice adoption Monitor and analyse key metrics to identify areas for continued customer improvement. Implement integrated incident management with AWS-monitored infrastructure and pre-agreed runbooks Drive actions from TAM/COA-led reviews of architectural, observability, resilience, and problem management gaps for their … areas for service improvements BASIC QUALIFICATIONS - 7+ years of experience in running large scale, enterprise-level service delivery of critical workloads with a strong emphasis on business conversations, account management, or technical program management - Strong verbal and written communication skills with ability to influence senior technical stakeholders - Understanding of incident management, problem resolution processes, IT Operations
business understanding in the IS team. Deliver strategic transformation program(s) to align and automate our processes and enable efficiency. Working with business teams, set up and deliver change management, enabling our colleagues to adopt modern tools and new processes. Contribute significantly to IS strategic planning, with a focus on Corporate, HR and Finance systems roadmaps, in close collaboration … and Finance stakeholders and her team to operate effective governance of roadmaps, systems, vendors and projects. You'll need Degree/Master's degree - or equivalent experience Experience of systems management. Experience of team management. Project management, program management, systems design and development experience, SDLC (software development lifecycle) experience. Experience in understanding and transforming business processes and … managing requirements, preferably in a Corporate, HR and Finance context Areas of knowledge & level (basic-medium-advanced): Advanced knowledge of team management, stakeholder management and governance Advanced knowledge of systems management, software development, supplier relationship management, incident management and project management Advanced understanding and experience of Finance and HR Good understanding of information
To ensure the effective day-to-day delivery of IT services across both shared and in-house environments, with a strong focus on ITIL disciplines - Incident, Problem, Change, and Asset Management - and Cyber Security risk management. The postholder will act as the customer-facing lead for operational IT, ensuring services are reliable, secure, and responsive to … user needs. Key Responsibilities Manage the performance and quality of outsourced, shared service and in-house IT services Own and operate the ITIL processes: Incident, Problem, Change, and Asset Management Act as the operational interface with the other organisation (the shared service provider) Maintain the CMDB and configuration item lifecycle tracking Lead on cyber security risk assessments, patch … assurance, vulnerability management, and coordination with SOC/XDR providers Ensure compliance with cyber and data protection standards (e.g. Cyber Essentials) Monitor service level agreements (SLAs), escalate issues, and lead service reviews Coordinate change activity to ensure minimal business disruption Deliver customer-focused service improvement initiatives Support audits and business continuity planning Essential Skills and Experience Demonstrable experience managing
specifically responsible for completing the implementation of a number of strategic based security solutions for new security tooling or existing. The engineer will also participate in security related service management processes (incident, change and problem management) and will participate in the planning, design, enforcement and review of security controls which protect the integrity of the firm. Essential … security-by-design principles into development processes. Conduct reviews of existing tools and processes, identifying gaps and implementing enhancements to strengthen our security posture. Perform security scanning and vulnerability management, taking proactive measures to reduce operational risks. Monitor security alerts and implement mitigations to safeguard against potential threats and attacks. Support Data Loss Prevention (DLP) solutions that protect corporate … data across platforms, devices, and environments globally. Monitoring and managing responses to the Security Incidents and Security DLP. Standard, third party and privilege Identity Access Management Operate, manage and improve HSM key management infrastructure. Remediation of external, internal vulnerabilities, web application scanning and patch compliance. Cyber Incident Management and/or Security Forensic experience. Documenting High Low
reduce manual workload, improving overall efficiency. Enhance Monitoring Tools: Improve tools for monitoring and mitigating site incidents, and conduct reliability audits and tests to strengthen eBay's reliability and incident management capabilities. Incident Management: Act as Incident Commander to drive resolution of major incidents, manage alarms, and ensure effective communication with leadership and partner teams. … experience in large-scale internet/server environments, including cloud computing and multi-tier architectures. Experience with delivering solutions with software engineering skills including Java, Python, Go, etc. Strong incident management and leadership skills, with excellent technical triage and troubleshooting abilities, especially during crises. Expert knowledge in large-scale web operations, including web-based Java/J2EE architectures
Job summary Within the Digital, Data & Technology area of the NCA, as we constantly evolve and expand our capabilities, we need people with strong experience of service management practices to ensure that new Services are designed to support our end users in the delivery of their functions. The quality of the services delivered by the NCA directly … privacy policy notice for details on how your data is handled. Your role will sit within the Service Design & Transition team and you will utilise your Service Management experience and knowledge to provide expertise to project teams, ensuring incoming services align to ITIL-based processes. Your role will involve translating the project's technical service and end … internal and/or external) functions: Will adhere to and respect Service Design & Transition processes, as well as any processes outside of the Service Design & Transition practice (e.g. project management office, enterprise design assurance - technology, change enablement). Collaborating with stakeholders to translate customer requirements, business requirements and contractual obligations into end-to-end service design: Collaborating with internal and
resolve issues and user needs. You must have very good, broad experience and knowledge of working in a technology support environment using a variety of tools to support the management and delivery of production services. You should have experience across all IT service delivery activities including service management, incident management, change management, release management, configuration management, continual service improvement and customer satisfaction as well as playing a role in ensuring the highest levels of operational service delivery.
careers through training and the development of new skills and certifications. Overview We are seeking an experienced Service Desk Lead to manage a small team of Enterprise Project Portfolio Management (EPPM) Analysts. The Service Desk Lead will play a pivotal role in ensuring the efficient operation of the Service Desk, meeting service level agreements (SLAs), and driving continuous improvement … This role requires a strong understanding of ITIL processes, leadership skills, and the ability to develop and execute strategies to enhance service delivery. Responsibilities 1. Team Leadership and Line Management Provide day-to-day leadership and management of the Service Desk team of EPPM Analysts. Foster a collaborative and high-performing team environment through coaching, mentoring, and performance … on Service Desk performance metrics, identifying areas for improvement. Input into monthly customer reporting. Act as the escalation point for complex issues and ensure timely resolution. 3. ITIL Process Management Oversee ITIL processes, including Risk Management, Knowledge Management, Incident Management, Problem Management, and Change Management. Ensure compliance with ITIL best practices and standards. Develop
and business expertise. You'll be the go-to expert across a diverse, modern, and complex technology landscape, ensuring seamless support and smooth operations. You'll take charge of incident triage and resolution, lead system upgrades, and keep performance optimized through proactive monitoring and alerting. Beyond day-to-day support, you'll drive continuous improvements in processes and tools … resilience, focusing on high-availability, scalability, and reliability. Support and contribute to the development of runbooks and knowledge sharing materials for the team. Contribute to the ongoing improvement of incident management processes, response times, and issue resolution. Maintain detailed incident reports, root cause analyses, and post-incident reviews. Document technical processes, troubleshooting procedures, and maintenance guidelines … with a broad range of technologies, including: Practical experience with performance monitoring tools such as Dynatrace or equivalent. Skills & Knowledge Solid understanding of Site Reliability Engineering (SRE) principles, including incident management, monitoring, alerting, and performance tuning. Strong knowledge of Software Development Lifecycle (SDLC) processes. Familiarity with incident management platforms like ServiceNow, PagerDuty, or similar tools. Excellent
London, South East, England, United Kingdom Hybrid / WFH Options
Salt Search
teams for fintech clients. The Escalations Manager will be handling complex Embedded Services client issues that must be resolved in a timely and effective manner. This role requires careful management of issues and diplomatic client communication, helping to maintain high levels of client satisfaction. The Escalations Manager will need to work with account managers to keep them informed of … Escalation Handling: Serve as the primary contact for escalated issues, ensuring swift resolution and clear communication with clients and internal teams Team Leadership: Guide and mentor support engineers, including management of the team's weekly on-call rota and holiday coverage planning to ensure 24/7 coverage. Act as a back-up during unforeseen coverage gaps Process Oversight … like PagerDuty and Salesforce. Ensure compliance with SLAs and contractual obligations Client Assurance: Engage in senior-level conversations with clients to reassure them during service disruptions, incidents and bugs. Incident Management: Coordinate with the Incident Management team to notify them of incidents and keep the client informed on resolution status. Reporting & Analysis: Track support metrics, resolution
new function to life. Your expertise in IT operations and observability, combined with your excellent communication skills, will be essential in informing the design of critical capabilities such as incident management, monitoring, and automation. You will ensure that future state processes are aligned with both upstream and downstream teams across the tech ecosystem, fostering a collaborative and cohesive … automation to enhance the efficiency of IT operations. Monitoring and Maintenance: Utilize AIOps tools to monitor systems, detect anomalies, and predict potential issues before they impact operations. Data Pipeline Management: Develop and maintain ETL pipelines within Cribl to ensure data collection and processing. Collaboration: Work closely with development, operations, and observability teams to integrate AI solutions into the everyday … workflow. Incident Management: Employ AIOps for proactive incident management and resolution. Performance Optimization: Continuously optimize system performance using AI insights and recommendations. Qualifications AIOps Platforms: Familiarity with platforms such as Moogsoft, BigPanda, or IBM Watson AIOps. Programming: Proficiency in languages such as JavaScript, Python, Java, and Bash. Cloud Native Expertise: Experience with Docker, Kubernetes, AWS, Google
Service Delivery Manager has a duty to build and maintain effective stakeholder relationships in a positive way at every level. What will you be doing? ITIL Service Delivery oversight - Incident, Service Request, Problem, Change, Release Management Service Level Management Escalation Management Drive internal and third-party service review meetings covering performance, service improvements, quality and processes … Technology risk assessment and management Technology Service Integration - both internally within Velonetic and with our delivery partners High Priority Incident Management & Leadership Ensures the 'Voice of the Customer' is heard and understood, helping to drive the required changes Project Governance input and oversight Operational Readiness Review and sign off External Stakeholder management Fostering a positive working … environment Required Skills Significant experience in IT service management, or a similar role with commensurate level of leadership responsibility Able to manage individuals and teams - including delivery, performance and continuous professional development A solid understanding of ITIL best practices and frameworks - including experience in change/incident/problem management processes. A customer-centric mindset, with excellent
Join Barclays as a Joint Operations Centre Senior Incident Manager and become a part of the team responsible for monitoring, assessing, and responding to major incidents that have the potential to impact Barclays' operations, services, and people. In this role, you'll lead command and control communications across a wide range of scenarios - from geopolitical unrest and physical security … hours. This is a hybrid role, with a minimum of 3 days per week required in the office. To be successful in this role, you will need the following: Incident management experience, including leading an incident response team, overseeing shift activity, and guiding team priorities even when acting as an individual contributor. Proficiency in ServiceNow for managing … incidents and operational workflows. Stakeholder management with the ability to manage expectations and build strong working relationships across all levels. Strong leadership skills, including the ability to lead meetings and present confidently to senior executives. Some other highly valued skills may include: Experience using Everbridge for asset location tracking, colleague safety communications, and mass notifications during critical events. Familiarity
the boundaries? We are seeking a person who is decisive, collaborative and calm under pressure, detail-oriented and analytical, to help us implement and run a new IT risk management framework. This is a multi-faceted role supporting both a Technology Transformation Programme as well as helping to ensure current operational technology and applications are reliable and resilient. This … role will suit an incident or IT disaster recovery manager, or someone with equivalent practical experience in technology operations, who is looking to broaden their skillset. After developing your specialist skills you are now looking for opportunities to grow and learn more about wider resilience, chaos engineering and cloud services - we will support, provide guidance and mentor you. Nevertheless … are creating a new diverse and dynamic team to build innovative ways of building and assessing operationally resilient technology services. Principal Accountabilities: - Business Impact Assessments & Risk Identification: Develop asset management strategies, lead business impact and vulnerability assessments, conduct threat modelling, and maintain risk identification frameworks. - Risk Assessment & Evaluation: Ensure compliance with governance policies, provide expertise on operational resilience, and
of the fastest-growing infrastructure companies in history, an organization that is in the center of the hurricane being created by the revolution in artificial intelligence. "VAST's data management vision is the future of the market." - Forbes VAST Data is the data platform company for the AI era. We are building the enterprise software infrastructure to capture, catalog … installed clusters. You will work in a 24/7 network operations center-style environment, ensuring the availability, reliability, and security of services. This role involves real-time monitoring, incident detection, incident management, incident resolution, and clear written and verbal communication with other teams and stakeholders. The Role Monitor clusters using internal monitoring tools to detect … operating procedures (SOPs) and escalation processes. Perform initial investigation and diagnosis of problems, escalating complex issues to support. Document incidents, including their details, troubleshooting steps, and resolutions in the incident tracking system. Collaborate with other teams, including Support, R&D, Account teams, and customers to ensure effective incident resolution and communication. Conduct routine checks and audits to identify
s remote production evolution towards a hybrid model of fibre, satellite, cloud, and IP. We seek a Dedicated TOC Network Engineer to provide expert technical support, monitoring, and operational management for a major British football league contract, utilizing Cingularity's global network and IP platforms. Based at IMG Studios, Stockley Park, this role is essential for ensuring high-availability … Monitoring Coordinate and execute all technical aspects of live transmissions, including service check-in, configuration validation, active monitoring, and check-out procedures. Perform continuous, expert-level monitoring and proactive management across network, video, and system domains, including: Network: Monitor the Cingularity core network (NetInsight Nimbra DTM & Arista IP) and edge devices using systems like Nimbra Vision and Zabbix. Configure … and talkback, addressing service issues and collaborating with partners and suppliers. • Proactively identify and resolve issues to prevent service impact. • Complete provisioning of new services, including occasional-use services. Incident Management & Resolution: Resolve service issues impacting the contract, from investigation to resolution, minimizing impact. Log, analyze, and report incidents, including root cause analysis. Participate in post-incident
across our clinics to further improve our success. The Role: We are seeking an enthusiastic and proactive individual who will be responsible for leading and maintaining the Healthcare Quality Management System (QMS) at the clinics and satellites, and implementing the system to continually improve the quality and effectiveness of the service provided in accordance with the conditions of the … comply with the guidance on good practice as set out in the HFEA's Code of Practice and all relevant clinical and laboratory directives. The role includes responsibility for management of all incidents, non-conformances and complaints, user satisfaction reviews, internal and external auditing, monitoring, evaluation and continual improvement. As well as management of Quality for TFP Boston … within the clinic and its satellites • Maintaining the HFEA licence • Developing and monitoring the quality policy, quality objectives and quality indicators • Internal and external auditing, user satisfaction and complaints management • Incident management: investigation, corrective and preventative actions • Training, communicating, monitoring and motivating all staff to support the QMS Qualifications and Training: • Degree or equivalent post graduate diploma
Datalake Service and the Core Banking System. You will oversee their deployment, development, enhancements and production operations. You should have extensive experience and skills in Software Engineering, DevOps, project management, incident management, team management and stakeholder management. Additionally, you should have expertise in: Multi-threaded Java backend development API and SQL database development TypeScript and Node.js
role in maintaining the integrity of our IT systems, collaborating closely with cross-functional teams, and ensuring our digital operations meet the highest standards. Key Responsibilities: Team Leadership and Management: Lead, mentor, and manage a diverse team of IT professionals including an Application Support Specialist, Technical Project Manager, Cyber Security and Compliance Analyst, and End-to-End QA Specialist. … with the Application Support Manager to develop and implement effective support processes and documentation. Ensure all applications are updated, patched, and maintained in line with best practices. Technical Project Management: Oversee the planning, execution, and delivery of technical projects. Work closely with the Technical Project Manager to ensure projects are completed on time, within scope, and budget. Facilitate communication … delivery. Identify opportunities for process optimisation and efficiency gains. Stakeholder Communication: Act as the primary point of contact for all digital operations-related matters. Provide regular updates to senior management on the status of projects, operational performance, and security compliance. Facilitate effective communication between IT teams and business units. Problem Solving and Incident Management: Manage and resolve