address pain points and ensure smooth change and release transitions. Vendor Oversight: Manage third-party vendor performance and ensure accountability for service delivery. Service Monitoring & Metrics: Define and track KPIs, dashboards, and metrics to measure support quality and team performance. Proactive Monitoring: Drive proactive detection and … P2 incidents, business impact analysis, root cause investigations, and change coordination. Strong grasp of IT service management practices; ITIL v4 certification or equivalent preferred. Proactive Monitoring: Hands-on experience with tools like Dynatrace, Azure Application Insights, or similar platforms. Ability to use monitoring data to enhance application …
incident detection, notifications, triage, and resolution. Key Responsibilities: Pipeline Approach: Adopt a pipeline approach to enable observability of services deployed across multiple environments, balancing monitoring, logging, and tracing based on service classification. Intelligent Alerts: Design and build intelligent alerts using pipelines, onboarding automated runbooks triggered with clear audit/… logs in service management tools like Jira Service Management. Dashboards: Create and maintain dashboards for proactive monitoring of services to help teams resolve incidents quickly. Monitoring Capability: Continuously improve monitoring capabilities to identify key alerts and thresholds for early warnings before services fail. Automation: Enable intelligent … and commercial observability tools (e.g., Prometheus, Grafana, New Relic). Expertise in cloud environments (e.g., AWS, Azure) and infrastructure as code (IaC) tools like Terraform. Monitoring and Observability: Experience in creating and maintaining dashboards for proactive monitoring of services. Ability to design and build intelligent alerts using pipelines …
Group at: How will you make an impact? The Security Operations Specialist acts as an important part of the organization's cybersecurity posture, driving proactive monitoring, incident escalations and collaborating with stakeholders to safeguard company assets and data. Responsibilities include, but are not limited to, proactive monitoring … Security Awareness & Training Manager on some occasions with the testing and improvement of our training programme. Key elements to the role include: Champion our monitoring and measurement program through regular audits and reporting. Prepare monthly and quarterly reports for key stakeholders. Monitor and review security incidents to identify trends … follow up on actions when necessary. Collaborate with security engineers and other key stakeholders to ensure the successful implementation of security projects. Conduct domain monitoring for typosquatting and initiate takedowns. Support the Security Awareness & Training Manager when required. The skills we would love to see in your suitcase …
expansion of Incident Detection and Response (IDR) and Countdown Premium (CDP) for customers with non-media streaming workloads, helping them strengthen their incident management, proactive monitoring, and operational resilience. You will engage customers to understand their support requirements, identify obstacles to adoption, and refine AWS's Premium Support value … needs with tailored solutions. • Solution Adoption & Expansion: • Lead the enablement, execution, and GTM strategy for MSS adoption among media streaming customers, ensuring optimized support, proactive monitoring, and operational guidance. • Drive IDR and CDP adoption for non-media streaming customers, enhancing incident detection, response readiness, and premium support experiences.
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
E-commerce DevOps Engineer role is responsible for managing and optimising software deployment processes for E-Commerce B2C websites and shopping Apps and proactively monitoring and reporting E-Commerce application and infrastructure performance. The role involves: Working collaboratively with software architects, software engineers and network, infrastructure and operations teams … to ensure smooth deployment, scalability and security of E-Commerce B2C websites and shopping apps using CI/CD pipelines and performance monitoring tools. Monitoring E-Commerce system performance, optimizing caching, ensuring uptime and responding to incidents. WHAT YOU'LL BE DOING Further developing and managing CI/… CD pipelines to automate deployment and reduce release cycle times. Ensuring website availability, performance and security through proactive monitoring and incident response and implementing website performance monitoring and optimisation strategies to improve page load times, identify, diagnose and resolve issues and enhance customer experience. Enhancing system observability …
Follow ITIL-aligned processes for escalation and management of incidents. Participate in an On-Call Rota for out-of-hours incident response. System Maintenance & Monitoring Perform regular system health checks on client infrastructure, including servers, networks, and backups. Implement preventive maintenance plans and updates to minimise downtime. Proactively monitor …
Chorley, Lancashire, North West, United Kingdom Hybrid / WFH Options
Nextech Group Limited
Follow ITIL-aligned processes for escalation and management of incidents. Participate in an On-Call Rota for out-of-hours incident response. System Maintenance & Monitoring Perform regular system health checks on client infrastructure, including servers, networks, and backups. Implement preventive maintenance plans and updates to minimise downtime. Proactively monitor …
Asset Management (stock level check, tracking, receiving, preparing and shipment of assets). To provide VIP support as required, including expedited end user device troubleshooting, proactive support, proactive monitoring and health checks, targeted training on new tools for executives, custom onboarding process for executives, etc. Stay updated with …
Watford, Hertfordshire, East Anglia, United Kingdom
Cognizant
Asset Management (stock level check, tracking, receiving, preparing and shipment of assets). To provide VIP support as required, including expedited end user device troubleshooting, proactive support, proactive monitoring and health checks, targeted training on new tools for executives, custom onboarding process for executives, etc. Stay updated with …
to escalation manager for problems affecting multiple service teams. Strong problem-solving and analytical skills, with the ability to make decisions under tight SLAs. Monitoring: Monitor system performance and capacity, taking corrective actions as needed. Expert Coordination: Provide expert advice and coordinate on IM&T network and systems to … contact for a broad range of technologies across Infrastructure, Networking, and Cloud. Ensure the uptime of cloud-based systems, Servers, and web apps through proactive monitoring and maintenance. Infrastructure Management: Experience in overseeing the management of internal and hosted software platforms, Virtualised environment Azure, Entra ID, M365, and …
What you’ll be doing: Public Cloud Infrastructure Management which involves provisioning, configuration and maintaining various Cloud resources to ensure scalability, reliability and security. Monitoring and Performance Optimisation by implementing monitoring solutions to track performance and identify areas for optimisation to enhance user experience and automate improvements where … possible. System Availability and Reliability by ensuring high availability and data integrity through proactive monitoring, alerting, backups and DR planning and testing. Continuous Improvement by staying updated with the latest Cloud & Infrastructure technologies and continuously evaluating and proposing enhancements to existing systems, services and processes whilst also ensuring …
Level Agreements for fault resolutions and service requests completions. Provide customer service to internal and external customers to ensure a consistent experience. Adopt a proactive approach towards all client activities. Day to day incident management and proactive monitoring of IT Security Systems and associated platforms and components … networking hardware and software products. Support end user workstation hardware, software, networked peripheral devices, cabling, and networking hardware and software products by testing, maintaining, monitoring, and troubleshooting in order to determine source of computer problems (hardware, software, user access, etc.) Conduct updates of technical documents and knowledge base to … where any additional hardware or software is included within the network component inventory. Prepare, maintain, and adhere to procedures for logging, reporting, and statistically monitoring network data as directed. Adhere to business continuity and disaster recovery plans, and maintain current knowledge of plan executables. Respond to emergency network outages …
date Knowledge Base Articles. Ensures Production infrastructure is up to date with server/security patches and certificates. Continuous improvement of system and application monitoring and automation. Identify and automate manual workarounds and process improvements. Proactive monitoring of Monitor the availability, latency, scalability and efficiency of all …
and diagrams. Keep up-to-date documentation of all servers, racks, systems, and configurations Mentor and provide guidance to Infrastructure and DevOps Engineers Ongoing proactive maintenance of Server and Network systems Research and development of new ideas and technologies to improve cloud Infrastructure Provide independent reviews of Patch, Anti … malware, and Security systems Proactive monitoring of Network, Server, and Storage systems using group standard tools Responding to notifications/alerts for failed hardware/software, and assign to appropriate Infrastructure Engineer as required Assisting in capacity planning and monitoring of storage and systems at all times …
Basingstoke, Hampshire, United Kingdom Hybrid / WFH Options
Nomios UK&I Limited
play a pivotal role in our customer support processes, ensuring the seamless operation of UK-based ISP and Enterprise networks. Your responsibilities will include proactive monitoring, maintenance, and troubleshooting to deliver optimal performance and reliability. You will be integral to providing 24/7 coverage and support to … quo. A love for learning and obtaining certifications, coupled with an entrepreneurial mindset, will set you apart. We're looking for someone who is proactive, eager to grow, and ready to mentor others while contributing to a dynamic and supportive team. Responsibilities Key responsibilities of the role include: Network … Monitoring & Incident Management: Monitor Nomios customer network infrastructures (routers, switches, firewalls, servers) using various NOC tools Identify, troubleshoot, and resolve network issues affecting service availability, performance, and reliability Respond to alerts and notifications to ensure incidents are resolved promptly within defined SLAs Escalate unresolved issues to the appropriate technical …
critical role in place, we anticipate enhanced integration of Generative AI deployments, consistent AI performance, and the unlocking of transformative AI-driven initiatives. This proactive approach will empower Mars to scale digital experiences with the trust and agility that our stakeholders expect, positioning us at the forefront of innovation … frameworks such as TensorFlow, PyTorch, LangChain, or similar technologies Demonstrated ability to lead cross-functional teams and operate within complex enterprise ecosystems Familiarity with monitoring, observability, and platform telemetry tools (e.g., Prometheus, Grafana, Azure Monitor) Exceptional communication and stakeholder engagement skills to partner with business, technical, and governance teams … to enhance platform reliability and resilience, proactively addressing potential challenges. LLMOps Implementation Develop and operationalize Large Language Model Operations (LLMOps) practices, encompassing model deployment, monitoring, versioning, rollback, and performance tuning at scale. Ensure efficient management of AI models to maximize their effectiveness and business impact. Service Management & Support Establish …
might arise. Liaise with third party partners, suppliers and other parties when required. Maintain the security, integrity and performance of our systems through regular, proactive monitoring and housekeeping. Keep colleagues informed regarding any issues which arise, take remedial action where necessary, using available tools where applicable. SKILLS, KNOWLEDGE …