Monitoring Manager

We have an excellent contract opportunity for an Observability/Monitoring Service Owner – Cloud with our leading client.

Role overview

Own the technical execution of observability solutions and the integration of monitoring tools, leveraging the AI capabilities of the NOW platform to manage events across the client’s Transform products and technical platforms.

Contract – 6 months (high potential to extend further)

Location – Waterside (UB7 0GB) (2-3 days per week onsite)

Pay – attractive daily rate (inside IR35)

In this role, you will...

Leadership and Governance:

  • Lead and own the IT observability, automation and autohealing services for IAG Transform.
  • Foster a culture of innovation, collaboration, and continuous improvement in the organisation.
  • Develop and implement policies, processes and procedures for the observability service.
  • Define standards for logs, event alerts and quality assurance.
  • Establish governance frameworks to ensure consistent and compliant usage of observability tools.
  • Set up a technical review governance board for monitoring solutions to define, validate and endorse monitoring strategies, solutions and demands.
  • Conduct regular audits to ensure compliance with established policies and standards.
  • Provide an observability centre of excellence, owning and delivering observability solutions to product and platform teams.

Innovation and Strategy:

  • Develop strategies to leverage new observability tools and technologies to enhance IT service operations and overall business operations.
  • Lead proof-of-concept initiatives to automate the resolution of events and incidents.
  • Introduce and implement new machine learning models and AIOps features.

Process Improvement:

  • Identify service optimisation initiatives to mature the overall service.
  • Continuously improve IT and business service availability through effective use of observability and automation tooling.
  • Identify opportunities to automate processes and reduce manual efforts.
  • Optimise metric intelligence.

Vendor Management:

  • Manage vendors and partners to provide a best-in-class service that meets IAG requirements.
  • Manage vendor relationships, service-level agreements (SLAs), escalations and CSI plans.
  • Evaluate and select new vendors and tools as needed.

Observability Tooling Architecture:

  • Design and oversee the implementation of a comprehensive enterprise observability tooling architecture and strategy that supports ITSM, monitoring, observability, automation, and delivery management.
  • Engage in the AIOps project to ensure that key monitoring tools such as Datadog, AWS, Azure Monitor and Dynatrace feed the right logs and metrics into the Event Management module in ServiceNow.
  • Optimise observability tooling infrastructure to improve efficiency, reliability, and performance.
  • Ensure that all tools integrate seamlessly with each other and with other enterprise systems.
  • Develop and maintain a roadmap for enterprise tool enhancements and upgrades.
  • Set up business service monitoring dashboards for the critical business services.

Automation & Autohealing:

  • Own the automation and autohealing service, platforms and tools.
  • Define the automation and autohealing policies, processes and procedures.
  • Identify potential use cases for automation and autohealing and take them through the appropriate governance to implement automation playbooks using Ansible or any AWS/Azure native services that fit the use case.
  • Reduce manual effort in service operations and increase automation.

Tool Integration and Optimization:

  • Work collaboratively with cross-functional teams to ensure integration of tools across the Enterprise to reduce manual effort and maximise quality and productivity.
  • Define the technical specifications, standards, and policy for technical integration of monitoring tools into ServiceNow/Ansible.
  • Validate the technical architecture of the integration to ensure it is fit for use and fit for purpose, and that it is scalable and flexible enough to meet the demands of measuring business services.
  • Implement best practices, industry standards and frameworks for configuration and usage of observability and automation technology tools.

ITSM Tooling:

  • Identify opportunities to increase the proactive prediction, detection and restoration of events and incidents using machine learning models.
  • Leverage AIOps and ServiceNow to increase the automation of resolution.
  • Design and oversee the implementation of ITSM tooling solutions that support ITIL-aligned processes.
  • Work collaboratively with cross-functional teams to ensure integration of ITSM tools with other essential enterprise tools (e.g., monitoring, CMDB, service desk, automation tools).

Training and Support:

  • Provide training and support to technology staff on the effective use of observability and automation services.
  • Serve as a subject matter expert for enterprise tools and related technologies.

Skills

Minimum Requirements:

  • Extensive experience (typically 15+ years) in observability and automation technology, tools, services and processes, with a strong focus on management, effectiveness and architecture.
  • Significant experience in observability and automation architecture and enterprise systems.
  • Proven expertise in designing, implementing, and managing a variety of observability tools such as CloudWatch, Azure Monitor, Datadog and ThousandEyes.
  • Proven expertise in designing, implementing and managing a variety of automation and autohealing tools such as Ansible, Nexthink and native AWS/Azure services.
  • Experience of integrating with other industry tooling such as ServiceNow, Ansible, Nexthink, GitHub, and other DevOps tooling.
  • Experience with industry-standard SDLCs, including but not limited to Agile, Waterfall, Hybrid and the product operating model.
  • Demonstrated ability to integrate and optimise observability tooling across complex IT environments spanning cloud hosting (specifically AWS and Azure), on-premises and SaaS. Experience with a range of cloud-native solutions, e.g. Kubernetes monitoring, is preferred.
  • Experience and knowledge of the ServiceNow platform, preferably ITSM/ITOM/AIOps, including Metric Intelligence.
  • Experience in defining and owning the event management and automation process.
  • Strong understanding of event correlation and noise reduction techniques that complement the AIOps, automation and autohealing capabilities.
  • Experience with defining observability, automation and autohealing strategies to increase the coverage and adoption across the landscape.
  • Experience embedding observability, automation and autohealing practices into BAU operations through service design and service operations processes.

Additional/Desirable:

  • Strong understanding of ITSM best practices and principles; ITIL certification is preferred.
  • Experience of managing and working with vendors/offshore teams to support day-to-day activities.
  • Ability to lead and motivate cross-functional teams and the skill to influence stakeholder decision-making with data-driven insights.
  • Flexibility to adapt to changing organizational requirements, technologies, and methodologies.
  • Excellent communication skills with a capacity to present, discuss and disseminate technical concepts in business language to multiple audiences.
  • Deep understanding of service value for customers and driving continuous improvement through automation.
  • Initiates and manages change to help shape the future direction of the client's technology and organisation.
  • Collaborative, open working, resulting in recognition as a valued partner by colleagues, customers and senior stakeholders across the organisation.
  • Clear decision-making ability with the facility to judge complex situations and assess when to escalate issues.
  • Demonstrates a passion for and commitment to continuous personal professional development.

Critical Skills:

  • Excellent analytical, problem-solving, and strategic thinking skills.
  • Strong communication and interpersonal skills with the ability to work effectively with cross-functional teams.
  • Exceptional organizational, communication, and interpersonal skills specific to a fast-paced, global corporate environment.
  • Robust problem-solving and analytical capabilities.
  • Experience in vendor management and negotiation.
  • Excellent verbal and written communication skills to effectively convey change proposals, document architecture and processes and liaise with stakeholders at all levels.
  • Meticulous attention to detail to ensure accuracy and thoroughness.

Job Details

Company – Hays

Location – Waterside, England, United Kingdom