thinking and teamwork are essential as you minimize disruption and help the bank return to normal operations smoothly. What you'll need: Experience with Google Cloud Platform (GCP) Infrastructure Management Proficiency in designing, deploying, and managing multi-tenant GKE clusters, including control plane operations and node pool optimizations. Hands-on experience with GKE configurations such as autoscaling, multi-zone …/regional clusters, and workload optimization for tenant-specific use cases. Communication Skills: Clear verbal and written communication to interact with senior management, colleagues and support teams. Incident Management and Troubleshooting: Demonstrated expertise in resolving infrastructure-related incidents, addressing Kubernetes cluster issues, optimizing Istio configurations, conducting impact analysis, and executing root cause resolution. Proficient in troubleshooting Istio … permanent solutions to prevent recurrence. Skilled in coordinating with cross-functional teams during incidents to ensure rapid resolution while minimizing downtime and tenant impact. Expertise in documenting and refining incident management workflows to enhance organizational response times and resiliency. An understanding of Terraform state management, workspaces, and CI/CD pipeline integration for automated infrastructure deployment. This
customer, project, and support teams. • Manage escalations and Major Incidents. • Work with account teams on contractual negotiations for renewals. • Report on client profit and loss KPIs. • Ownership of Service Management processes and their documentation in IT Glue. • Ownership of the transition process of Managed Service customers from pre-sales to operational support. • Capacity planning of support teams to ensure … performance and customer satisfaction. • Create customer excitement. Stimulate customer demand by ensuring service roadmaps continue to evolve (work with operational teams to support you). • Develop an IT Service Management Plan (ITIL process delivery) to improve service consistency and reliability. Owning and delivering key processes including change, problem, and major incident management. Skills/Experience: Experience of … Project and Operational Incident, problem, and change management within a complex environment. Strong partner management expertise, with demonstrable experience of navigating complex multi-vendor environments. Exceptional stakeholder engagement, influencing, and communication skills. Ability to work under pressure, multitask, and handle multiple assignments simultaneously. Excellent time-management and problem-solving skills. Strong technical background. Minimum of
an award-winning software scale-up with big ambitions and the momentum to match. Trusted by Big Four and many other top professional services firms globally, our AI-powered resource management platform is helping organisations to achieve extraordinary results. Our platform stands apart as the only solution that combines advanced AI, real-time project financials, and firm-wide insights to elevate … resource management to a strategic function. By driving profitable growth, powering confident decisions, and ensuring satisfied clients and teams, we're helping our customers build strong organisations and careers for the long term. Why our customers love Dayshape: We help professional firms optimise margins and increase revenue, unlocking access to more profitable work. We provide complete operational visibility today and … and ambitious team, driven by our values and a shared commitment to success. If you're ready to join a fast-growing, high-impact company that's reimagining resource management, then let's talk. About the role: As we grow and more customers adopt our platform, we are scaling up to meet the demand. This means more automation
You'll help address and restore live service on our Pega Cloud estate, spot trends and repeat incidents, taking actions to remove inefficiencies within our Pega applications to prevent incident recurrence and optimise the application experience for our colleagues. Experience of PDC is a must, along with alerting toolsets such as Nexthink and Dynatrace. What you'll do: Troubleshoot … and restores live Pega incidents. Works with TRMs and Incident Management on fast-paced, high-severity incidents. Identifies root cause and actively owns problem records through to resolution for a permanent fix to prevent incident recurrence. Reacts accordingly to urgent production issues and proactively engages with TRMs and other Pega Run Engineers to ensure quality of service. … on our Pega Cloud estate to drive quality of production implementations and to ensure guardrails are met. Delivers prescribed outcomes for area of responsibility by working within established knowledge management systems. Suggests improvements and crafts a plan for a small part of a change management program with input from technical experts. Why Lloyds Banking Group: We're on
and are expanding out into other AWS products such as ECS Fargate. Our IaC is a mix of Serverless Framework and Terraform. We use Jira for project management, GitHub Actions for our CI/CD pipelines and Incident.io for our incident management process. For more detailed information, feel free to ask for our tech radar.
OT) environments. Position: Responsible for detecting, analysing and responding to security incidents through to resolution. Providing support on baseline security analysis in OT projects. Manage operational components and coordinate incident management, including detection, response, reporting and liaising internally and externally. Review audit trails, system logs and other monitoring data sources periodically and ensure that they are in compliance
the continuous improvement of everything we do, both team process and what we deliver. Regular patching, updates and maintaining secure systems. Participating in out-of-hours, in-hours and incident management processes. Utilising FinOps practices in all you do. What we're looking for: OKD or OpenShift, or general Kubernetes administration experience. Experienced with infrastructure as code with
technical stakeholders. A strong sense of ownership and pride in the work you deliver. Nice-to-have skills: Experience with legacy system refactoring or performance tuning. Previous involvement in incident management or system debugging at scale. An understanding of secure development practices and compliance requirements. Passion for improving engineering culture through collaboration, feedback, and knowledge sharing. Why you