in automating/scripting to remove toil. Ability to quickly understand, update and write code in languages such as Python, Java, Golang, Bash and PowerShell; Working experience monitoring SLOs, SLIs and SLAs and logging updates and alerting where appropriate; Strong DevOps understanding and familiarity, including experience of Infrastructure as Code and CI/CD pipelines, such …
App Gateway. 2+ years of experience with Reliability concepts to ensure high performance and high service availability, able to define, implement and improve business performance SLOs. 2+ years of experience with Production operations including 24x7 on-call support, escalation/paging with OpsGenie, incident management, RCA (Root Cause Analysis) and retrospective analysis. 2+ or more … enhance the Observability and Reliability of applications and services running on IaaS and PaaS in Microsoft Azure. AWS and GCP are nice to have. Service Level Objectives and indicators focused on improving business workflow performance and availability. Technical and business dashboards, metrics, and actionable alerting. Processes and automation for increasing uptime …
the SRE organisation as a key member of the SRE leadership team. Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability in combination with service, product and engineering communities. Collaborate with product and engineering senior managers to ensure delivery …
and principles to strengthen focus, behaviours, and culture. You will support the POs and ELs to ensure our products and services are sufficiently resilient and to address any SLA or SLO failures or increased incident levels. SREs contribute to the Product Engineering backlog with a focus on reliability and performance; you will work with Product Owners and Engineering Leads using … lifecycle and experience in end-to-end delivery of software products, with emphasis on operational aspects. The ability to define, implement and achieve relevant Service Level Objectives for the products in the lab you are aligned to. Experience with agile development methods (Scrum, Kanban) and tooling (Jira and Confluence) and experience …
include: To form part of a critical operations function that is responsible for the monitoring, availability and performance of production services. Responding to stakeholder requests within agreed timescales or SLO. Drive automation to reduce failures and manual tasks, thereby improving overall application performance and availability. Perform systems administration activities to ensure the smooth operation of applications across multiple platforms. Coordinate …
Manchester, England, United Kingdom Hybrid / WFH Options
bet365
and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of programming languages including Python, Golang …
Stoke-on-Trent, England, United Kingdom Hybrid / WFH Options
bet365
and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of programming languages including Python, Golang …
and team talks, helping them improve their C#/.NET Core skills. Support and enhance current systems and initiatives during office hours, ensuring that service level objectives are met. Maintain a strong focus on quality, reusability, clean architectures, security, and resilience across the full application lifecycle. Collaborate with the Lead Developer …
billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that …
and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Knowledge and experience of modern software development techniques …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
RVU Co UK
s perspective by sharing your experience, knowledge & expertise in a continuous learning environment. As a member of the platform engineering team you will be accountable for the following: Objective setting, feature ideation, development and measurement. Architectural decisions and designs of the platform, domains and systems. Defining, evolving, and applying team processes. Building efficient CI/CD pipelines and … well-architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers at scale within the Kubernetes ecosystem. Experience …
and more. Lead incident management, capacity planning, and performance tuning initiatives. Guide engineers in observability, cost optimisation, and security best practices. Define and track service level objectives (SLOs) to improve engineering outcomes. Champion a DevOps mindset with “you build it, you run it” accountability. We’re Looking For: Proven background in …
Quality, Stability & Standards: Establish quality standards to meet performance, reliability, and maintainability of the systems. With a strong production-first mindset, drive observability, maintain Service Level Objectives (SLOs), and ensure efficient incident resolution. Oversee the maintenance of existing systems, ensuring continuous improvements and prompt resolution of issues. Agile Delivery & Collaboration: Working …
Northern Ireland, United Kingdom Hybrid / WFH Options
Jobgether
Terraform or Pulumi, and observability tools such as Datadog or CloudWatch. Experience in implementing AI-powered tools for workflow optimization and operational improvements. Proven success in setting up scalable, SLO-driven monitoring strategies in 24/7 environments. Ability to manage distributed teams, foster innovation, and drive results in a collaborative, inclusive setting. Strong communication and mentorship skills, with a …
services and components meet and continue to meet all their agreed performance targets and service levels. • Investigate all breaches of service level objectives, initiating remedial activities where possible. • Use application management software and tools to investigate issues, collect performance statistics and create reports. • Continuously suggest and contribute …
initiatives from design through deployment and operations. Write maintainable, well-tested, high-quality code and uphold engineering best practices. Focus on observability and maintain Service Level Objectives. Take operational responsibility for the Identity Platform, including joining the on-call rota. Foster a strong engineering culture through mentorship, code reviews, and collaboration …
and Responsibilities: Being part of a critical operations function that is responsible for the monitoring, availability and performance of production services. Responding to stakeholder requests within agreed timescales or SLO. Drive automation to reduce failures and manual tasks, thereby improving overall application performance and availability. Perform systems administration activities to ensure the smooth operation of applications across multiple platforms. Coordinate …
Newcastle Upon Tyne, Tyne and Wear, North East, United Kingdom Hybrid / WFH Options
Develop
platform's core value streams. Key Responsibilities: Technical Leadership & Strategy: Champion engineering best practices, system reliability, and architectural integrity. Define and track progress toward Service Level Objectives (SLOs). Collaborate with product stakeholders to shape robust and scalable solutions. Take responsibility for non-functional areas such as performance, maintainability, and security. Provide …
you'll be doing: Technical Leadership & Strategy: Champion technical quality, system health, and architectural integrity across your value stream. Define and drive progress towards Service Level Objectives (SLOs) in collaboration with Principal Engineers. Work closely with Product Owners and Product Managers to design scalable, high-performing technical solutions that align with …
to meet business needs. Champion the non-functional qualities of our data products: supportability, testability, security, compliance, maintainability, and performance. Drive progress towards our Service Level Objectives (SLOs), ensuring our systems are reliable and resilient. Partner closely with Principal Engineers and technical architects to define and design data solutions aligned with …
establishment of reliability goals, including Service Level Indicators (SLI), Service Level Objectives (SLO), and error budgets, and define technical solutions for measurement. Collaborate on Business-Critical Systems Design with an emphasis on reliability, supportability, and risk mitigation. Assist in developing capacity planning …
feature ideas and betterments through tight cooperation inside and outside of your immediate team. Build trust and reliability in your products, review performance against service level objectives, address incidents and prioritize improvements. Qualifications: Not all applicants will have skills that match a job description exactly. Ciptex values diverse experiences in other …
About the role We have an exciting opportunity for a Java Engineer to join the elementsuite team, with a passion for clean code, elegant architecture, and efficient delivery. This is a hands-on role where you'll be developing the …