development lifecycle to ensure reliability, scalability, and operational stability are maintained across all supported platforms. Define, create, and monitor application analytics to support improved service level objectives and drive data-informed decision making. Ensure strict adherence to change management release processes while accelerating automation initiatives for these workflows. Lead resiliency management … e.g., RDS/Aurora) and non-relational databases equips you to support diverse data storage requirements. Previous exposure to site reliability engineering concepts, including service level objectives (SLOs), service level agreements (SLAs), service level indicators More ❯
Bristol, Avon, England, United Kingdom Hybrid / WFH Options
Robert Walters
to manage Kubernetes clusters in production environments. Competence in scripting and development using languages such as Python, Java, Go, Bash, or PowerShell. Strong understanding of service-level objectives (SLOs), indicators (SLIs), and monitoring practices. Hands-on experience with infrastructure as code (e.g., Terraform) and CI/CD tools (e.g., Jenkins, Azure More ❯
debugging Python code. Proficiency in one or more IaC toolsets, e.g. Pulumi or Terraform. Designed and built infrastructure using Azure which takes into consideration: observability, alerting, uptime SLAs and SLOs, and Azure DevOps pipelines. Be able to collaborate well with both engineering teams and colleagues in customer-facing teams. Be an excellent communicator both in written and verbal forms. More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream Limited
consistent and correctly configured. The system is designed to be highly observable and available. The team will use monitoring tools to verify that all components are meeting SLA/SLO requirements. If any problems are identified, the team will take preventive actions to minimise customer impact and restore service as quickly as possible. This role is More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
consistent and correctly configured. The system is designed to be highly observable and available. The team will use monitoring tools to verify that all components are meeting SLA/SLO requirements. If any problems are identified, the team will take preventive actions to minimise customer impact and restore service as quickly as possible. This role is More ❯
be able to talk through the key principles of managing a large infrastructure estate. Monitoring infrastructure and applications hosted using taking into consideration: Observability, Alerting, Uptime SLAs and SLOs. Azure DevOps pipeline management. Strong collaboration with both engineering teams and colleagues in customer-facing teams. Excellent communicator both in written and verbal forms. Comfortable breaking down big tasks More ❯
include: To form part of a critical operations function that is responsible for the monitoring, availability and performance of production services. Responding to stakeholder requests within agreed timescales or SLO. Drive automation to reduce failures and manual tasks, thereby improving overall application performance and availability. Perform systems administration activities to ensure the smooth operation of applications across multiple platforms. Coordinate More ❯
billion events per day. To ensure the reliability of this environment for our customers, SREs work closely with developers and product managers to understand service level objectives, think through failure scenarios, and design systems which balance cost with reliability objectives. Additionally, SREs collaborate with the Information Security team to ensure that More ❯
Delivery, and DevOps teams. The CoE Lead will manage the contract with third-party providers responsible for the execution layer, ensuring adherence to service-level agreements (SLAs) and key performance indicators (KPIs). The position involves a 75% focus on the design of frameworks and a 25% focus on implementation … observability, monitoring, and tooling. Service Management: Practical experience in building and maintaining a Service Catalogue, assigning service level objectives (SLOs), and measuring service level indicators (SLIs). Experience in operating production services during peak trading More ❯
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
RVU Co UK
s perspective by sharing your experience, knowledge & expertise in a continuous learning environment. As a member of the platform engineering team you will be accountable for the following: Objective setting, feature ideation, development and measurement. Architectural decisions and designs of the platform, domains and systems. Defining, evolving, and applying team processes. Building efficient CI/CD pipelines and … well architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers at scale within the Kubernetes ecosystem. Experience More ❯
RVUs perspective by sharing your experience, knowledge & expertise in a continuous learning environment. As a member of the platform engineering team you will be accountable for the following: Objective setting, feature ideation, development and measurement. Architectural decisions and designs of the platform, domains and systems. Defining, evolving, and applying team processes. Building efficient CI/CD pipelines and … well architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers at scale within the Kubernetes ecosystem. Experience More ❯
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Confused.com
RVUs perspective by sharing your experience, knowledge & expertise in a continuous learning environment. As a member of the platform engineering team you will be accountable for the following: Objective setting, feature ideation, development and measurement. Architectural decisions and designs of the platform, domains and systems. Defining, evolving, and applying team processes. Building efficient CI/CD pipelines and … well architected principles. Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices. Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting. Knowledge and experience with operating containers at scale within the Kubernetes ecosystem. Experience More ❯
culture of innovation, collaboration, and continuous improvement. Ensure network automation complies with relevant regulatory requirements, security requirements and industry standards. Establish Key Performance Indicators and Service-Level Objectives to measure operational effectiveness. Build relationships with CTO, Application Production Support & Engineering, CIO organizations and other stakeholders. Communicate effectively with technical and non More ❯
a code concept is desirable. Experience with build automation, test-driven development, continuous integration and delivery. Experience with relational and non-relational databases. Previous SRE experience, including knowledge of SLOs, SLAs, SLIs and error budgets, is advantageous. Experience working or familiarity with one public cloud (AWS, Google or Azure). If this is of interest and you have the More ❯
initiatives from design through deployment and operations. Write maintainable, well-tested, high-quality code and uphold engineering best practices. Focus on observability and maintain Service Level Objectives, take operational responsibility for the Identity Platform, including joining the on-call rota. Foster a strong engineering culture through mentorship, code reviews, and collaboration More ❯
Newcastle Upon Tyne, Tyne and Wear, North East, United Kingdom Hybrid / WFH Options
Develop
platform's core value streams. Key Responsibilities: Technical Leadership & Strategy. Champion engineering best practices, system reliability, and architectural integrity. Define and track progress toward Service Level Objectives (SLOs). Collaborate with product stakeholders to shape robust and scalable solutions. Take responsibility for non-functional areas such as performance, maintainability, and security. Provide More ❯
Sunderland, Tyne and Wear, UK Hybrid / WFH Options
Develop
platform's core value streams. Key Responsibilities: Technical Leadership & Strategy. Champion engineering best practices, system reliability, and architectural integrity. Define and track progress toward Service Level Objectives (SLOs). Collaborate with product stakeholders to shape robust and scalable solutions. Take responsibility for non-functional areas such as performance, maintainability, and security. Provide More ❯
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Bede Gaming
you'll be doing: Technical Leadership & Strategy. Champion technical quality, system health, and architectural integrity across your value stream. Define and drive progress towards Service Level Objectives (SLOs) in collaboration with Principal Engineers. Work closely with Product Owners and Product Managers to design scalable, high-performing technical solutions that align with More ❯
maintain steady-state global delivery operations (Green). Manage operational/delivery issues and escalations. Ensure continuous communication and coordination with the client in the event of an issue or escalation. Ensure SLA/SLO attainment and process compliance along with high customer satisfaction. Act as first point of escalation for the day-to-day functioning of the delivery operations team. Handling escalations: identifying the gap, preparing More ❯