the reliability of all cloud systems while keeping levels of manual work low. SREs are expected to be experienced in software engineering principles, operational discipline, and automation. The SRE team work on a fully remote basis, in conjunction with their US and Australian teams. This company are a market leader in student community management software … to ensure high availability and performance. Collaborate with product engineering teams to design and build fit-for-purpose, observable software. Required Skills and Experience: Proven experience in an SRE/DevOps/Platform Engineering role, having previously worked in a software engineering role in .NET and C#, Java, or a similar OO development language. Proficiency in C# or … and this job is part of a large programme of change and improvement in their cloud SaaS products over the coming years. If you are looking for an interesting SRE role with a forward-thinking global organisation, this would be a tremendous career opportunity to consider. Please apply with your CV to find out more.
European cloud revolution. We supercharge our customers' ability to innovate in the hyperscaler cloud, enabling seamless migration, advanced security, and data-driven success. Currently, we are looking for a Senior Azure Site Reliability Engineer to join our team in the UK. Your daily responsibilities: architect, implement, and improve existing monitoring and alerting systems; proactively investigate and identify performance anomalies
Senior Site Reliability Engineer. Start: ASAP. Duration: 6-12 months. Location: hybrid, London (Tuesdays and Thursdays WFH). Pay: negotiable, inside IR35. We're looking for an experienced DevOps Engineer to join our team on a contract basis, with a focus on AWS infrastructure, observability tooling, and CI/CD automation. This is a hands-on role supporting … Python, Bash, Go or SQL - Work with Git-based workflows for infrastructure as code - Troubleshoot Kubernetes workloads and containerised services - Participate in an on-call rotation to ensure system reliability. Your Profile. Essential: - Solid hands-on AWS experience in a DevOps setting - Background in incident, change, and problem management - Strong with Prometheus, Grafana, Splunk, and PromQL - Proficient in scripting
are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring … and SOPs. Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure. Other duties as needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon …
passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Senior Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring … and SOPs. Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure. Other duties as needed. About You: 7+ years' experience in Site Reliability Engineer roles. 3+ years' experience with an object-oriented language (preferably Java, .NET or C++). Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability …
Job Title: Senior SRE - Site Reliability Engineering for Observability. Location: London (mostly remote, 1 day/week in office). Pay Rate: £50 - £62 per hour (inside IR35). Contract Duration: initial 12 months. Working Hours: 11:00 AM - 7:00 PM. About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to join … team within a leading global tech environment. This is a hands-on, senior-level role focused on building and scaling large-scale monitoring and logging platforms that ensure service reliability, performance, and visibility. If you're passionate about distributed systems, high-throughput data pipelines, and enabling engineering teams with top-tier observability tooling, this is the role for you. … infrastructure using Terraform and configuration with Ansible. Participating in on-call rotations to ensure platform uptime and responsiveness. What We're Looking For: 5+ years of experience in SRE/DevOps roles, managing large-scale systems. Strong technical knowledge of Linux (Ubuntu/Debian) environments. Proven experience with observability tools such as: ELK Stack (Elasticsearch, Logstash, Kibana); Prometheus, Grafana …
Site Reliability Engineer. Remote - Canada, Americas/Engineering. We offer The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the … radical responsibility. If this sounds like an environment that you believe could work for you, then read on to find out more. The role: We're looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature, always looking for ways to improve, as we will … we expect this role to be an advocate of continuous improvement. Reliability of our new global Tyk Cloud platform. Automation of operations and support. Writing and maintaining documentation on SRE processes and policies. Recommending and implementing ways of improving operational efficiency and driving down our cost to run, without impacting service. Assisting in penetration testing for Cloud through liaising with …
Site Reliability Engineer. Remote - APAC/Engineering. The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, or … radical responsibility. If this sounds like an environment that you believe could work for you, then read on to find out more. The role: We're looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature, always looking for ways to improve, as we will … we expect this role to be an advocate of continuous improvement. Reliability of our new global Tyk Cloud platform. Automation of operations and support. Writing and maintaining documentation on SRE processes and policies. Recommending and implementing ways of improving operational efficiency and driving down our cost to run, without impacting service. Assisting in penetration testing for Cloud through liaising with …
world in a wide variety of disciplines. We're always on the lookout for energetic, creative people to join our team. Your New Role: Site Reliability Engineering (SRE) team members work with our Global Content Delivery teams to deliver exabytes of content for our brands globally. The SRE combines highly skilled engineering and operations skills … and is focused on automating and improving operations. Their job is to guarantee system reliability, performance, and supportability, with a strong engineering emphasis on building autonomous solutions that deliver value to end-users early, often, and fast. They are central to the reputation and trustworthiness of our services and act as advocates for engineering best practices. WBD's Global … build tools to help automate deployments. Coordinate with relevant teams to build useful tools to support network operations (internal and external). Qualifications and Experience. The Essentials: Passionate about SRE, DevOps, automation, and infrastructure platforms. Understand the mechanical sympathy between software workloads and the demands they place on the underlying hardware. Working knowledge of non-virtualized server hardware, datacenter operations …
our customer's systems are built and maintained. This role blends operational product support with software engineering to create applications that help us understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call, etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, and to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. Competencies: It is desirable for you to have experience in the areas below. However, what is valued more for this role is that you have excitement and enthusiasm …
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … re ready to shape the future, let's talk. Job Description: As a Go Developer at ZILO Technologies, you will play a crucial role in maintaining and enhancing the reliability, performance, and scalability of our platform. You will be responsible for addressing defect fixes, implementing small changes, and contributing to ongoing enhancements of our Go-based microservices stack. Key … platform. Implement small changes and enhancements to improve system functionality and performance. Contribute to the design, development, and deployment of microservices in a Go environment. Monitor system performance and reliability, proactively addressing potential issues. Develop and maintain automation tools to streamline operational processes. Participate in on-call rotations to ensure 24/7 system availability and rapid incident response.
Are you an experienced Senior DevOps/Site Reliability Engineer looking for your next contract role? Join one of the world's leading IT services, consulting, and business solutions organizations. Founded in 1968, the company consistently ranks among the top global IT service providers. With a presence in over 50 countries, the company has built a reputation … across industries including banking, healthcare, telecommunications, and retail. The leading consultancy firm has partnered with a global technology leader and is currently seeking an experienced Senior DevOps/Site Reliability Engineer to join the team. The role offers a hybrid working arrangement based in London. Ready to make a move? Get in touch and apply.
and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. Role: Principal Site Reliability Engineer. You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and … expertise, excellent communication skills, and a collaborative spirit. Responsibilities: Define and enforce SLOs, SLIs, and error budgets across critical services. Develop and implement cloud infrastructure and tooling strategies. Enhance SRE practices across the organization. Implement robust observability (metrics, logs, and traces) using our observability tools. Guide the team in building automated, self-healing systems. Own and evolve incident response processes … security, DevOps, and software teams to ensure compliance and operational excellence. Evaluate and adopt tools and practices to improve platform performance and reliability. Desired Skills & Experience: Experience leading SRE transformations. Hands-on expertise with Kubernetes (EKS preferred) in production. Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.). Proficiency in Infrastructure as …
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … If you're ready to shape the future, let's talk. About the Role: We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native … service code. Resolve incidents and support root causes (Java and GoLang services). Contribute to postmortems and reliability engineering initiatives. Who You Are: Essential Experience: 5+ years in an SRE, DevOps, or infrastructure role. Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform. Working knowledge of Kafka tuning, monitoring, and operational troubleshooting. Strong familiarity to be able to …
London, South East, England, United Kingdom Hybrid / WFH Options
BOSS Professional Services LTD
SRE Engineer. Full-time. UK - Remote/Hybrid. My client is a high-growth ecommerce business which runs its technology stack on AWS. Due to the nature of the business, the SRE Engineer will need to support sudden peaks in traffic by scaling smoothly. The business also hosts ecommerce platforms for other brands, which also need supporting. As an SRE Engineer you will maintain a scalable and reliable production environment for running software services while helping grow the customer base and product offering. For the SRE Engineer role we are seeking: Technology stack: Kubernetes, MySQL, PostgreSQL, PHP, Python, Docker, AWS Lambda, AWS, Redis, ELK; monitoring: Prometheus, Grafana or Loki. You have previous experience of working within SRE … Assist and support the DevOps engineers in setting up the infrastructure for microservices. Work closely with the rest of the DevOps and QA team to load test applications. Responsibilities for the SRE Engineer include: Create sustainable systems and services through automation and uplifts. Partner with development teams to improve services. Gather and analyse metrics from both operating systems and applications. Participate …
Global Site Reliability Engineer. Location: London. About Us: Founded in 2013, GSR is a leading market maker and programmatic trading firm in the fast-evolving world of cryptocurrency trading. With over 200 employees across seven countries, we provide billions of dollars in liquidity daily to cryptocurrency protocols and exchanges. We build long-term relationships with crypto communities … GSR is an opportunity to be deeply embedded in every major sector of the cryptocurrency ecosystem. About the Role: We are seeking a Site Reliability Engineer (SRE) to design, optimize, and support highly available systems across our global trading infrastructure. As part of GSR's SRE team, you will manage a multi-regional cloud environment while integrating … work across all layers of infrastructure, including: Networking & Exchange Connectivity; Linux Systems & Kubernetes Administration; Microservice Orchestration & Observability; Disaster Recovery & Security Optimization. Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction, and enhance developer velocity through better tooling, CI/CD, and infrastructure
impact. We value continuous learning, personal growth, and providing our team with resources to succeed. Ready to shape the future? Let's talk. We're looking for a seasoned SRE with a front-end focus, expert in React applications, to join our SRE team. In this role, you'll ensure the reliability, performance, and operability of our React-based … invalidation, HTTP caching headers) to reduce latency and origin load. Collaborate with UX teams to balance feature richness with performance targets. Collaboration & Knowledge Sharing: Serve as the React/SRE subject-matter expert, mentoring engineers on best practices for building resilient front-ends. Produce and maintain runbooks, debugging guides, and incident playbooks specific to client-side failures. Partner closely with … wider backend SRE, DevOps, and product teams to ensure end-to-end reliability. Enhanced leave: 38 days inclusive of 8 UK public holidays. Private health care including family cover. Life assurance: 5x salary. Flexible working: work from home and/or in our London office. Employee Assistance Program. Company pension (salary sacrifice options available). Access to training and development.
flexible remote working locations within UK/Europe). Employment type: Permanent. Working Hours: Full time (9-6 UK). Salary: Up to £110K + Shares + Benefits. TransFICC is hiring a Site Reliability Engineer to provide high-performance services to our customers. We develop an integration service … product that enables our clients to have a flexible, hosted service without requiring their internal resources to respond to connectivity challenges across trading venues. You will be joining our SRE team and contributing to TransFICC's automation culture. We are a multi-disciplinary team covering everything from desktop and laptop support to data centre provisioning of servers and vendor network … automated, so experience with a software automation tool like Ansible, and coding ability, are a must. We are looking for someone experienced as a sysadmin or network engineer; however, you must have a reasonable understanding of both. Constructive, open-minded and self-motivated. A belief in lifelong learning, and an awareness of how much there still is …