travel to Scotland. Employment Type: 6-month contract. Rate: £550 per day, outside IR35. Role Overview: Morgan Hunt are seeking an experienced Site Reliability Engineer (SRE)/Unix Infrastructure Engineer to support the deployment, migration, and optimisation of critical infrastructure services. The role involves ensuring high availability, disaster recovery readiness, and automation-driven improvements across …
our customer's systems are built and maintained. This role blends operational product support with software engineering to create applications that help us understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call, etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, and to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. Competencies: It is desirable for you to have experience in the areas below. However, more valued for this role is that you have excitement and enthusiasm …
Hybrid position with on-call duties. We are seeking a highly motivated and skilled Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of the client's critical Data Platform solutions. In this role, you will provide dedicated support and maintain the health of the data infrastructure. This position involves on-call responsibilities to address …
# Site Reliability Engineer
Remote - APAC / Engineering
The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, or … radical responsibility. If this sounds like an environment that you believe could work for you, then read on to find out more. The role: We're looking for a Site Reliability Engineer to manage, maintain, improve, and provide support on our platform. You will be curious by nature, always looking for ways to improve, as we will … we expect this role to be an advocate of continuous improvement. Reliability of our new global Tyk Cloud platform; automation of operations and support; writing and maintaining documentation on SRE processes and policies; recommending and implementing ways of driving operational efficiency and driving down our cost to run, without impacting service; assisting in penetration testing for Cloud by liaising with …
Reliability Engineer - Public Sector - Outside IR35 - Edinburgh (Hybrid). Day Rate: up to £560 (outside IR35). Duration: 6 months. Harvey Nash's client is hiring an experienced SRE to support and enhance an existing digital platform. Responsibilities: support deployment and migration of services to RHEL 8/9; develop and strengthen automation to support disaster recovery activities; support for …
Founded in 2001, Resident Advisor (RA) is one of the world's longest-running music media brands and a cornerstone of the dance, electronic and DJ ecosystem. The site's audience of over 6 million monthly users is drawn in by a combination of news, editorial, club listings and ticketing, RA-branded events at venues and festivals worldwide, original … films and a weekly mix series that has run for 18 years. We're looking for a Senior Site Reliability Engineer passionate about electronic music to join our Core Platform team. This role is office-based (minimum 3 days/week in the office) and offers the flexibility to work hybrid. You'll help scale our high-traffic infrastructure … MSSQL databases, Elasticsearch, Redis, and Kafka running on AWS EKS (Kubernetes), managed via Terraform with CI/CD pipelines and Datadog monitoring. Your responsibilities include improving infrastructure performance and reliability, driving modernization and cost optimization, developing shared components (e.g. auth systems, GraphQL gateways), enhancing developer experience, maintaining E2E testing systems, and creating internal tooling. This is an opportunity to …
In order to join the ELEVI team, you will need the following. Position: Site Reliability Engineer (SRE) - System Administrator, Mid. Clearance: Clearable. You Have: 4+ years of experience working with AWS infrastructure and platforms, including Infrastructure as Code; 4+ years of experience working with and/or administering Linux environments; experience delivering software to clients using Agile methodologies, including …
Job Description: Would you like to be an engineer who builds the cloud, rather than just uses it? At AWS, our engineers manage the behind-the-scenes software and tools that support the world's largest cloud computing infrastructure. We … offer an exciting opportunity to join a world-class network team in a dynamic environment that feels like a start-up. As a Site Reliability Engineer (SRE), you will deploy, manage, troubleshoot, and innovate the tools, services, and components that enable our network engineers to automate and maintain network operations. Your internal customers are your network engineering …
Columbia, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits, including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads the team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools. Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization. Leads post-incident reviews and documents findings for future informed decision-making. Drives implementation of approved proposals to optimize Software …
Jefferson City, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits, including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads the team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools. Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization. Leads post-incident reviews and documents findings for future informed decision-making. Drives implementation of approved proposals to optimize Software …
team of passionate thinkers, innovators, and dreamers, and help us connect people and build communities to create economic opportunity for all. About the team and the role: As a Site Reliability Engineer at eBay, you'll play a key role in managing major incidents and the overall health of our services, making sure they are both resilient … and high-performing. You'll create strategies for availability and reliability, enhance domain ecosystem observability, and support a shift toward a more engineering-focused culture. Your contributions will ensure that eBay's technology remains cutting-edge and reliable for our global community. What you will accomplish: Proactive Monitoring: Continuously monitor the health of eBay's critical services to identify … and address potential issues before they escalate. Solution Development: Collaborate with Architecture, Engineering, and Operations teams to develop solutions that ensure high site availability, reliability, and performance. Collaborative Problem Solving: Work closely with partner teams to resolve recurring technical issues, onboard new alerts, and develop high-quality Standard Operating Procedures (SOPs). Automation and Process Enhancement: Identify and …
distributed systems. Ability to debug and optimize code and to automate routine tasks. Systematic problem-solving approach, coupled with effective communication skills. About the job: Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google's services, both our internally critical and our externally visible … systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex … challenges of scale which are unique to Google, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design. SRE's culture of intellectual curiosity, problem solving, and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take …
Are you an experienced Senior DevOps/Site Reliability Engineer looking for your next contract role? Join one of the world's leading IT services, consulting, and business solutions organizations. Founded in 1968, the company consistently ranks among the top global IT service providers. With a presence in over 50 countries, the company has built a reputation … across industries including banking, healthcare, telecommunications, and retail. The leading consultancy firm has partnered with a global technology leader, and they are currently seeking an experienced Senior DevOps/Site Reliability Engineer to join the team. Additionally, this role provides a hybrid working arrangement based in London. Ready to make a move? Get in touch and apply …
looking for experienced SREs to help grow our small team into a global footprint that can provide expert engagement across our core serving systems. As an early member of the SRE team, you will report directly to the Director of Managed Infrastructure and play a foundational role in expanding our SRE practice, integrating reliability principles more deeply into Vercel's … Devise repeatable, low-toil operational practices through the development of automated systems for software delivery, system failover, and capacity management. About You: At least 3 years of experience in an SRE role, or at least 5 years of experience in an adjacent role (e.g. platform engineering), operating in a scaled environment. Firm grasp of the SRE philosophy and mindset, with practical experience … working on or directly with SRE teams that have proactively engaged in system design and improvement. Strong sense of accountability and commitment to problem solving, backed by a curiosity to dig deep and identify root causes. Willingness to proactively engage with development teams to influence the course of software design and operational practices. Capability to manage risk, make decisions, and …
Site Reliability Engineer - £70,000 pa - Hertfordshire. My client, a leading entertainment group, is looking for a mid-level SRE to join their platform team in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana …
Watford, Hertfordshire, South East, United Kingdom
La Fosse
Site Reliability Engineer (Python) - £70,000 pa - Hertfordshire. My client, a leading entertainment group, is looking for a mid-level SRE to join their platform team in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/ …
As a Senior Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, performance, and scalability of our software systems and infrastructure. You'll leverage your engineering expertise to design and deliver resilient platforms, improve observability, automate operations, and guide squads in applying SRE principles effectively. Working closely with Principal Engineers, Squad … automation and tooling to reduce toil and improve delivery consistency. Document technical designs, solutions, and operational procedures to support collaboration and sustainability. Technical Excellence & Best Practices: Champion and embed SRE principles, including SLOs, SLIs, and error budgets. Actively contribute to peer reviews and encourage a culture of continuous feedback. Drive engineering maturity through shared standards, tooling, and modernisation efforts. Raise … CD pipelines, particularly for frontend codebases (e.g. Azure DevOps, GitHub Actions). Good working knowledge of SQL and of interacting with data layers in support of web applications. Experience applying SRE principles to frontend and web-based systems: SLIs/SLOs, performance budgets, error tracking, and synthetic testing. Experience with testing frameworks such as Jest, Vitest, and Playwright, and with supporting reliable deployments.
Honolulu, Hawaii, United States Hybrid / WFH Options
OMW Consulting
Role - Site Reliability Engineer. Location - Honolulu - Hybrid - 1-2 days a week on site. Security … clearance - Minimum Secret (required before applying). Salary - $150k-$200k + Equity. I am partnered with a leading defense tech scale-up that is looking to add an SRE to their team based in Hawaii. This role is hybrid, with an expectation of 1-2 days on site in Honolulu; however, there are some weeks when you will … not need to go on site at all. Due to the nature of the client, you must hold an active Secret clearance, at minimum, before applying for this position. To be considered for this position, you must have experience with the following: Experience with Security Clearance and DoD IT Environment: You hold an active security clearance, are …
level. Being a part of this team will accelerate your career. Take a closer look at the role: Job Description: We have an opportunity for a talented DevOps/SRE Engineer to join the TWG Cadillac Formula 1 Team as part of the Event IT Team. In your role as a DevOps/SRE Engineer, you will be … at the forefront of developing our technological advantage by maintaining the reliability, scalability, and performance of our cloud and on-premises infrastructure. You will collaborate with software engineers, data scientists, and race strategists to streamline application deployments, monitor system performance, and troubleshoot advanced operational issues. Your work will directly impact the team's race performance by ensuring smooth data … Pipelines: Build and maintain CI/CD pipelines for rapid deployment and software updates. Monitoring & Alerting: Utilize advanced monitoring tools for proactive system health checks and automated incident alerts. Site Reliability: Improve system reliability through incident management, root cause analysis, and capacity planning. Security & Compliance: Follow security best practices, including access control, vulnerability management, and adherence to …
sector, our technology is truly flexible and designed to transform any business at scale. We've created a unified platform that adapts to diverse needs, offering the scalability and reliability legacy systems simply can't match. At ZILO, our DNA is built on Character, Creativity, and Craftsmanship. We face every challenge with integrity, explore new ideas with a curious … If you're ready to shape the future, let's talk. About the Role: We're looking for a Senior Site Reliability Engineer to join our SRE team. This is a hybrid role that blends deep platform engineering with application-level troubleshooting. You'll be responsible for the stability, performance, and resilience of our cloud-native … service code. Resolve incidents and support root-cause analysis (Java and Go services). Contribute to postmortems and reliability engineering initiatives. Who You Are: Essential Experience: 5+ years in an SRE, DevOps, or infrastructure role. Deep hands-on experience with AWS, EKS/Kubernetes, and Terraform. Working knowledge of Kafka tuning, monitoring, and operational troubleshooting. Strong familiarity to be able to …
Engineer to act as a North Star for this evolving discipline. As our first engineer in this role, you'll have the unique opportunity to shape our SRE strategy, establish best practices, and set the standard for service reliability and performance. What You'll Do: Define strategies for Application Performance Monitoring, Unit Cost, and Chaos Engineering. Continuously … so product teams can innovate effectively. Playing a key role in shaping the core technology layers that drive our platform's success. What You Need: Proven experience implementing SRE principles at scale, including deep knowledge of SLI/SLO/SLA differences. A product engineering background with strong coding skills in Python, C#, or similar. Experience with incident management … PCI compliance). Background in capacity planning, performance, and load testing. Sysadmin skills for troubleshooting disk, network, and infrastructure issues. Why Join Thredd? The chance to define and lead SRE best practices from the ground up. A high-impact role in a rapidly growing company. A collaborative, innovation-driven culture where your expertise will shape our platform's future. If …
we share the passion to code, create, and ultimately build an open, accessible and fair financial future, one piece of software at a time. We are looking for a Site Reliability Engineer to join our Core team to encourage infrastructure best practices across our organization that allow us to securely scale a distributed financial platform that touches … of people a day. Our distributed financial platform tackles some of the most interesting problems in crypto for millions of our customers and continues to grow rapidly. The SRE team at Blockchain combines software and systems engineering to provide a platform that abstracts complexity for increased security, reliability, and rapid product delivery. The SRE organization at Blockchain is … and scalable manner. WHAT YOU WILL DO: You will play a critical role in evolving our infrastructure as we develop solutions to complex technical problems involving reliability, latency, bandwidth, and, most importantly, security. You will be an integral part of improving observability, monitoring, and alerting throughout the platform. You will help coordinate work across different …
and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney. Role: Principal Site Reliability Engineer. You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and … expertise, excellent communication skills, and a collaborative spirit. Responsibilities: Define and enforce SLOs, SLIs, and error budgets across critical services. Develop and implement cloud infrastructure and tooling strategies. Enhance SRE practices across the organization. Implement robust observability (metrics, logs, and traces) using our observability tools. Guide the team in building automated, self-healing systems. Own and evolve incident response processes … security, DevOps, and software teams to ensure compliance and operational excellence. Evaluate and adopt tools and practices to improve platform performance and reliability. Desired Skills & Experience: Experience leading SRE transformations. Hands-on expertise with Kubernetes (EKS preferred) in production. Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.). Proficiency in Infrastructure as …
Site Reliability Engineer. Key Responsibilities: This position will primarily focus on providing design and implementation expertise on infrastructure provisioning, management, and lifecycle implementation of cloud components and services, containers, and other critical DevSecOps concepts. Increase platform reliability through automation, health checks, and resilient rollout patterns. Build and deploy health checks, auto-scaling, and self … healing components. Implement advanced deployment strategies (blue/green, canary). Automate rollbacks and recovery paths in CI/CD pipelines. Integrate reliability testing into dev workflows. Required Skills: Kubernetes (probes, readiness strategies). CI/CD pipelines (GitHub Actions, ArgoCD). Automation via Helm and Terraform. Experience with rollout strategies and traffic shifting. Clearance: Secret.
Site Reliability Engineer. Salary: $140k-$200k + equity. Secret clearance or higher is required. My client, a VC-backed organization in the defense tech space, is looking to hire multiple SREs as they build out their DevOps team across the USA. My client has created a modern product which is streamlining processes and saving time in critical … rest of the skills and experience needed for this position are listed below: Secret clearance or higher; experience working within the DoD cloud environment; 4+ years of experience as an SRE; experience creating CI/CD pipelines; strong knowledge of Kubernetes; experience with Iron Bank, Cloud One, or Platform One; Risk Management Framework security experience; experience working with AWS. If you …