passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Senior Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring … and SOPs Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure Other duties as needed About You 7+ years' experience in Site Reliability Engineer roles 3+ years' experience with an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability
Washington, Washington DC, United States Hybrid / WFH Options
OMW Consulting
Job Title: Site Reliability Engineer (SRE) Location: Washington, DC - Hybrid Clearance: TS/SCI Salary: $160k-$200k Join a dynamic team dedicated to delivering best-in-class service quality and issue resolution for mission-critical deployments. In this role, you will be instrumental in shaping operational policies and implementations while working in both on-premise DoD environments … various OSI model layers to meet SLAs. Collaborate with developers to maintain secure and efficient workflows. What We're Looking For: Minimum of 4 years of experience as an SRE engineer, with a strong focus on automation and deployment. Active security clearance with experience in DoD IT environments. Proficiency in VMware, Kubernetes, Docker, Helm, Ansible, and Terraform. Strong understanding
industry-recognised certifications, strong mentorship, and technical development programmes, you will have every chance to advance your career while working on cutting-edge AWS native databases and automation projects. SITE RELIABILITY ENGINEER Salary: £400 - £500/PD Inside IR35 Location: London You will be part of a close-knit team that values knowledge sharing, continuous learning, and … technical development programmes, you will have every chance to advance your career while working on cutting-edge AWS native databases and automation projects. What you'll do: As a Site Reliability Engineer based in London, you will play an integral role in supporting a wide range of AWS native databases including RDS, Aurora, Neptune, as well as … Contribute to enhancing product observability and telemetry by supporting ongoing modernisation efforts within the infrastructure. * Collaborate closely with engineering teams to brainstorm ideas that simplify infrastructure management and streamline SRE practices. What you bring: * Proficiency in Python or Unix Shell scripting combined with solid SQL skills enables you to automate tasks efficiently across complex environments. * A good understanding of development
We are seeking a foundational member for the Cloud Infrastructure team at Writer. This role involves contributing to the development and implementation of our Site Reliability Engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of Writer's critical systems, proactively guaranteeing that our high-ROI products reach customers seamlessly. Your responsibilities … ensure cost efficiency. Ensure the security and compliance of our systems, adhering to industry standards and regulations. Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement. Stay current with emerging technologies and industry trends to improve our site reliability practices. Is this you? Proven expertise in Site Reliability … Kubernetes) and orchestration tools. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) for maintaining system health and performance. Ability to lead and mentor junior engineers in reliability and system optimization best practices. Excellent communication skills for effective collaboration with cross-functional teams and stakeholders. Proactive in identifying and mitigating potential system failures and performance issues. Preferred
Senior Site Reliability Engineer Central London (Hybrid) Up to £100k + Car Allowance & Bonus TRIA are working with a leading hospitality client to hire a Senior SRE, where they are investing heavily in the performance, stability, and reliability of its digital platforms. This is a hands-on leadership role - you won't just guide others, you … Improving alerting, monitoring, and system-level metrics Driving better SLOs, SLIs, and overall uptime What you'll bring: Experience in high-traffic digital or eCommerce platforms 5+ years in SRE/DevOps roles; strong background in incident response Observability, automation, and infrastructure as code expertise Leadership skills - mentoring others or leading from the front The stack includes Kubernetes, Terraform, AWS … Python, and modern CI/CD tools, and it's evolving. If you understand what a good SRE practice looks like, and want to leave systems in a better place than you found them, please apply to be considered and learn more
architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning. Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of … with two or more areas: Terraform, Infrastructure as Code Oracle Database Linux RMAN Exadata Zero Data Loss Recovery Appliance Additionally we are looking for motivated individuals that have Prior SRE experience managing production cloud services Prior experience in releasing and maintaining cloud services Excellent verbal and written communication Production experience managing systems or database environment Experience with a general-purpose … for at least three calendar days from the posting date or as long as the job remains posted. Required Skills Automation Cloud Infrastructure Services DevOps SQL (Structured Query Language) SRE Troubleshoot Issues About Us As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry-leaders in almost every
We unleash the potential of organisations through the science of board effectiveness, building better businesses and benefiting society. The Opportunity As a Senior Site Reliability Engineer (SRE), you'll be joining a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, ensuring that they meet the needs … be responsible for visibility and monitoring of those systems, for building tooling and automation to reduce TOIL and for responding to incidents as part of our 24/7 SRE on-call team. The SRE team: Strives to provide the highest standards of Availability, Scalability, Performance and Security for our Software as a Service environments across multiple cloud vendors and … work Proactively monitors our platform and responds to incidents as part of a 24/7 rota Key responsibilities of the role We're looking for a great Senior SRE to be a hands on individual contributor to key technical projects and to help us build a first-class SRE function. This role will involve: Hands on work with technical
Location: Remote Employment Type: 6 month Contract Rate: £550 per day, Outside of IR35 Role Overview Morgan Hunt are seeking an experienced Site Reliability Engineer (SRE)/Unix Infrastructure Engineer to support the deployment, migration, and optimisation of critical infrastructure services. The role involves ensuring high availability, disaster recovery readiness, and automation-driven improvements across RHEL
travel to Scotland Employment Type: 6 month Contract Rate: £550 per day, Outside of IR35 Role Overview Morgan Hunt are seeking an experienced Site Reliability Engineer (SRE)/Unix Infrastructure Engineer to support the deployment, migration, and optimisation of critical infrastructure services. The role involves ensuring high availability, disaster recovery readiness, and automation-driven improvements across
our customer's systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. Competencies It is desirable for you to have experience in the areas below. However more valued for this role is that you have excitement and enthusiasm
Hybrid position with on-calls We are seeking a highly motivated and skilled Site Reliability Engineer (SRE) to ensure the reliability, performance, and scalability of the client's critical Data Platform solutions. In this role, you will provide dedicated support and maintain the health of the data infrastructure. This position involves on-call responsibilities to address
# Site Reliability Engineer Remote - APAC/Engineering The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted systems, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, or … radical responsibility If this sounds like an environment that you believe could work for you then read on to find out more. The role: We're looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature, always looking for ways to improve, as we will … we expect this role to be an advocate of continuous improvement Reliability of our new global Tyk Cloud platform Automation of operations and support Writing and maintaining documentation on SRE processes and policies Recommending and implementing ways of driving operational efficiency and driving down our cost to run, without impacting service Assisting in penetration testing for Cloud through liaising with
company and role A leading public sector organisation is undergoing a major shift from on-premise systems to cloud-based services. As a Site Reliability Engineer (SRE), you'll join a collaborative, agile team focused on enhancing platform resilience, automation, and observability. You'll work across a modern tech stack, including RHEL, Ansible, Oracle, AWS, and container platforms
Reliability Engineer - Public Sector - Outside IR35 - Edinburgh (Hybrid) Day Rate - up to £560 (outside IR35) Duration - 6 months Harvey Nash's Client are hiring an experienced SRE, to support and enhance an existing digital platform. Responsibilities Support deployment and migration for services to RHEL8/9 Develop and strengthen automation to support disaster recovery activities Support for
Founded in 2001, Resident Advisor (RA) is one of the world's longest-running music media brands and a cornerstone of the dance, electronic and DJ ecosystem. The site's audience of over 6 million monthly users is drawn in by a combination of news, editorial, club listings and ticketing, RA-branded events at venues and festivals worldwide, original … films and a weekly mix series that has run for 18 years. We're looking for a Senior Site Reliability Engineer passionate about electronic music to join our Core Platform team. This role is office based (minimum 3 days/week in-office), and offers flexibility to work hybridly. You'll help scale our high-traffic infrastructure … MSSQL databases, ElasticSearch, Redis, and Kafka running on AWS EKS (Kubernetes), managed via Terraform with CI/CD pipelines and DataDog monitoring. Your responsibilities include improving infrastructure performance and reliability, driving modernization and cost optimization, developing shared components (i.e. auth systems, GraphQL gateways), enhancing developer experience, maintaining E2E testing systems, and creating internal tooling. This is an opportunity to
In Order to Join the ELEVI Team you will need Position: Site Reliability Engineer (SRE) - System Administrator, Mid Clearance: Clearable You Have: 4+ years of experience working with AWS infrastructure and platforms including Infrastructure as Code 4+ years of experience working with and/or administering Linux environments Experience delivering software to clients using Agile methodologies, including
Job Description Would you like to be an Engineer that builds the Cloud, rather than just uses it? At AWS, our Engineers manage the behind-the-scenes software and tools that support the world's largest cloud computing infrastructure. We … offer an exciting opportunity to join a world-class network team in a dynamic environment that feels like a start-up. As a Site Reliability Engineer (SRE), you will deploy, manage, troubleshoot, and innovate the tools, services, and components that enable our network engineers to automate and maintain network operations. Your internal customers are your network engineering
Columbia, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software
Florissant, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software
Jefferson City, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software
Saint Louis, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software
Kansas City, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software
St. Louis, Missouri, United States Hybrid / WFH Options
Centene
organization, Centene's technology professionals have access to competitive benefits including a fresh perspective on workplace flexibility. Position Purpose: We are seeking a highly skilled and experienced M365 Lead Site Reliability Engineer to join our team. The ideal candidate will be responsible for developing and creating monitoring and observability dashboards within Splunk, Dynatrace, and other monitoring and … alerting platforms. This role requires advanced proficiency in PowerShell scripting and Graph APIs, as well as intermediate proficiency in Power Apps/Automate. This role will ensure the reliability, performance, and scalability of our Microsoft 365 environment. Leads team to identify problems with systems and services and drives regular deployment of new versions of the systems and their subcomponents … visibility. Drives decisions around periodic system validation and testing, service monitoring, and standing up new services/tools Uses advanced knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization Leads post incident reviews and documents findings for future informed decision making Drives implementation of approved proposals to optimize Software