Washington, Washington DC, United States Hybrid / WFH Options
ClearanceJobs
Remote - Site Reliability Engineer (SRE). ClearanceJobs is assisting its partner, headquartered in New York City and widely recognized as the industry leader in CPS protection, in its search for a skilled Site Reliability Engineer (SRE). The selected candidate will support and maintain our customers' FedRAMP-compliant deployment in AWS GovCloud for public sector … customers. The SRE will be responsible for ensuring high availability, security, and compliance of cloud-based environments while driving automation, monitoring, and incident response best practices. U.S. citizenship is required for working in GovCloud environments. Terms: Full-time/Direct Hire. Location: Remote (DMV area). Salary: $200k - $260k (will vary depending on experience). Qualifications: • 6-8+ years of experience in SRE, DevOps … and scripting (Python, Bash). • Experience with logging, monitoring, and observability tools in a cloud-native environment. • Strong troubleshooting, problem-solving, and automation mindset. Responsibilities/Impact as an SRE: • AWS GovCloud Operations: Manage and optimize cloud-based infrastructure in AWS GovCloud, ensuring FedRAMP compliance and high availability. • Reliability & Performance: Monitor and enhance system performance, scalability, and reliability …
Site Reliability Engineer - Microsoft Admin (Windows Server, IIS, MS SQL Server). Team Summary: The Site Reliability Engineer (SRE) is a member of a cross-functional Operations & Infrastructure team responsible for running our Visa Spend Clarity for Enterprises production infrastructure and ensuring the highest levels of availability, performance, and operational excellence. What a Site Reliability Engineer does at Visa: The SRE is responsible for finding the right way to run robust applications in our environments. In this role, you will balance engineering improvements, systems operations, and contributions to strategic initiatives. You will work closely with all members of the Technology Group to improve the reliability, availability, performance, monitoring, and operations of Visa …
A global government contracting company is seeking a Site Reliability Engineer to join its team in Sunnyvale, CA! As a Site Reliability Engineer, you will: Design, implement, and maintain highly available and scalable systems and infrastructure to support classified applications and services. Develop and implement reliability-focused engineering practices, such as continuous integration … continuous deployment, and continuous monitoring, while ensuring compliance with classified system requirements. Collaborate with development teams to ensure that reliability and scalability are considered throughout the software development lifecycle, while maintaining the security and integrity of the classified system. Identify and mitigate potential sources of downtime and performance degradation, including infrastructure, application, and network issues, while ensuring that all … corrective actions, while ensuring that all incident response activities are conducted in accordance with classified system procedures. Collaborate with other teams, including development, operations, and security, to ensure that reliability and scalability are considered in all aspects of system design and operation, while maintaining the security and integrity of the classified system. Develop and maintain metrics and monitoring systems …
Lead Site Reliability Engineer (Lead SRE). Ready to keep things running smoothly? Join our tombola team! At tombola, we pride ourselves on building our own exceptional games and platforms in-house. That means keeping everything running flawlessly is paramount! We're seeking a Lead Site Reliability Engineer (SRE) to join us and help ensure … our critical systems and services are always reliable, available, and performing at their best. What will you be doing? As an SRE, you'll be instrumental in implementing automation, monitoring, and incident response strategies to minimize downtime and optimize our operations. You'll collaborate closely with our development, infrastructure, and security teams, balancing exciting new feature delivery with rock … with our broader business objectives. Collaborating with other teams and departments to achieve shared success. Partnering with our People Partner for tech to build robust team management practices. System Reliability and Availability. Ensure system uptime: Monitor and maintain the availability and reliability of critical systems and services, meeting all uptime SLAs (Service Level Agreements). Incident management: Quickly …
Bristol, Avon, England, United Kingdom Hybrid / WFH Options
Robert Walters
design, development, and operation of cloud infrastructure and applications on Google Cloud Platform. You will work collaboratively with engineering and infrastructure teams to implement site reliability engineering (SRE) principles, focusing on system reliability, observability, automation, and operational excellence. This role follows a hybrid working model, requiring attendance at the Bristol office for at least two days per … week or 40% of the working time. Key Responsibilities: Promote and embed SRE best practices within engineering teams and microservices environments. Partner with infrastructure and DevOps engineers to improve system resilience and performance. Troubleshoot complex incidents and implement long-term solutions through code and automation. Develop and improve automation pipelines to reduce manual operations and enhance system efficiency. Contribute to … multiple strategic digital initiatives and collaborate across engineering domains. Essential Skills and Experience: Background in software engineering or telemetry, with a current focus on SRE. Extensive experience with public cloud platforms, particularly Google Cloud (or AWS/Azure). Proven ability to manage Kubernetes clusters in production environments. Competence in scripting and development using languages such as Python, Java, Go, Bash, or …
application performance: identifying and implementing improvements to application performance and stability. Collaborate on the design and implementation of the pipelines and processes for deployment to the production environment. The SRE will work closely with Platform and Software domains to ensure continuous improvement of performance and stability whilst adhering to standards. Undertake ad-hoc projects and other activities as required. Key … Accountabilities and Activities: Contribute to the SRE function, including: Drive evolution of the DevOps/GitOps toolchain, promoting improvements that streamline the software delivery process and demonstrating those improvements through metrics. Accountable for halting a project/product if the solution is not technically acceptable. Responsible for producing and maintaining documentation relating to application design, integration processes, testing procedures … to create operational runbooks and playbooks. Integration with Domains, including: Collaborating with Domains to plan, design, test and maintain the application. Design patterns for any component or structure under SRE responsibility. Implementation of components such as Monitoring and Logging. Managing runbook preparation for Domains. Liaise with and support other teams on work items, including: Developing, refining, and tuning integrations between …
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
Senior Site Reliability Engineer. London - Hybrid. £80,000 - £90,000 + 38 Days Holiday + Private Healthcare + Life Assurance + Flexible Working + Pension. Excellent opportunity for a Site Reliability Engineer to join a forward-thinking, high-growth technology company offering a hybrid work environment, great benefits, and opportunities for further progression! This company … performance. With a strong culture rooted in integrity, creativity, and technical excellence, they've become a trusted partner across global industries. In this role you'll take ownership of platform reliability, resilience engineering, and incident management across cutting-edge cloud infrastructure. You'll play a key role in ensuring uptime, performance, and continuous improvement of core systems. The ideal candidate will … strategies and conduct chaos engineering experiments. * Monitor and maintain Kafka clusters for performance and reliability. * Respond to and resolve application-level production incidents. The Person: * 5+ years in SRE, DevOps, or infrastructure engineering. * Strong experience with AWS, EKS/Kubernetes, and Terraform. * Familiar with Kafka and observability tools such as Datadog or Grafana. * Able to troubleshoot issues across infrastructure and …
Role: Site Reliability Engineer. Client: Defense-Aerospace. Hourly Rate: up to $68/hr W2, non-benefited. Length … Long-term. Location: Scottsdale, AZ. Clearance: Department of Defense TS/SCI security clearance is preferred at time of hire. Description: As a Site Reliability Engineer (SRE), you will be a member of a cross-functional team responsible for maintaining the survivability and reliability of mission-critical resources. SREs monitor high-priority systems and automate recovery mechanisms …
more joyful places to work, as well as learn. About the role: We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us ensure we provide world-class resilience and performance across the platform. The remit and focus of the role is to advise on all aspects of site … A and backups. Conduct capacity assessments and plan for scaling to meet current and future business needs. Work closely with the Head of Platform Engineering and Head of SRE to strategize and implement scalable solutions. Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to ensure a good level of service is provided … for our customers and embed SRE practices. Be a key player in the response to and troubleshooting of incidents, ensuring rapid resolution and minimising downtime. Participate in blameless postmortems to identify root causes and corrective actions. Develop and maintain playbooks and documentation. About you: Experience in performance monitoring and analysis. Capacity planning experience. Scripting and automation skills, with experience in relevant technologies. Experience …
London, South East, England, United Kingdom Hybrid / WFH Options
BOSS Professional Services LTD
SRE Engineer. Full-time. UK - Remote/Hybrid. My client is a high-growth ecommerce business which runs its technology stack on AWS. Due to the nature of the business, the SRE Engineer will need to support sudden peaks in traffic by scaling smoothly. They also host ecommerce platforms for other brands, which also need supporting. As an SRE Engineer you will maintain a scalable and reliable production environment for running software services while helping grow the customer base and product offering. For the SRE Engineer role we are seeking: Technology stack: Kubernetes, MySQL, PostgreSQL, PHP, Python, Docker, AWS Lambda, AWS, Redis, ELK; monitoring: Prometheus, Grafana or Loki. You have previous experience of working within SRE … Assist and support the DevOps engineers: setting up the infrastructure for microservices. Work closely with the rest of the DevOps and QA team to load test applications. Responsibilities for the SRE Engineer include: Create sustainable systems and services through automation and uplifts. Partner with development teams to improve services. Gather and analyse metrics from both operating systems and applications. Participate …
Site Reliability Engineer. Location: IND-BLR-Divyasree Technopolis. Time type: Full time. Posted yesterday. Job requisition ID: R. About LSEG: The London Stock Exchange Group (LSEG) is a global financial markets infrastructure and data provider headquartered in London, UK. Established in 2007, though its core … Exchange, dates to 1801, LSEG plays a meaningful role in capital markets worldwide. It operates several trading venues, including the London Stock Exchange, Borsa Italiana, and Turquoise. This is an Application Support Engineer role covering the ingestion and delivery of subsets of licensed data to each of our client bases using in-house software. This role also involves taking database backup … and creative culture where we encourage new ideas and are committed to sustainability across our global business. You will experience the critical role we have in helping to re-engineer the financial ecosystem to support and drive sustainable economic growth. Together, we are aiming to achieve this growth by accelerating the just transition to net zero, enabling growth of …
Site Reliability Engineer (SRE) Manager - Apple Services Engineering. London, England, United Kingdom. Software and Services. Description: Apple Service Engineering (ASE)'s Compute team is seeking a highly motivated individual with strong technical and communication skills to join us on our quest to build and enhance massive clusters hosting Virtual Machines, Containers and associated infrastructure that can scale … engage with the upstream community to drive Apple's requirements. Ultimately, you will help build the platform that delivers our applications at scale to our end users. As a Compute Site Reliability Engineering manager, you will be leading a team responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for …
Vacancy for Snr Site Reliability Engineer (SRE) at Preservica. Abingdon/Remote, UK. About You: You have a proven track record in DevOps and software development, with a passion for creating reliable solutions to deploy software at scale and speed. You are eager to challenge the status quo, learn, and adopt new technologies. Excellent communication skills across … Our team is small but growing, so self-motivation, organization, and the ability to multitask and prioritize are crucial. The Role: Serve as a primary visionary for DevOps/Site Reliability Engineering across the entire technology organization. Eliminate process bottlenecks to enable frictionless, reliable, and high-velocity feature development through automation of Build, Test, Deploy, and Operate processes.
Site Reliability Engineer - Outside IR35 - Edinburgh with Remote - 6 Months Initial Contract - Immediate Start. My client is currently on a journey to move from on-premise project work to a cloud-based offering (AWS) and requires a seasoned Site Reliability Engineer to provide hands-on technical skills and to support and improve the …
and transferrable DV Clearance. Are you passionate about reliability, automation, and supporting mission-critical systems? Join this global defence organisation as a Site Reliability Engineer (SRE) and help shape the future of one of the UK's most vital national security platforms. You'll be joining a growing SRE team at the heart of the customer … s mission, focused on ensuring performance, availability, and scalability, while driving continuous improvement and innovation. About the Role: As an SRE, you'll combine your operational expertise with software engineering skills to minimise manual effort and drive automation across complex systems. This role is perfect for someone who thrives on solving hard problems, automating the mundane, and building intelligent tools … overtime. Proactively enhance system availability, performance, and resilience. Develop tools and solutions to automate repetitive tasks and reduce operational toil. Collaborate with development teams to embed best practices and SRE principles. Deploy and manage monitoring systems to provide intelligent observability. Engage with the wider DevOps/SRE community within the organisation. Ideal Skills & Experience: We're more interested in your …
Reliability Engineer with a strong focus on leadership and team management. Around 70% of this role is about building, mentoring and directing a high-performing SRE team, setting strategy and driving operational excellence. The remaining 30% will be hands-on involvement in AWS-based platforms, automation and performance tuning. Key Responsibilities: Lead and develop a team … of SRE engineers, setting priorities, providing coaching and creating a culture of reliability and continuous improvement. Define and own SRE strategy, standards and ways of working across the organisation. Collaborate with engineering, operations and product teams to ensure seamless delivery and robust systems. Oversee system reliability, availability and performance across large, business-critical platforms. Provide technical guidance on … Jenkins, GitLab, Concourse) and ensure AWS platforms meet operational best practice. Produce regular reporting and communicate clearly with senior stakeholders. Key Requirements: Strong experience managing or leading engineering/SRE/DevOps teams in a complex environment. Track record of mentoring, coaching and growing technical teams. Excellent stakeholder engagement skills with the ability to influence at all levels. Broad technical …
West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
VIQU IT Recruitment
Reliability Engineer with a strong focus on leadership and team management. Around 70% of this role is about building, mentoring and directing a high-performing SRE team, setting strategy and driving operational excellence. The remaining 30% will be hands-on involvement in AWS-based platforms, automation and performance tuning. Key Responsibilities: Lead and develop a team … of SRE engineers, setting priorities, providing coaching and creating a culture of reliability and continuous improvement. Define and own SRE strategy, standards and ways of working across the organisation. Collaborate with engineering, operations and product teams to ensure seamless delivery and robust systems. Oversee system reliability, availability and performance across large, business-critical platforms. Provide technical guidance on … Jenkins, GitLab, Concourse) and ensure AWS platforms meet operational best practice. Produce regular reporting and communicate clearly with senior stakeholders. Key Requirements: Strong experience managing or leading engineering/SRE/DevOps teams in a complex environment. Track record of mentoring, coaching and growing technical teams. Excellent stakeholder engagement skills with the ability to influence at all levels. Broad technical …
Morley, Leeds, West Yorkshire, England, United Kingdom Hybrid / WFH Options
VIQU IT Recruitment
Reliability Engineer with a strong focus on leadership and team management. Around 70% of this role is about building, mentoring and directing a high-performing SRE team, setting strategy and driving operational excellence. The remaining 30% will be hands-on involvement in AWS-based platforms, automation and performance tuning. Key Responsibilities: Lead and develop a team … of SRE engineers, setting priorities, providing coaching and creating a culture of reliability and continuous improvement. Define and own SRE strategy, standards and ways of working across the organisation. Collaborate with engineering, operations and product teams to ensure seamless delivery and robust systems. Oversee system reliability, availability and performance across large, business-critical platforms. Provide technical guidance on … Jenkins, GitLab, Concourse) and ensure AWS platforms meet operational best practice. Produce regular reporting and communicate clearly with senior stakeholders. Key Requirements: Strong experience managing or leading engineering/SRE/DevOps teams in a complex environment. Track record of mentoring, coaching and growing technical teams. Excellent stakeholder engagement skills with the ability to influence at all levels. Broad technical …
Operations Site Reliability Engineer. Location: United Kingdom-Bristol-Almondsbury-Hempton Court. Time type: Full time. Posted 30+ days ago. Job requisition ID: R022662. Please note: 1. If you are a first-time user, please create your candidate login account before you apply for a job. … salary. Generous bonus scheme. Equity package. Competitive company pension. Employee stock purchase plan (ESPP). Private Medical Insurance (individual or family). Life Assurance scheme (up to 4x salary). Ample on-site parking. This role will need to participate in on-call support on weekends and holidays as and when required. Broadcom is proud to be an equal opportunity employer. We will …
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Lorien
Junior Site Reliability Engineer. Hybrid - Manchester (2 days a week). Salary up to £45,000 + Bonus. The Company: Lorien Global are supporting a growing business based in Manchester City Centre as they expand their Support Services team. With an exciting pipeline of work ahead, they're looking to hire an experienced Junior Site Reliability Engineer to play a key role in supporting and improving their online platforms. The Role: You'll act as a technical escalation point, handling complex support queries from the Service Desk and resolving advanced issues across both Windows and Linux environments. From diagnosing system faults to working closely with Infrastructure and Development teams when escalations are required, your …
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Lead Site Reliability Engineer. Location: Hybrid, Morley office with homeworking. Package: £70,000 - £80,000, £6k car allowance, up to 30% bonus, 26 days holiday + flexible benefits. A large-scale tech-driven organisation is looking for a Lead Site Reliability Engineer with a strong focus on leadership and team management. Around …
travel to Scotland. Employment Type: 6-month contract. Rate: £550 per day, Outside IR35. Role Overview: Morgan Hunt are seeking an experienced Site Reliability Engineer (SRE)/Unix Infrastructure Engineer to support the deployment, migration, and optimisation of critical infrastructure services. The role involves ensuring high availability, disaster recovery readiness, and automation-driven improvements across …
Senior Site Reliability Engineer - Featurespace. At Visa, we are passionate about making a difference. We lead the way in disrupting fraud from multiple vectors. In this role you will be joining an exciting, innovative business new to the Visa family. At Featurespace, we strive to be the world's best software company at protecting our clients and … problems in new, innovative ways. We are always seeking to be the best at what we do and make our customers smile. The Opportunity: In your role as Senior Site Reliability Engineer you will help us achieve our goals and deliver success on behalf of our customers by operating Featurespace's world-leading product, ARIC Risk Hub, as a robust cloud-based SaaS …