Permanent Site Reliability Engineer Job Vacancies

126 to 150 of 172 Permanent Site Reliability Engineer Jobs

Site Reliability Engineer

London, United Kingdom
C3 AI
cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI We are looking … for a Site Reliability Engineer to join our team in London. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem recurrence. Influence …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

United Kingdom
PEXA Group
aspect of your life - we want to help you create your ideal work/life blend, rather than squeezing in life around work. The Site Reliability Engineer is responsible for the technical support and operation of UK Platforms (both from an application and infrastructure perspective) by actively … and Infrastructure patching, DR testing, creation of alerting and monitoring and service transition activities - knowledge transfer, operation playbook updates/knowledge articles update. The SRE will closely collaborate with the customer support team and the product development squads in various global locations to achieve the best outcome for the technical … the technical support function, is the contact point for technical incidents as well as for the support teams. Key Accountabilities Ensure high availability and reliability of UK platforms with day-to-day support. Manage incidents with rapid resolution, root cause analysis, and post-mortems to prevent recurrence. Optimise monitoring …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer, Simple Storage and Glacier team (S3G)

London, United Kingdom
ENGINEERINGUK
Site Reliability Engineer, Simple Storage and Glacier team (S3G) Sector: Engineering Role: Professional Contract Type: Permanent Hours: Full Time DESCRIPTION Managing trillions of objects in storage, retrieving them in sub-x ms, building software that … find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team are looking for a talented engineer who is motivated to solve complex challenges, yet are not constrained by "how things are usually done" and are willing to decompose and reinvent … standard of quality across all team deliverables. BASIC QUALIFICATIONS Knowledge of systems engineering fundamentals (networking, storage, operating systems) Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems Experience in networking, storage systems, operating systems and hands-on systems engineering Experience programming with at least …
Employment Type: Permanent
Salary: GBP Annual

Client Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Molten Ventures plc
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best in class … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties Hands on cloud infrastructure consulting both on client site and remote Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects Supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools Working closely with internal product and engineering teams to ensure client feedback is …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer (GCP)

Chicago, Illinois, United States
Intone Networks
KFORCE URGENT REQUIREMENT Looking for candidates regarding the following: POSITION Site Reliability Engineer (GCP) LOCATION Midwest area, travel … once a month. Headquarters is in Chicago IL so someone local is preferred DURATION 3+ months INTERVIEW TYPE Video VISA RESTRICTIONS None REQUIRED SKILLS SRE infrastructure GCP platform Dataflow, composer, BigQuery, cloud function, pubsub Talend, Collibra …
Employment Type: Any
Salary: USD Annual

Site Reliability Engineer (f/m/d)

Germany
alfaview gmbh
We are looking for a highly motivated Site Reliability Engineer (f/m/d) to join our team in the development and maintenance of a next generation cross-platform video conferencing software. alfaview gmbh is a part of the alfa group. As a pioneer in the …
Employment Type: Permanent
Salary: EUR Annual

Site Reliability Engineer

London, United Kingdom
ION Group
do your best work. Learn more at . We are looking for experienced people who are competent in the cloud and knowledgeable about the SRE (site reliability engineering) domain. The team The Core Architecture Team (CAT) produces and manages the core technology, methodologies, and frameworks that underpin all …
Employment Type: Permanent
Salary: GBP Annual

Lead Site Reliability Engineer

London, United Kingdom
Bumble
which pronouns you use (For example: she/her, he/him, they/them, etc). At Bumble, Site Reliability Engineers (SRE) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations. We proactively manage … and fixing issues Respond to system outages, troubleshooting root causes and implementing preventative measures Collaborate with engineering teams and security engineers to improve system reliability, security and performance Participate in on-call rotations Create and maintain documentation to improve knowledge sharing across teams About you Excellent problem solving, analytical … must Proficiency in at least Python or Golang programming languages Experience with CI/CD pipelines Strong proficiency with Kubernetes architecture Prior experience in SRE, System administration or DevOps roles Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting Proficiency with using Puppet …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Apple Inc
Shazam Site Reliability Engineers are not just responsible for making sure all services and systems that Shazam relies on are operating at their highest level; they're also responsible for helping development teams embrace these principles … as they develop software. Shazam SREs embed themselves with development teams and act as extensions of those teams to propagate best practices. As an SRE, you'll collaborate with development teams to help them understand the bigger picture of distributed systems, beyond individual components. We are strong believers in ownership … with software engineers being responsible for the code they write. The SRE team helps build the competencies across teams to ensure we build scalable and supportable systems. This role sits in our London office reporting to our Head of SRE. The successful candidate will be assisting multiple development teams based …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineering Manager (SRE), Analytics

United Kingdom
Apple Inc
Site Reliability Engineering Manager (SRE), Analytics The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. … ever before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here. Description The Service Reliability Engineering (SRE) Manager role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune … millions of users, then this is the place for you! Minimum Qualifications Experience with hiring and leading engineers Demonstrable success leading engineering teams - ideally SRE or Production Engineering Experience with large scale distributed systems Deep understanding and experience in one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer with Security Clearance

Washington, Washington DC, United States
ValidaTek
ValidaTek is building teams of Site Reliability Engineers (SREs) to support internal and external engineering and operations of a large scale and world-wide Enterprise IT environment that covers application hosting and support, enterprise services, and infrastructure services. The role of SRE is a highly technical role … and supported in pre-production and production Implement system and application monitoring for custom requirements and application uptime with the intent of maximizing platform reliability Troubleshoot and analyze system issues, delving into hardware, networks, application, and storage/DB layers as needed Participate in lifecycle management of …
Employment Type: Permanent
Salary: USD Annual

Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering)

Richmond, Virginia, United States
Capital One
Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering) Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a … real customer needs. We are seeking DevOps Engineers who are passionate about marrying data with emerging technologies to join our team. As a DevOps Engineer, you'll have the opportunity to be on the forefront of driving a major transformation within Capital One. Who We Are: The Cloud Operations … years of DevOps Engineering experience 6+ years of experience with coding and scripting using Python 5+ years of experience Site Reliability Engineering (SRE) 4+ years of experience in infrastructure design, implementation and delivery 3+ years of experience with monitoring tools (Splunk or Zabbix) 3+ years of experience with …
Employment Type: Permanent
Salary: USD Annual

Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering)

Plano, Texas, United States
Capital One
Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering) Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a … real customer needs. We are seeking DevOps Engineers who are passionate about marrying data with emerging technologies to join our team. As a DevOps Engineer, you'll have the opportunity to be on the forefront of driving a major transformation within Capital One. Who We Are: The Cloud Operations … years of DevOps Engineering experience 6+ years of experience with coding and scripting using Python 5+ years of experience Site Reliability Engineering (SRE) 4+ years of experience in infrastructure design, implementation and delivery 3+ years of experience with monitoring tools (Splunk or Zabbix) 3+ years of experience with …
Employment Type: Permanent
Salary: USD Annual

Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering)

McLean, Virginia, United States
Capital One
Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering) Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a … real customer needs. We are seeking DevOps Engineers who are passionate about marrying data with emerging technologies to join our team. As a DevOps Engineer, you'll have the opportunity to be on the forefront of driving a major transformation within Capital One. Who We Are: The Cloud Operations … years of DevOps Engineering experience 6+ years of experience with coding and scripting using Python 5+ years of experience Site Reliability Engineering (SRE) 4+ years of experience in infrastructure design, implementation and delivery 3+ years of experience with monitoring tools (Splunk or Zabbix) 3+ years of experience with …
Employment Type: Permanent
Salary: USD Annual

Site Reliability Engineering Manager

Madrid, Spain
Onum
analytics costs by reducing data, avoiding vendor lock-in, and aligning the value of each dataset with actions taken. About the Role As an SRE Manager, you will be responsible for leading and managing a team of Site Reliability Engineers while staying actively involved in day-to-day … our platform remains highly reliable, scalable, and efficient. You will work closely with software engineers and DevOps teams to identify opportunities to improve infrastructure reliability and automation. Responsibilities Team Leadership & Development: Manage and mentor a small team of SREs, helping them to grow their skills through coaching … feedback, and development plans. Foster a collaborative team environment where knowledge sharing, continuous learning, and innovation are encouraged. Assist in recruiting and onboarding new SRE team members, ensuring they are set up for success. Conduct regular one-on-ones with team members, set clear performance goals, and provide ongoing support. …
Employment Type: Permanent
Salary: EUR Annual

Staff Site Reliability Engineer, PaaS Paris, France

London, United Kingdom
Hybrid / WFH Options
Tbwa Chiat/Day Inc
empowering development teams by creating toolchains, guidelines, and standards. Our focus is on enabling seamless automation and CI/CD, comprehensive observability, and unwavering reliability in a secured cloud-native environment. The Opportunity The Staff Engineer position within the Platform As a Service team offers a compelling opportunity … utilisation, enhancing fault tolerance, and ensuring the platform's ability to meet evolving demands efficiently and effectively. You provide guidance and mentorship to other SRE team members, helping them to develop their skills and knowledge of best practices in site reliability engineering. You establish and enforce engineering processes … organization. You collaborate with senior leadership to shape the vision and direction of the company (cloud) infrastructures, and you help drive the development of SRE-specific strategies and initiatives that align with business objectives. You build and maintain strong relationships with stakeholders across the organization, and you represent the SRE …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

London, United Kingdom
Hybrid / WFH Options
Promote Project
databases and we want to grow that number, along with delivering more features without compromising on reliability and scalability. This is where our SRE team comes into the picture. The SRE team is responsible for managing Neon's multi-region, multi-cloud deployment in close collaboration with the broader … engineering team. All the features we want to implement can only reach our customers if the changes are delivered reliably, which means the SRE team plays a significant role in defining our pace of development. Successful candidates will get the opportunity to contribute to the effort of evolving Neon to … cloud and infrastructure topics Be ready to join an on-call rotation We're looking for someone who has 4+ years experience working in Site Reliability Engineering Experience with cloud infrastructure components in Azure and/or AWS Experience in a complex Linux infrastructure environment Experience focusing on …
Employment Type: Permanent
Salary: GBP Annual

Senior Site Reliability Engineer II

London, United Kingdom
Out in Science, Technology, Engineering, and Mathematics
Your Impact As a contributor in the APX SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with … the APX SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. What You'll Do Location: London, England. Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services … rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. Influence and educate the engineering organization to adopt new and improved architectural …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer II - CTJ - Poly with Security Clearance

Reston, Virginia, United States
Microsoft Corporation
services to provide world-class user experiences on Azure. Work together with the team to ensure service quality, availability, and reliability. Participate in live-site, security reviews, and analysis to partner teams to ensure our services are secure and reliable. Engineers should expect to participate in a regularly scheduled … languages (like Java or C#) and related frameworks. Experience with cloud infrastructures like Azure or AWS and open-source tools and frameworks are preferred. Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. There …
Employment Type: Permanent
Salary: USD 193,200 Annual

DevOps Engineer / Site Reliability Engineer

London, United Kingdom
Devopshunt
Type: Full-time Location Type: On-site Location: London, England, United Kingdom Salary: Not disclosed Description As a critical and trusted member of the Systems Engineering team, you'll be working side-by-side with software engineers to design and deliver mission critical services and systems. You'll be …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineering Manager

London Area, United Kingdom
Signify Technology
The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to … impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business … ensuring priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations, Security …

Site Reliability Engineering Manager

London, South East England, United Kingdom
Signify Technology
The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to … impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business … ensuring priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations, Security …

Senior Software Engineer, SRE, Cloud Incident Response

United Kingdom
WeAreTechWomen
years of experience in designing, analyzing, and troubleshooting distributed systems, and 2 years of experience leading projects and providing technical leadership. Experience in SRE or incident management/response environments. Preferred qualifications: Experience working in computing, distributed systems, storage, or networking. Experience in telemetry systems, incident and risk management. Expertise … code, and to automate routine tasks. Excellent problem-solving approach, with verbal and written communication skills. About the job Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services-both our … our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer, Compute

United Kingdom
Tbwa Chiat/Day Inc
grow our small team into a global footprint that can provide expert engagement across our core serving systems. As an early member of the SRE team, you will report directly to the Director of Managed Infrastructure and play a foundational role in expanding our SRE practice, integrating reliability principles … through the development of automated systems for software delivery, system failover, and capacity management. About You: At least 3 years of experience in an SRE role, or at least 5 years of experience in an adjacent role (e.g., platform engineering), operating in a scaled environment. Firm grasp of the SRE … philosophy and mindset, with practical experience working on or directly with SRE teams that have proactively engaged in system design and improvement. Strong sense of accountability and commitment to problem-solving, backed by a curiosity to dig deep and identify root causes. Willingness to proactively engage with development teams to …
Employment Type: Permanent
Salary: GBP Annual

Site Reliability Engineer

Southampton, Hampshire, United Kingdom
Hybrid / WFH Options
NICE
production environment by monitoring availability and taking a holistic view of system health Build software and systems to manage platform infrastructure and applications Improve reliability, quality, and time-to-market of our suite of software solutions Measure and optimize system performance, with an eye toward pushing our capabilities forward … Participate in system design consulting, platform management, and capacity planning Create sustainable systems and services through automation and uplifts Balance feature development speed and reliability with well-defined service level objectives Have you got what it takes? 3-6 years of working experience in a similar role, with a … Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell). Deep understanding of cloud computing platforms (e.g., AWS), the working and reliability constraints of some of the prominent services (e.g., EC2, ECS, Lambda, DynamoDB etc) Experience with infrastructure as code tools such as CloudFormation, Terraform. Deep …
Employment Type: Permanent
Salary: GBP Annual
Site Reliability Engineer
10th Percentile: £57,500
25th Percentile: £63,750
Median: £71,250
75th Percentile: £91,000
90th Percentile: £116,250