cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable, high-value AI applications for reliability, fraud detection, sensor network health, supply network optimization, energy management, anti-money laundering, and customer engagement. Learn more at: C3 AI We are looking … for a Site Reliability Engineer to join our team in London. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services and build automation to prevent problem recurrence. Influence …
aspect of your life - we want to help you create your ideal work/life blend, rather than squeezing in life around work. The Site Reliability Engineer is responsible for the technical support and operation of UK Platforms (both from an application and infrastructure perspective) by actively … and Infrastructure patching, DR testing, creation of alerting and monitoring and service transition activities - knowledge transfer, operation playbook updates/knowledge articles update. The SRE will closely collaborate with the customer support team and the product development squads in various global locations to achieve the best outcome for the technical … the technical support function, is the contact point for technical incidents as well as for the support teams. Key Accountabilities Ensure high availability and reliability of UK platforms with day-to-day support. Manage incidents with rapid resolution, root cause analysis, and post-mortems to prevent recurrence. Optimise monitoring …
Site Reliability Engineer, Simple Storage and Glacier team (S3G) Sector: Engineering Role: Professional Contract Type: Permanent Hours: Full Time DESCRIPTION Managing trillions of objects in storage, retrieving them in sub-x ms, building software that … find every day working in Simple Storage Service (S3) and Glacier. The Region Services S3 and Glacier Engineering team is looking for a talented engineer who is motivated to solve complex challenges, yet is not constrained by "how things are usually done" and is willing to decompose and reinvent … standard of quality across all team deliverables. BASIC QUALIFICATIONS Knowledge of systems engineering fundamentals (networking, storage, operating systems) Experience designing or architecting (design patterns, reliability and scaling) of new and existing systems Experience in networking, storage systems, operating systems and hands-on systems engineering Experience programming with at least …
the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023. The Client Site Reliability Engineer role in Infrastructure, Client Services will be responsible for enabling and supporting our clients to deliver a best-in-class … at scale. This role supports clients in their cloud infrastructure preparation, deployment, optimisation and troubleshooting. Duties Hands-on cloud infrastructure consulting both on client site and remote Working with customers and external partners to design and prepare suitable cloud infrastructure to ensure Thought Machine Vault products can be tested … empower holistic digital transformation in collaboration with Thought Machine Client Architects Supporting and troubleshooting client, SaaS and internal cloud infrastructure both remotely and on site, including by promoting and deploying suitable monitoring, logging and alerting tools Working closely with internal product and engineering teams to ensure client feedback is …
KFORCE URGENT REQUIREMENT Looking for candidates regarding the following: POSITION Site Reliability Engineer (GCP) LOCATION Midwest area, travel … once a month. Headquarters is in Chicago IL so someone local is preferred DURATION 3+ months INTERVIEW TYPE Video VISA RESTRICTIONS None REQUIRED SKILLS SRE infrastructure GCP platform Dataflow, Composer, BigQuery, Cloud Functions, Pub/Sub Talend, Collibra …
We are looking for a highly motivated Site Reliability Engineer (f/m/d) to join our team in the development and maintenance of next-generation cross-platform video conferencing software. alfaview gmbh is a part of the alfa group. As a pioneer in the …
do your best work. Learn more at . We are looking for experienced people who are competent in the cloud and knowledgeable about the SRE (site reliability engineering) domain. The team The Core Architecture Team (CAT) produces and manages the core technology, methodologies, and frameworks that underpin all …
which pronouns you use (For example: she/her, he/him, they/them, etc). At Bumble, Site Reliability Engineers (SRE) are responsible for ensuring the reliability, scalability and performance of software systems while bridging the gap between development, security and operations. We proactively manage … and fixing issues Respond to system outages, troubleshooting root causes and implementing preventative measures Collaborate with engineering teams and security engineers to improve system reliability, security and performance Participate in on-call rotations Create and maintain documentation to improve knowledge sharing across teams About you Excellent problem solving, analytical … must Proficiency in at least Python or Golang programming languages Experience with CI/CD pipelines Strong proficiency with Kubernetes architecture Prior experience in SRE, System administration or DevOps roles Strong proficiency with Linux/Unix operating systems, including hands-on experience in configuration and troubleshooting Proficiency with using Puppet …
Shazam Site Reliability Engineers are not just responsible for making sure all services and systems that Shazam relies on are operating at their highest level; they're also responsible for helping development teams embrace these principles … as they develop software. Shazam SREs embed themselves with development teams and act as extensions of those teams to propagate best practices. As an SRE, you'll collaborate with development teams to help them understand the bigger picture of distributed systems, beyond individual components. We are strong believers in ownership … with software engineers being responsible for the code they write. The SRE team helps build the competencies across teams to ensure we build scalable and supportable systems. This role sits in our London office reporting to our Head of SRE. The successful candidate will be assisting multiple development teams based …
Site Reliability Engineering Manager (SRE), Analytics The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple's long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. … ever before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here. Description The Service Reliability Engineering (SRE) Manager role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune … millions of users, then this is the place for you! Minimum Qualifications Experience with hiring and leading engineers Demonstrable success leading engineering teams - ideally SRE or Production Engineering Experience with large scale distributed systems Deep understanding and experience in one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS …
ValidaTek is building teams of Site Reliability Engineers (SREs) to support internal and external engineering and operations of a large-scale, worldwide Enterprise IT environment that covers application hosting and support, enterprise services, and infrastructure services. The role of SRE is a highly technical role … and supported in pre-production and production Implement system and application monitoring for custom requirements and application uptime with the intent of maximizing platform reliability Troubleshoot and analyze system issues, delving into hardware, networks, application, and storage/DB layers as needed Participate in lifecycle management of …
Sr. Lead Software Engineer, DevOps/SRE (Python, SRE) (Cloud Operations Resilience Engineering) Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a … real customer needs. We are seeking DevOps Engineers who are passionate about marrying data with emerging technologies to join our team. As a DevOps Engineer, you'll have the opportunity to be on the forefront of driving a major transformation within Capital One. Who We Are: The Cloud Operations … years of DevOps Engineering experience 6+ years of experience with coding and scripting using Python 5+ years of experience in Site Reliability Engineering (SRE) 4+ years of experience in infrastructure design, implementation and delivery 3+ years of experience with monitoring tools (Splunk or Zabbix) 3+ years of experience with …
analytics costs by reducing data, avoiding vendor lock-in, and aligning the value of each dataset with actions taken. About the Role As an SRE Manager, you will be responsible for leading and managing a team of Site Reliability Engineers while staying actively involved in day-to-day … our platform remains highly reliable, scalable, and efficient. You will work closely with software engineers and DevOps teams to identify opportunities to improve infrastructure reliability and automation. Responsibilities Team Leadership & Development: Manage and mentor a small team of SREs, helping them to grow their skills through coaching … feedback, and development plans. Foster a collaborative team environment where knowledge sharing, continuous learning, and innovation are encouraged. Assist in recruiting and onboarding new SRE team members, ensuring they are set up for success. Conduct regular one-on-ones with team members, set clear performance goals, and provide ongoing support. …
empowering development teams by creating toolchains, guidelines, and standards. Our focus is on enabling seamless automation and CI/CD, comprehensive observability, and unwavering reliability in a secured cloud-native environment. The Opportunity The Staff Engineer position within the Platform As a Service team offers a compelling opportunity … utilisation, enhancing fault tolerance, and ensuring the platform's ability to meet evolving demands efficiently and effectively. You provide guidance and mentorship to other SRE team members, helping them to develop their skills and knowledge of best practices in site reliability engineering. You establish and enforce engineering processes … organization. You collaborate with senior leadership to shape the vision and direction of the company (cloud) infrastructures, and you help drive the development of SRE-specific strategies and initiatives that align with business objectives. You build and maintain strong relationships with stakeholders across the organization, and you represent the SRE …
databases and we want to grow that number, along with delivering more features without compromising on reliability and scalability. This is where our SRE team comes into the picture. The SRE team is responsible for managing Neon's multi-region, multi-cloud deployment in close collaboration with the broader … engineering team. All the features we want to implement can only reach our customers if the changes are delivered reliably, which means the SRE team plays a significant role in defining our pace of development. Successful candidates will get the opportunity to contribute to the effort of evolving Neon to … cloud and infrastructure topics Be ready to join an on-call rotation We're looking for someone who has 4+ years of experience working in Site Reliability Engineering Experience with cloud infrastructure components in Azure and/or AWS Experience in a complex Linux infrastructure environment Experience focusing on …
Out in Science, Technology, Engineering, and Mathematics
Your Impact As a contributor in the APX SRE organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed with achieving the high quality and reliability our customers demand. You will work closely not only with … the APX SRE organization, but your technical deliverables will reach the entire engineering organization to enable product teams to continuously deliver features on the vanguard of innovation. What You'll Do Location: London, England. Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services … rapidly, consistently, and securely. Exemplify cloud-native site reliability best practices. Write code that is performant, maintainable, clear, and concise. Employ strong problem-solving skills, with the ability to debug problems in cloud-native distributed systems. Influence and educate the engineering organization to adopt new and improved architectural …
services to provide world-class user experiences on Azure. Work together with the team to ensure service quality, availability, and reliability. Participate in live-site, security reviews, and analysis to partner teams to ensure our services are secure and reliable. Engineers should expect to participate in a regularly scheduled … languages (like Java or C#) and related frameworks. Experience with cloud infrastructures like Azure or AWS and open-source tools and frameworks is preferred. Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $98,300 - $193,200 per year. There …
Type: Full-time Location Type: On-site Location: London, England, United Kingdom Salary: Not disclosed Description As a critical and trusted member of the Systems Engineering team, you'll be working side-by-side with software engineers to design and deliver mission-critical services and systems. You'll be …
The SRE Manager is responsible for leading the Site Reliability Engineering function across Europe, ensuring the reliability, scalability, and performance of critical infrastructure and services. This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to … impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring a regional SRE team focused on continuous improvement and innovation. Key Responsibilities: Technical Leadership Develop deep expertise in the Titanium trading platform to lead and support critical business … ensuring priorities align with business goals and resource capacity. Operational Excellence Champion initiatives that enhance system availability, scalability, and performance. Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery). Cross-Functional Collaboration Partner with Software Engineering, Infrastructure, Operations, Security …
years of experience in designing, analyzing, and troubleshooting distributed systems, and 2 years of experience leading projects and providing technical leadership. Experience in SRE or incident management/response environments. Preferred qualifications: Experience working in computing, distributed systems, storage, or networking. Experience in telemetry systems, incident and risk management. Expertise … code, and to automate routine tasks. Excellent problem-solving approach, with verbal and written communication skills. About the job Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our … our systems' capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding …
grow our small team into a global footprint that can provide expert engagement across our core serving systems. As an early member of the SRE team, you will report directly to the Director of Managed Infrastructure and play a foundational role in expanding our SRE practice, integrating reliability principles … through the development of automated systems for software delivery, system failover, and capacity management. About You: At least 3 years of experience in an SRE role, or at least 5 years of experience in an adjacent role (e.g., platform engineering), operating in a scaled environment. Firm grasp of the SRE … philosophy and mindset, with practical experience working on or directly with SRE teams that have proactively engaged in system design and improvement. Strong sense of accountability and commitment to problem-solving, backed by a curiosity to dig deep and identify root causes. Willingness to proactively engage with development teams to …
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
NICE
production environment by monitoring availability and taking a holistic view of system health Build software and systems to manage platform infrastructure and applications Improve reliability, quality, and time-to-market of our suite of software solutions Measure and optimize system performance, with an eye toward pushing our capabilities forward … Participate in system design consulting, platform management, and capacity planning Create sustainable systems and services through automation and uplifts Balance feature development speed and reliability with well-defined service level objectives Have you got what it takes? 3-6 years of working experience in a similar role, with a … Python, Go, Java, C#) and experience with scripting languages (e.g., Bash, PowerShell). Deep understanding of cloud computing platforms (e.g., AWS), the working and reliability constraints of some of the prominent services (e.g., EC2, ECS, Lambda, DynamoDB, etc.) Experience with infrastructure as code tools such as CloudFormation, Terraform. Deep …