Site Reliability Engineer - Data Infrastructure, AD/ADAS London/Product & Technology - AD/ADAS/Employee/hybrid Woven by Toyota is enabling Toyota’s once-in-a-century transformation into a mobility company. Inspired by a legacy of innovating for the benefit of others, our mission is to challenge the current state of mobility through … development. The right candidate will have excellent communication skills, solid coding skills, expertise in building scalable, reliable, highly available and fault-tolerant systems, broad knowledge of software engineering and site reliability engineering in areas such as Large-Scale Data and Compute Infrastructure, Stream Processing, Kubernetes, High-Performance Networking, Observability and Infrastructure Automation. RESPONSIBILITIES Set the technology strategy for … maintain, optimize and support large scale, multi-region, multi-cloud compute and storage infrastructure powering our data platform and mission critical services. Work with fellow Data Infrastructure engineers and Site Reliability engineers to ensure our systems are scalable, reliable, fault-tolerant, highly available, highly performant, and observable. Manage incidents, triage product or system issues and debug/track …
below. Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within Risk Technology Team, you hold a leadership role in your team, demonstrate strong knowledge across … digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers. Job responsibilities Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team Leads initiatives to improve the reliability and stability of your team's applications and platforms using … quickly to avoid financial losses Documents and shares knowledge within your organization via internal forums and communities of practice Required qualifications, capabilities, and skills Formal training or certification on reliability, scalability, performance, security, enterprise system architecture, toil reduction concepts and proficient advanced experience Fluency in at least one programming language such as Python, Java Spring Boot, Unix Shell. Deep …
Site Reliability Engineer, ESC Managed Operations Job ID: 2847735 | Amazon Development Centre Ireland Limited - D94 AWS is set to introduce the inaugural European Sovereign Cloud (ESC), marking a significant development in utility computing (UC). To spearhead this initiative, we are actively seeking experienced systems development engineers with a strong background in automation and operations. As part … AWS services and technology. A typical day in this role involves collaborating with technology leaders, contributing to the enhancement of day-to-day operations, and ensuring improvements in availability, reliability, latency, performance, and efficiency of the ESC. You will be required to occasionally participate in “on-call” rotations to resolve incidents occurring out-of-hours. The overarching goal is … s largest cloud providers. Eligibility requirement * Fluency in …
Key Accountabilities: Working with Curve's engineering teams to support the infrastructure they need and the platforms on which their services run. Observing our platforms and services to measure reliability, find areas for improvement, and discover any risks to the stability or security of our systems. Maintaining new and existing infrastructure with code, by writing well-designed Terraform modules … them and prevent them from happening again. Sharing your work and talking about it within the Platform and Engineering team, to spread knowledge and be an ambassador for good site reliability practices. Deploying innovative new tools to help accelerate engineers and make their lives easier, giving them more time to focus on what they are building. Documenting and … driving the adoption of engineering best practices across the wider Engineering team. Demonstrating ownership of all initiatives from concept to launch, and embodying unwavering commitment and reliability, with a genuine willingness to contribute and address challenges. Projects/initiatives that we want you to contribute to, or lead the charge on: Helping Curve scale to many millions of customers …
London, England, United Kingdom Hybrid / WFH Options
Parity Technologies
Senior DevOps Engineer/SRE - Full-time, London Client: Parity Technologies Location: London, United Kingdom Job Category: Other - EU work permit required: Yes Job Reference: 8d1f5d427e3a Job Views: 4 Posted: 22.06.2025 Expiry Date: 06.08.2025 Job Description: People in Our Collective Are Highly … at the forefront of new technical developments in the ecosystem, contributing to the Polkadot 2.0 roadmap, including JAM. Operational Excellence: Contribute to Parity’s blockchain node operations, improving the reliability of the Polkadot network by managing test and benchmark networks in the cloud and on-prem. Enhance our observability initiatives by operating mainnet nodes for the Polkadot and Kusama … of monitoring with Prometheus/Grafana is a plus Understanding of what a blockchain is and how to deploy a blockchain network On-call experience with the mindset of site reliability, including incident response and postmortem analysis Ability to work autonomously, be proactive, prioritize, communicate, and function in a small (partly-remote) team About Working for Us Competitive …
Site Reliability Engineer (SRE) - Crypto High-Frequency Trading Selby Jennings We are looking for a Site Reliability Engineer (SRE) to help design and build the automation, configuration, and deployment tooling that underpins our high-frequency trading (HFT) platform. This role is at the heart of ensuring our trading systems remain highly available … scalable, and robust, supporting the fast-paced and demanding nature of our environment. What You'll Be Doing …
Python - DevOps/SRE Engineer London hybrid working - Contract Opportunity Must-haves: Python scripting (they could take someone with Go); automation experience; Prometheus/Grafana/PromQL; CI/CD; AWS; Splunk. Key Responsibilities: Develop and maintain automation scripts, primarily in Python (Go experience also considered). Respond to and resolve incidents, manage changes, and perform …
Milton Keynes, Buckinghamshire, England, United Kingdom
Noir
Site Reliability Engineer (SRE) - Market leading company - Milton Keynes (Tech stack: .Net, C#, ASP.Net Core, SQL Server, PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes) Help shape the tech future of UK market leader! Backed by a major financial institution with soaring profits - my client is modernising platforms, embracing AI, and driving automation at … scale. We're hiring a Lead Site Reliability Engineer (SRE) to drive reliability, observability, and performance across our Azure cloud infrastructure. You'll work in a modern engineering environment where we live by "you build it, you run it", focused on automation, scale, and resilience. Tech stack you'll work with: .NET, C#, ASP.NET Core, SQL … Server, PowerShell, Azure CLI, Bash, Azure DevOps, Jenkins, GitHub Actions, Docker, Kubernetes We want to hear from you if: As a Site Reliability Engineer (SRE) you've delivered scalable systems using .NET, C#, and ASP.NET Core, with real-world experience managing production workloads You've automated operations using PowerShell, Azure CLI, and Bash to reduce toil and …
Senior Site Reliability Engineer … UK (remote or hybrid) Join a market leader in unified IT operations, helping IT teams automate, manage, and remediate within a single modern interface. We’re hiring a Senior SRE to scale our client’s platform to support millions of global users. You’ll ensure systems are secure, reliable, and scalable—working with modern cloud tech and a dedicated SRE … team. What You’ll Bring: 7+ years in SRE 3+ years in Java based products 5+ Linux/sysadmin AWS (EKS, EC2, CDK, VPC), Kubernetes, CI/CD IaC: Terraform, Helm, Ansible Experience with observability (New Relic, Splunk, DataDog) On-call rotation & agile/SCRUM experience Must be based in the UK with no need for sponsorship. Apply now …
Swindon, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Site Reliability Engineer, Swindon, Wiltshire Client: Harrington Starr Location: Swindon, Wiltshire, United Kingdom Job Category: Other - EU work permit required: Yes Job Views: 8 Posted: 04.06.2025 Expiry Date: 19.07.2025 Job Description: Site Reliability Engineer – Fintech Up to … s leading financial institutions to streamline international payments and ensure compliance at scale - all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and … SLIs/SLOs, automating tasks, and reducing operational noise Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools What They’re Looking For: Experience in SRE or DevOps roles in a production environment Strong knowledge of observability tools, especially Prometheus in AWS Experience with tracing, metrics, and logs to support development teams Skills in Python or …
Stockport, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
team. Things are moving fast here, and as we continue to grow, reliability, automation, and scalability have never been more important to us. You will be our first SRE, so a strong background in implementing SRE best practices would be ideal. You will know what good looks like and strive to continuously improve automation, availability and resilience. This is … to build out infrastructure and tooling using AWS, Terraform, Docker, and CI/CD pipelines. Supporting and evolving our container-based architecture (we use ECS and Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support your work. What we’re … looking for: Strong experience with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). Good understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles – SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn it). Bonus if you’ve …
Senior Site Reliability Engineer at Morgan Stanley Glasgow As a senior SRE you would be joining our growing HashiVault squad as part of the strategy to offer more services and a better user experience to our clients. The current squad has been … running for 4 years and has people from several locations including New York, Montreal and Glasgow. We are adding more engineers in EMEA and require an SRE to help us create the same culture of ownership and independence that exists in our current squad. You will be working to implement new features, deal with user requests and reduce repeatable tasks … for strategic initiatives. In particular, you will be a major driver of our push to Google Cloud, which is due to go live in 2026. …
Job Description Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Site Reliability Engineer at JPMorgan Chase within the Corporate Oversight and Governance (COG), Architecture & Engineering team, you work collaboratively with stakeholders to define non-functional … create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Design, create, and advocate for SRE products that can scale the implementation of SRE best practices within COGT. Evolve and debug critical components of applications and platforms. Contribute to JPMorgan Chase’s site reliability community through internal forums, communities of practice, guilds, and conferences. Participate in architecting, designing, and building highly distributed systems and SRE products, solving complex coding problems. Maintain and promote best practices in software engineering, leading by example. Required Qualifications, Capabilities, and Skills Applied experience with SRE concepts, strategies, and culture. Knowledge of observability tools such as OTEL, Grafana, Dynatrace …
for a Site Reliability Engineer to join their highly skilled, innovative team. Essential skills: Strong proficiency in Python for infrastructure and automation Hands-on experience in SRE, DevOps or production engineering roles Deep understanding of monitoring, incident response workflows, and system architecture Productive approach to improving systems and reducing technical debt Strong collaboration and communication skills – working … design and implement automation for operations, deployments, monitoring and incident management, as well as owning the observability stack (metrics, logs, traces and alerting). You will also: apply core SRE principles (SLIs, SLOs, error budgets) to enhance system reliability; build, document, and improve high-performance system designs; lead incident response and implement improvements; collaborate closely with quant developers/ …
Maidstone, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
s leading financial institutions to streamline international payments and ensure compliance at scale - all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and … great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch Helping product teams with tracing and monitoring to improve performance and reliability Defining and improving … SLIs/SLOs, automating tasks, and reducing operational noise Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools What They’re Looking For: Experience in SRE or DevOps roles in a production environment Strong knowledge of observability tools, especially Prometheus in AWS Experience with tracing, metrics, and logs to support development teams Skills in Python or …
Belfast, Northern Ireland, United Kingdom Hybrid / WFH Options
JR United Kingdom
s leading financial institutions to streamline international payments and ensure compliance at scale - all through smart automation and modern cloud-native infrastructure. They’re looking to bring on a Site Reliability Engineer with deep experience in observability. If you’ve worked with tools like Prometheus in AWS, supported development teams with tracing and performance insights, and … great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch Helping product teams with tracing and monitoring to improve performance and reliability Defining and improving … SLIs/SLOs, automating tasks, and reducing operational noise Working with AWS (EKS, EC2, Lambda, RDS), Terraform, and CI/CD tools What They’re Looking For: Experience in SRE or DevOps roles in a production environment Strong knowledge of observability tools, especially Prometheus in AWS Experience with tracing, metrics, and logs to support development teams Skills in Python or …
London, England, United Kingdom Hybrid / WFH Options
BAE
our customer’s systems are built and maintained. This role blends operational product support with software engineering to create applications to understand the overall health of our systems. The SRE team sits within a wider programme at the core of the customer mission. The role holder: As an SRE, fundamentally you will be doing work that has historically been done … engineering expertise to substitute automation for human labour, with the objective of limiting traditional manual operations work (incident tickets, on-call etc.) to no more than half of the SRE team's time (and aiming for considerably less). You will have an enthusiasm to learn and experiment, to develop tools to understand application health and improve their reliability … enable them to be scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. Participating in the wider DevOps/SRE community within the organisation. It is desirable for you to have experience in the areas below. However, more valued for this role is that you have excitement and enthusiasm to …
Job Description Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Site Reliability Engineer at JPMorgan Chase within the Corporate Oversight and Governance (COG), Architecture & Engineering team, you work collaboratively with stakeholders to define non-functional requirements (NFRs … observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt. Collaborate in the design, creation and advocacy of SRE products that can be used to scale the implementation of SRE best practices within COGT. Evolve and debug critical components of applications and platforms. Contribute to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences. Participate in architecting, designing and building highly distributed systems and SRE products, solving complex problems in code. Maintain and promote best practices in software engineering, leading by example. Required qualifications, capabilities, and skills. Demonstrable applied experience of SRE concepts, strategies, and culture. Knowledge and experience in observability …
you’ll drive continual improvement and ensure Morrisons’ applications and infrastructure are resilient, efficient, and aligned with architectural goals. This is a key role for those passionate about advancing SRE practices at enterprise scale. Responsibilities Act as SME within their Domain teams for advice & guidance in terms of CI/CD, automation and product ways of working and SRE/… Engineering standards Drive the adoption of Engineering standards and Continuous Delivery principles within multiple domains The escalation point for SRE/Engineering ways of working Influence good practices and standards within SDLC throughout the business Influence partners Infrastructure best practices Implementation of least privilege approach for services Monitoring and Alerting strategy and patterns Engineering Tooling, Patterns, Framework and Standards Proprietary … code quality management inclusive of technical debt About you Knowledge In depth understanding of SRE/Engineering, Architecture and Testing practices In depth understanding of the principles of CI/CD within SRE/Engineering In depth understanding of Cloud hosting (AWS preferred) principles and managing infrastructure as code In depth understanding of software/infrastructure application monitoring frameworks Excellent …
An amazing Global Investment Client of ours located in Central London is looking for a Site Reliability Engineer to join their team on a permanent basis. This is a rare opportunity and the package offered for this role is up to £300k depending on skills and experience. ABOUT THE COMPANY The company is a leading provider of … innovatively creating an environment that is fast-paced, dynamic, and successful. ABOUT THE ROLE They are looking for an enthusiastic Site Reliability Engineer to join the SRE team in London. Their team is central to the business as they are responsible for the technology that underpins everything they do; therefore, you will have a direct impact on … be passionate about improving reliability and removing toil by identifying opportunities for automation and building platforms to make the systems more 'reliable by default'. Responsibilities: Evangelise the SRE mindset and implement best practices across the environment. Understand the business and find ways to measure and enhance resilience across the application estate. Eliminate the toil that emerges with complex …
Liverpool, England, United Kingdom Hybrid / WFH Options
Bellrock Group
Site Reliability Engineer - Liverpool (Hybrid Working) As a Site Reliability Engineer at Concerto (part of Bellrock Group), you will play a pivotal role in ensuring the reliability, performance, and scalability of our Intelligent Assets Management SaaS platform. You will lead the improvement … of infrastructure, DevOps, and monitoring across our systems—empowering the engineering team to release features faster and more safely. Your hands-on experience and strategic thinking will help embed SRE principles throughout the team, improving customer experience, system health and developer productivity. You’ll work across internal environments and customer-facing systems, shaping operational excellence and reliability at every … scalable environments using technologies such as Terraform. Work closely with developers, QA, and DBAs to improve platform design and release workflows. Implement and promote best practices for operational readiness, reliability, and fault tolerance. Guide the platform team on tooling, automation, instrumentation, observability and best practice in Azure. Build a high-quality platform aligned to the Microsoft Cloud Adoption Framework …
with the subject line: “Application Support Request”. Role: Senior Site Reliability Engineer Location: London Job Type: Permanent Are you looking to take your SRE skills to the next level? We’ve got a great opportunity for you – Senior Site Reliability Engineer Careers at TCS: It means more TCS is a purpose-led transformation … to prevent problems, not just react to them. Partner across teams to make performance, scalability, and user experience part of the whole engineering mindset. The Role As a Senior Site Reliability Engineer, you will be playing a key role in operational support, integration of applications and building and maintaining infrastructure. Your responsibilities: Effectively monitor a wide range … to incidents, and usually taking on-call responsibilities. Your Profile Essential skills/knowledge/experience: Working knowledge and prior hands-on experience using AWS services at the DevOps Engineer level. Previous experience with incidents, change and problem management. Strong background in setup and operation of enterprise observability tooling, specifically Prometheus, Grafana and Splunk, including usage of PromQL. Proficient …
logistics, healthcare, and more, aiming to transform traditional business models with cutting-edge digital solutions. Know more about us: https://corporate.jd.com/ JD.com is seeking a passionate site reliability engineer who can ensure the stability of our eCommerce mobile apps and web apps in European countries such as the UK, France, Germany, and the Netherlands. … In this role, you will be responsible for monitoring, incident management, automating deployments, scaling, reliability testing, incident post-mortems, and more. You will work closely with engineering and commercial teams globally to ensure a seamless, reliable user experience while balancing the demands of development speed and operational stability in a dynamic, international environment. Key Responsibilities Monitor system performance data … continuously improve system stability. Minimum Qualifications: Bachelor's degree in Computer Science, Software Engineering, or a related field. 3+ years of experience in DevOps, site reliability engineering (SRE), system stability assurance, operations and maintenance development, or related fields. Experience with common public cloud platform products (e.g., cloud hosting, cloud storage, object storage, CDN, etc.), and proficiency in containerization …
advanced football analytics for performance prediction and player scouting and recruitment. What's the role? We’re looking for a DevOps Engineer with Site Reliability Engineering (SRE) skills to join the Insurance Analytics team at LCP to work on a mission-critical SaaS product used by top insurance firms across the world. Our team develops and supports … based multi-award-winning insurance analytics platform. InsurSight was launched in April 2020 and is licensed as SaaS to assess over £200bn of non-life insurance business. DevOps and site reliability are particularly important for InsurSight because the platform uses machine learning to analyse large, complex datasets in the cloud. As a cloud-native application, it relies on … root cause analysis post-incident and implementing preventative measures. Reducing mean time to recovery by automating remediation steps. What skills and experience are we looking for? Experience in DevOps, SRE, or Cloud Engineering roles. Experience with Azure services (Azure Batch, Azure Functions, App Service Plans, Cosmos DB, Storage Accounts). Experience writing Infrastructure as Code in Terraform (or equivalent). …