London, England, United Kingdom Hybrid / WFH Options
Realshoreit
secure, reliable and efficient. Your expertise will empower our teams to deliver high-quality software with confidence. Whether it's designing resilient cloud architectures, automating deployments or enhancing system observability, you'll bring a problem-solving mindset and a drive to make everything run seamlessly. Collaboration is at the heart of what we do. You'll work closely with engineers … tools and approaches to improve our operations. If you have extensive experience with AWS and GCP, deep expertise in Infrastructure as Code (IaC), a strong background in monitoring and observability and solid scripting skills in Bash, Python or Go, we'd love to hear from you! About The Agency: Omnicom Media Group UK (OMG UK) is the media division of …
City of London, London, United Kingdom Hybrid / WFH Options
UST
domains. With 20+ years of proven expertise, the ideal candidate will shape the strategy, design, and transformation of complex infrastructure landscapes, including Wintel, Linux, Network, Voice, Collaboration, Mobility, Observability, End-User Computing, End-User Services, and Service Desk. You will lead and drive architecture review boards and provide strategic direction. This role acts as a key advisor to senior … domains: Wintel & Linux platforms; Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to …
scale event-driven workflows using EventBridge and Lambda. Work with DynamoDB for fast, scalable key-value storage. Develop and maintain Java Spring Boot microservices deployed on EC2 instances. Ensure observability, monitoring, and fault tolerance across the system. Collaborate with DevOps, Data Engineering, and Product teams to design scalable, cost-effective cloud solutions. Maintain security best practices in a cloud-native … performance tuning, and cost optimization in cloud environments with Kafka for data streaming. Familiarity with CI/CD and infrastructure-as-code tools (e.g., Terraform, CloudFormation). Experience with observability tools (e.g., CloudWatch, OpenTelemetry). Experience working in a global enterprise software company. Our commitment to you! BMC's culture is built around its people. We have 6000+ brilliant minds …
City of London, London, United Kingdom Hybrid / WFH Options
Unitary
coming year and beyond! The role: We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to … Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline their resolution. Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance. Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams. Build …
fostering an environment of continuous learning and growth, while participating in hiring processes and training engineers up to Staff standard. Operational Stability: Demonstrate a production-first attitude, continuously considering observability and maintaining Service Level Objectives, while delivering change at pace. Research & Innovation: Embrace emerging technologies and trends, and share insights with the organisation, while developing and maintaining the team technology …/PostgreSQL); MongoDB; Event processing with Kafka; CI/CD with GitHub Actions and Azure Pipelines; Code quality with Sonar; Microservice architecture; Azure DevOps, Kubernetes, Docker; Azure storage, Redis; Observability tools: Dynatrace, New Relic; Git, GitHub; TDD, BDD; Kotlin, .NET; Android development; Reporting built with MS SSRS and Power BI; Security and performance testing and optimisation. Everyone's Welcome: M&S …
Collaborate with People/HR and engineering leadership on career pathing, training, and coaching for engineering staff. Technology Enablement: Evaluate and deploy tools - especially AI - that support engineering productivity, observability, and collaboration. Work closely with DevOps, QA, and SRE teams to align infrastructure and operational excellence with engineering needs. Own key vendor relationships, evaluation of partnerships and represent technology on … scaling engineering orgs across multiple geographies or domains (e.g., front-end, back-end, infrastructure). Familiarity with tools like Linear, Asana, GitHub, Datadog, DORA metrics, or similar performance/observability platforms. Background in organisational change management or engineering program management. What you can expect from us: Competitive salary with substantial incentive schemes. Generous long-term incentive plan (LTIP) tez token …
London, England, United Kingdom Hybrid / WFH Options
Unitary
coming year and beyond! The role: We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to … Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline their resolution. Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance. Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams. Build …
A track record of shaping incident processes, on-call practices, or sharing reliability ownership across multiple teams. Deep understanding of site reliability principles and applying them to databases, including observability and limiting the impact of long-running or resource-heavy queries. Experience with infrastructure automation, like setting up monitoring and alerting for pipelines. Bonus: Strong academic background in maths, physics … in tech or open-source communities, with a passion for sharing knowledge and inspiring others. An open mind and the flexibility to approach challenges from different angles. Experience with observability platforms such as Datadog. Experience managing infrastructure using Terraform. Familiarity with Python, SQL, and Go. The salary: We expect to pay from £100,000 to £140,000 for this role. But …
as follows: Own ITIL Problem & Change Management: Take ownership of ITIL Problem Management activities, proactively identifying, addressing and fixing root causes of incidents and recurring issues within the system. Act as observability lead, promoting stability across the estate by collaborating with cross-functional teams to implement preventive measures. Actively take part in ITIL Change Management processes, ensuring that changes to the system … efficiently. Experience in implementing changes while following ITIL change management processes. Understanding of basic security principles and best practices for securing infrastructure. Optional but advantageous technical skills: Proficiency with observability tools (New Relic and ThousandEyes), BI platform and data visualisation tools (such as Tableau and Power BI) and technology tools (Jira, Confluence). System Administration: Proficiency in Linux/Unix …
Slough, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
development, QA, and operations teams to implement DevOps methodologies and toolchains. Use Infrastructure as Code (IaC) with Terraform for automation. Maintain security controls across cloud environments, ensuring compliance. Utilise observability tools to monitor and optimise production services. Design and improve CI/CD pipelines with platforms like GitLab or Jenkins. Mentor and guide DevOps and development teams, promoting continuous learning.
developers. Experience with cloud platforms (AWS, GCP, or Azure). A strong security mindset or a keen interest in cybersecurity. Bonus: experience with Kubernetes, CI/CD pipelines, and observability tools. The role will require 5 days a week onsite in London. Please apply for immediate consideration.
City of London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Collaborate with teams to define and implement DevOps methodologies and toolchains. Implement Infrastructure as Code (IaC) using Terraform for automation. Maintain security controls across cloud environments, ensuring compliance. Use observability tools to monitor and optimise performance, resolving issues proactively. Design and optimise CI/CD pipelines with platforms like GitLab or Jenkins. Mentor and guide DevOps and development teams, fostering …
City of London, London, United Kingdom Hybrid / WFH Options
Arrows
architecture and development of backend services using C#, ASP.NET, .NET Core. Automate infrastructure, CI/CD pipelines, and cloud operations (AWS/Azure). Promote engineering best practices, security, and observability. Mentor engineers and foster a culture of continuous improvement. Contribute to technology direction, including adoption of tools like Go and Python. What We’re Looking For: Deep expertise in C# …
paced environment. Responsibilities: Develop scalable tools for automation, deployment, and infrastructure management. Enhance system performance, reliability, and efficiency through automation. Manage AWS infrastructure, ensuring smooth configuration and deployment. Implement observability tools for monitoring and debugging. Ensure fault tolerance, redundancy, and high availability of trading systems. Support infrastructure for C++ and Rust-based trading systems, ensuring seamless integration. Qualifications: Strong programming …
production tools to automate deployment, monitoring, and infrastructure management. Improving system performance, reliability, and efficiency through automation and tooling. Managing AWS-based infrastructure, ensuring seamless configuration and deployment. Implementing observability tools to enhance monitoring, debugging, and performance insights. Ensuring fault tolerance, redundancy, and high availability across critical trading systems. Supporting infrastructure for C++ and Rust-based trading systems, ensuring smooth …
autonomy, clean code, and continuous delivery. The technical landscape: Azure (AKS, Functions, App Services, Event Grid, etc.); Infrastructure as Code (Terraform); CI/CD using Azure DevOps; Monitoring and Observability (Application Insights, Azure Monitor, Prometheus/Grafana); GitHub for version control, and a modern SDLC with automated testing and security baked in. What we’re looking for: Someone who can …
Engineers to ensure customer success. Translate technical issues into executive-ready summaries and business impact statements. Participate in post-mortems and executive briefings for strategic accounts. Drive adoption of observability, automation, and self-healing support mechanisms using AI/ML tools. Required Qualifications: 8+ years in enterprise storage, distributed systems, or cloud infrastructure support/engineering. Deep understanding of file … diagnostics and reduce MTTR. Preferred Qualifications: Experience with DDN, VAST, Weka, or similar scale-out file systems. Strong scripting/coding ability in Python, Bash, or Go. Familiarity with observability platforms: Prometheus, Grafana, ELK, OpenTelemetry. Knowledge of replication, consistency models, and data integrity mechanisms. Exposure to Sovereign AI, LLM model training environments, or autonomous system data architectures. This position requires …
is required to assist in upgrading the Elastic DP estate to Kubernetes, moving away from obsolete technology (Cloudera), upgrading to RHEL 8, and contributing to improving the stability and observability of the platform. The role also involves providing advanced analytics tooling and services for modeling analytics. Responsibilities include: Providing production application support in AWS, with experience in incident and change …
City of London, London, United Kingdom Hybrid / WFH Options
Inara
and accelerate platform delivery. Deploy and monitor services in AWS using Kubernetes. Work in a high-frequency release environment, deploying multiple times per day. Use Grafana (or similar) for observability and maintain production-grade reliability. Work onsite 3 days/week in London for the first 4–6 weeks (hybrid flexibility beyond this). We’re Looking For: 5+ years of …