Own the availability and performance of mission-critical services and build automation to prevent problem recurrence. Improve the lead time of engineers at Preply. Improve the system’s scalability, observability, and alerting. Practice sustainable incident response and blameless postmortems. Collaborate with product teams to help them tackle technical issues and design new systems. Strengthen credibility with the quality of the …
London, England, United Kingdom Hybrid / WFH Options
Antler
environment. Deep expertise in backend development (Python preferred), cloud infrastructure (GCP/AWS), and system design. Strong understanding of modern software development best practices, including CI/CD, containerization, observability, and microservices. Experience working closely with Product teams to align technical decisions with business priorities. Excellent communication and stakeholder management skills, with the ability to translate technical complexity into business …
of these technologies is desirable, but not essential for a successful application: web front-end development (TypeScript, React); containerisation/virtualisation technologies such as Docker, Podman and KVM; observability infrastructure, e.g. VictoriaMetrics, Prometheus, Grafana; configuration management using Terraform, Bash, Python or similar tools; hands-on networks and associated technology (e.g. STP, BGP, OSPF, MPLS); working knowledge of …
for engineering strategy, hiring, delivery, and quality. Demonstrated ability to lead end-to-end development of complex, cloud-native SaaS platforms at scale, including architecture, infrastructure, DevOps, QA, and observability. Proven track record in building and scaling SaaS applications in regulated or healthcare-adjacent domains. Deep expertise in modern tech stacks (e.g., Python, TypeScript, Node.js, React), cloud platforms (AWS, Azure …
London, England, United Kingdom Hybrid / WFH Options
Moneysupermarket Group
paved roads to help teams get their apps up and running quickly in a consistent manner. Event-Driven: We share data through an event-driven system powered by MSK. Observability: Datadog is used for comprehensive logging and monitoring. Databases: We use a combination of MongoDB and AWS Relational Databases. Automation and CI/CD: Deployments are highly automated using Jenkins pipelines and …
practices (Agile, Scrum, Kanban). Proficiency in CI/CD pipelines, infrastructure as code, and cloud data tooling. Familiarity with data governance, privacy, and security principles. Experience using metrics and observability tools to monitor data platform health and team performance. Experience in performance management and setting measurable goals for team members. This role isn't for you if: You rely on …
Technical Leadership & DevOps Culture: Lead by example across delivery teams, offering hands-on technical support and ensuring engineering excellence. Promote a DevOps-first culture by championing continuous delivery, automation, observability, and operational readiness in everything we build. Help teams strike the right balance between shipping value quickly and building with long-term sustainability in mind. Collaboration & Influence: Work hand-in …
and driving down costs. Application development. If you're currently an application engineer working in Python or NodeJS with a strong operational slant, that can work well for us. Observability (Datadog), with a strong focus on enabling and empowering Engineering teams to understand their product in Production. SaaS Networking. Geolocation-based performance, the path to multi-region, frontend performance optimisation. …
and at least one major Cloud Platform. Experience with CI/CD, Release management and DevOps-related technologies. Experience with Grafana/Prometheus/Splunk or similar monitoring/observability technologies. Comfortable with Shell Scripting and Python programming (or other programming languages). Experience building secure/scalable platforms/products on cloud. Practical experience with modern SRE lifecycle, tools, and …
domain. Experience in a strongly/statically typed language. Have a strong understanding of designing, building, and running high-quality, standards-compliant workflow APIs, with a focus on testing, observability, and performance. Have worked with a cloud provider (AWS/Azure/GCP). Have worked with distributed systems and are comfortable debugging through tracing and observability. Willing to be …
/GitLab/SVN source control. Experience with testing methodology and frameworks. Experience with continuous integration systems and pipelines. Experience of video and audio encoding and streaming. Experience of observability tooling such as Grafana, Prometheus, OpenSearch and creating observable systems. Knowledge of Rust, Java, C#. Knowledge of PlayStation hardware and SDK. Benefits of working in Gaming, Developer and Future Technology …
London, England, United Kingdom Hybrid / WFH Options
Equinix
networking technologies and ecosystems, such as Routing Daemons (FRR, BIRD, GoBGP), Linux Networking (eBPF, VPP, XDP), and SONiC, or other Linux-based open Network Operating Systems. Involvement with modern observability platforms (Prometheus/PromQL, Grafana, gNMI, etc.). Experience with network flow export (NetFlow, IPFIX, sFlow) and analysis. Solid understanding of full networking stack (routing, switching and optical networking), including key …
Royal Leamington Spa, England, United Kingdom Hybrid / WFH Options
Tata Consultancy Services
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call TCS London Office number 02031552100/+44 204 520 2575 with the …
London, England, United Kingdom Hybrid / WFH Options
Ticketmaster
VP of Engineering (Remote, United Kingdom)
London, England, United Kingdom Hybrid / WFH Options
DeepL
Head of Site Reliability Engineering & Platform
Senior Solution Architect at EDB, 4 days ago
London, England, United Kingdom Hybrid / WFH Options
Vodafone
Job description: Staff and Team Lead, Onyx Application Engineering. The Onyx Research Data Tech organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step-change in our ability to leverage data, knowledge, and …
London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
Observability Specialist – Grafana/Golang, London. Location: London, United Kingdom. Job Category: Other. EU work permit required: Yes. Posted: 26.06.2025. Expiry Date: 10.08.2025. Job Description: A financial markets firm is implementing observability across their infrastructure estate and has an opportunity for a Grafana specialist to play a key role in building high-quality dashboards and visualisations. … infrastructure as code pipelines into key metrics and actionable insights. This will involve working with engineers on requirements, producing iterative designs, and developing supporting tools. Requirements: 5+ years of observability experience in automated infrastructure environments. Strong query experience in PromQL, VictoriaMetrics, or VictoriaLogs. Tool development experience in Golang or Python. Understanding of infrastructure as code outputs and tools (Terraform). Linux …
Senior Site Reliability Engineer - Monitoring and Observability at Macquarie Group. Our team is … dedicated to running and uplifting the current environment to the NextGen IT Monitoring and Observability stage. We run and maintain enterprise-wide log analytics, monitoring, and observability services, ensuring optimal performance and customer satisfaction. At Macquarie, our advantage is bringing together diverse people and empowering them to shape all kinds of possibilities. We are a global financial services group operating … ll be part of a friendly and supportive team where everyone - no matter what role - contributes ideas and drives outcomes. What role will you play? As a Monitoring and Observability Engineer, you will run and maintain enterprise-wide log analytics, monitoring, and observability services. You will be responsible for improving the value provided by the log analytics platform to drive …
City of London, London, United Kingdom Hybrid / WFH Options
Cpl
Site Reliability Engineer (SRE) Lead – Observability. Rate: £450-£475 per day (Inside IR35). Location: London (Hybrid, 2 days on site per week). Contract. Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams … drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily … operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform. …