the equivalent with Azure and GCP. Background knowledge and hands-on practice in observability, specifically experience working with one or more of the following tools: Kibana, OpenSearch, Grafana, Datadog, Sumo Logic, New Relic, AppDynamics, Dynatrace, Prometheus, Logz.io, SignalFx, Instana, Splunk, Honeycomb, Jaeger. Hands-on experience with Infrastructure as Code (Terraform/Ansible). Hands-on experience in technical integrations.
City of London, England, United Kingdom Hybrid / WFH Options
Parser Limited
and other relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at
automation. Effective communication skills. Ability to work independently and manage multiple priorities. Experience mentoring junior DevOps team members. Preferred Skills & Experience: Experience with modern tooling such as Ansible, Terraform, Datadog, Jenkins, GitLab, ServiceNow. Source control with Git, Bitbucket, Nexus, Artifactory. Strong problem-solving skills and root cause analysis. Networking diagnostics experience. AWS certifications in Developer, SysOps, or DevOps. Ability to
City of London, London, United Kingdom Hybrid / WFH Options
Roc Search
and serverless compute options. Build and maintain CI/CD pipelines using industry-standard tools (e.g., GitHub Actions, GitLab CI, Jenkins). Implement monitoring and logging using tools like Datadog, Serilog, CloudWatch, or equivalent. Use Docker and Kubernetes for containerisation and orchestration of applications. Manage deployments with Helm and configuration in YAML. Develop shell scripts and automation for deployment and
City of London, London, United Kingdom Hybrid / WFH Options
Annapurna
experience with CI/CD pipelines and container technologies like Docker and Kubernetes. Deep understanding of networking, distributed systems, and databases. Expertise in monitoring and observability tools such as Datadog, Prometheus, Grafana, ELK stack, or Splunk. Excellent communication skills and a meticulous approach to problem-solving. Desirable Experience: Familiarity with Azure. Experience working in the autonomous vehicle sector. Exposure to
City of London, Greater London, UK Hybrid / WFH Options
Cpl
major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with … the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog
City of London, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams, drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with … the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog
working in a hybrid team environment (3 days a week onsite in London). Experience with Terraform, Kubernetes, or CI/CD pipelines. Familiarity with observability tooling (e.g. Prometheus, Grafana, Datadog). Experience mentoring or leading other engineers.
City of London, London, United Kingdom Hybrid / WFH Options
RP International
tuning. Lead technical triage and root cause analysis for infrastructure-related issues. Develop and deploy applications using Docker and AWS Fargate. Use CloudWatch, CloudTrail, and third-party tools like Datadog for performance and cost efficiency. Configure AWS networking (VPCs, TGWs) and enforce governance via AWS Config and tagging policies. Maintain architecture diagrams and SOPs, and collaborate across engineering and product teams. Should
or more programming languages (Go, Rust, C++, Java). Proven experience in troubleshooting and resolving complex issues in large-scale backend systems. Experience with an observability stack (e.g. Elasticsearch, Logstash, Kibana, Datadog, Prometheus, Grafana) and infrastructure as code (e.g. Terraform). Experience with building platform solutions/services on top of major cloud providers (GCP, AWS) is a plus. Experience with building and
City of London, Greater London, UK Hybrid / WFH Options
PinkWorm Recruitment
and TypeScript. Working closely with ML, product, and design teams. Mentoring others and helping shape engineering direction. Writing clean, testable, production-ready code. Troubleshooting with modern tools (e.g. Sentry, Datadog). What They’re Looking For: 6+ years in full-stack engineering roles. Strong Python backend experience (Flask, FastAPI, Django, etc.). Solid React/Next.js front-end skills (TypeScript is a
City of London, London, United Kingdom Hybrid / WFH Options
Cogna
identity management (Entra ID), and network configurations. Support container orchestration and workload deployment using Kubernetes and AKS. Improve observability by implementing logging, monitoring, and alerting systems (e.g. Azure Monitor, Datadog). Partner with internal teams to improve resilience, automate toil, and reduce lead time to deployment. Drive root cause analysis and reliability improvements from incidents. What we’re looking
City of London, London, United Kingdom Hybrid / WFH Options
Oliver Bernard
visibility, mission-critical systems that shape the future of digital platforms. Tech Stack Includes: C#, .NET Core, REST APIs, Azure (Functions, CosmosDB, SQL, Redis), Entity Framework, Blazor, Terraform, GitHub, Datadog. You’ll Be: Building and maintaining scalable backend systems. Working cross-functionally with product, QA, design & ops. Taking ownership from ideation to production deployment. Mentoring others and driving engineering excellence.
City of London, London, United Kingdom Hybrid / WFH Options
Understanding Recruitment
systems/infrastructure engineering role. Strong scripting skills in Python, Bash, or Ruby. Familiarity with configuration management tools (Ansible, Puppet, or Chef). Interest or exposure to observability tools like Datadog, Prometheus, or Grafana. A passion for learning and improving in high-performance environments. This is a rare chance to learn from elite engineers and contribute directly to a platform supporting
Key Responsibilities: Design and develop software tools for intelligence collection and analysis. Create systems to analyze cryptocurrency behavior on both the clear and dark web. Implement observability tools (e.g., Datadog) to monitor and resolve issues. Mentor team members and promote engineering best practices. Contribute to the team’s technical strategy and decision-making. Ideal Candidate: Passionate about cryptocurrencies and blockchain
Software Developer Platform Storage London | Hybrid | Permanent We’re supporting a global investment firm in hiring an experienced Software Developer to join their Platform Storage team - a core part of their infrastructure group responsible for scaling and optimising storage systems that
Software Developer – Platform Storage 📍 London | Hybrid | Permanent 💰 Exceptional Compensation + Bonus + Benefits We’re supporting a global investment firm in hiring an experienced Software Developer to join their Platform Storage team (Golang focused) - a core part of their infrastructure
.NET or Python environments. Experience operating and supporting bespoke trading platforms. Commercial experience deploying applications into AKS clusters. Experience operating one or more of Kafka, Redis, Atlassian Suite, Elastic, Datadog, etc. Sponsorship cannot be offered for this role. Apply below with an up-to-date CV to set up an initial call.
City of London, London, United Kingdom Hybrid / WFH Options
Trust In SODA
building, and scaling data-intensive systems. Ideally they would also have good knowledge of: Containerisation (Kubernetes), Relational Databases (PostgreSQL, SQL), Data Warehousing (Snowflake, RDS), Cloud (AWS), IaC (Terraform), Monitoring (Datadog). In return they would be offering: An employee equity incentive scheme, Flexible/Remote working, 25 days’ holiday (+bday off + option to buy or sell an additional five days
City of London, London, United Kingdom Hybrid / WFH Options
Areti Group | B Corp™
number of projects. The ideal candidates will have a skillset that includes the following. Background: PHP, SOLID design principles (TDD, BDD), Symfony, RESTful API, microservices, MySQL, Docker containers, AWS, Datadog, cloud. These roles will be working in one of the most talented software engineering teams around and will offer the chance to move into a tech lead position very quickly.
ClaimCenter and other systems, including PAS, document management systems, and external data providers. Platform Monitoring: Determine requirements for specific alerts, set up alerts for various events and thresholds, utilise Datadog logs and dashboards for error analysis, and track DXC downtime while communicating updates to users. Platform Updates: Conduct a 3-way merge of updated code, validate new versions, and implement
of services. Implement and promote SRE principles such as SLOs/SLIs and toil measurement. Establish best practices for monitoring and alerting systems, with experience in observability platforms like Datadog and OpenTelemetry preferred. Work closely with engineering/development teams to design, build, and maintain systems, assisting in product selection, schema design, and query tuning. Demonstrate extensive troubleshooting abilities