London, Bloomsbury, United Kingdom Hybrid / WFH Options
IntaPeople
or AWS CodePipeline Support and train technical staff in upskilling necessary for ongoing operations Monitor and ensure system reliability, availability, and performance using tools like CloudWatch, Prometheus, Icinga2, Grafana, and Datadog Automate deployment, scaling, and management of containerized applications using Docker and Kubernetes Desirable skills Travis CI Monitoring – Grafana, Icinga Prometheus Rabbit MQ/AMQP Working knowledge of security best practices More ❯
as GitLab, GitHub Actions, or CircleCI Strong testing capabilities using JUnit, RestAssured, or similar frameworks Proactive with monitoring, observability, and system health Desirable Skills: Exposure to monitoring platforms like Datadog, Grafana, Prometheus, or PagerDuty Familiarity with Python scripting Experience with Kubernetes and deployment tools such as Helm Why Join H&B Tech? Help define the future of digital health & wellness More ❯
codebase, currently in Java (11+), and ideally Spring Boot. You will be working with SQL and large SQL databases, Docker, Kubernetes, OpenAPI specifications, and distributed system observability tooling (e.g., Datadog APM). Infrastructure automation is primarily owned by the infrastructure team, but you will be a consumer of their work; familiarity with AWS, Terraform and Docker is beneficial. Testing approaches More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Noir
and maintaining CI/CD pipelines, and be confident scripting in Python, C# or similar scripting languages. You'll also be comfortable working with monitoring and performance tools like Datadog or Prometheus, and ideally, you'll have worked in a fast-moving SaaS or product-led business before. Bonus points if you've helped shape DevOps roadmaps, mentored others, or More ❯
are JVM based with the majority running on Java 21. We're in the process of moving our backend services to Spring Boot. We've invested heavily in our DataDog integration to bring world class observability and monitoring to our systems. We've recently moved to Gitlab and are currently building out our next generation of automated deployment pipelines. We More ❯
frontend architecture (e.g., Module Federation or Single-SPA). Experience with cloud-native DevOps tooling: Docker, Kubernetes, AWS/GCP deployments. Proficiency in analytics and observability tools like Sentry, Datadog, or LogRocket. Soft Skills Strategic thinker with strong problem-solving and decision-making skills. Ability to work in fast-paced, agile environments with cross-functional teams. Clear communication and documentation More ❯
Experience of using Git or similar to track changes Experience of both the full .NET Framework and .NET Core Experience of using observability systems such as Elastic APM or DataDog to track and diagnose issues in production A solid understanding of security principles and secure coding including OWASP Top 10 Nice to haves: o Experience in VOIP, (SIP and RTP More ❯
tuning. Lead technical triage and root cause analysis for infrastructure-related issues Develop and deploy applications using Docker and AWS FARGATE Use CloudWatch, CloudTrail, and third-party tools like Datadog for performance and cost efficiency Configure AWS networking (VPCs, TGWs), enforce governance via AWS Config and tagging policies Maintain architecture diagrams, SOPs, and collaborate across engineering and product teams Should More ❯
Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics, logs, traces and APM. Leadership & Global Operations Proven success leading multi-regional or global technical teams with direct management of managers. Demonstrated ability to build More ❯
building and running cloud platforms and leading teams that sit at the intersection of infrastructure and product. Great Expertise in AWS best practices, infrastructure-as-code (Terraform), and monitoring (Datadog) Strong Experience in AWS utilizing Lambda, ECS, SQS, API Gateway etc. Any Programming Language experience such as Python, Golang, Typescript, Nodejs etc. If this sounds like an interesting opportunity to More ❯
London (City of London), South East England, United Kingdom
Harvey Nash
building and running cloud platforms and leading teams that sit at the intersection of infrastructure and product. Great Expertise in AWS best practices, infrastructure-as-code (Terraform), and monitoring (Datadog) Strong Experience in AWS utilizing Lambda, ECS, SQS, API Gateway etc. Any Programming Language experience such as Python, Golang, Typescript, Nodejs etc. If this sounds like an interesting opportunity to More ❯
A track record in mentoring other engineers, leading cross-team projects without authority, and driving design and technology decisions. Technologies we use (nice to have experience) Monitoring and alerting: Datadog, Falcon LogScale (formerly Humio) • Database management systems: PostgreSQL, ClickHouse Deployment tools: Flux, Helm, Kustomize Frontend frameworks: React, Angular Infrastructure as code: Terraform, Terragrunt Cloud provider: AWS Event streaming platform: Kafka More ❯
Masters or PhD in Computer Science, Physics, Engineering or Math. Knowledge of IP networking, VPNs, DNS, load balancing and firewalls Experience with monitoring and log aggregating frameworks like CloudWatch, Datadog, Splunk, Opentracing, AWS X-Ray, and APM tools. Experience with revision control source code repositories Experience with development and automated testing. Understanding of microservices and distributed application architecture. Strong verbal More ❯
ClaimCenter and other systems, including PAS, document management systems, and external data providers. Platform Monitoring : Determine requirements for specific alerts, set up alerts for various events and thresholds, utilise Datadog logs and dashboards for error analysis, and track DXC downtime while communicating updates to users. Platform Updates : Conduct a 3-way merge of updated code, validate new versions, and implement More ❯
North West London, London, United Kingdom Hybrid / WFH Options
ByteHire
of infrastructure setup and management Exposure to designing or building distributed systems, preferably in a cloud environment Company Tech Stack PHP, Laravel, ReactJS, TypeScript, Inertia, WordPress MySQL, Redis, ElasticSearch, DataDog, AWS, Terraform, Docker Benefits Hybrid working 1-2 days per week in the London office. Collaborate directly with the founding team and take ownership of product features. Be part of More ❯
as-code: Terraform, Pulumi Data Management and Orchestration: Airflow, dbt Databases and Data Warehouses: SQL Server, PostgreSQL, MongoDB, Qdrant, Pinecone GenAI: OpenAI APIs, HuggingFace, LangChain, Talk-to-data Monitoring: Datadog About You We are looking for someone who can wear two hats - the data architect and the strategic business consultant - so you'll need to show both advanced technical acumen More ❯
reliable, secure, and easy to use. You've led or contributed to modern cloud-native environments and are fluent in AWS best practices, infrastructure-as-code (Terraform), and monitoring (Datadog). You thrive in an environment where you're empowered to define direction , drive delivery, and represent your team's work to the wider business. You want to be part More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Eligo Recruitment
and capacity planning for mission-critical systems Develop secure backup, recovery, and disaster recovery procedures Explore multi-tenant and sharded architectures to support growth Implement monitoring strategies using Grafana, Datadog, and CI/CD integrations Champion database best practices, mentor teams, and standardize tooling and automation What You’ll Bring Extensive experience managing cloud-hosted PostgreSQL at scale Proficiency in More ❯
of building and operating systems at scale Advanced knowledge of configuration management systems, such as: Puppet, Chef, Ansible, or related systems Significant experience of monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar) Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your More ❯
as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, DataDog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. Collaborate closely with Salesforce DevOps teams to ensure … scripting in PowerShell, Python, or Bash Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, DataDog, or Dynatrace Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments Exposure to AI/ML platforms, real-time data pipelines, and basic networking More ❯