London, South East, England, United Kingdom Hybrid / WFH Options
Become
Azure, or GCP) and containerisation (e.g., Docker, Kubernetes) Experience with Infrastructure as Code tools (e.g., Terraform, Ansible, CloudFormation) Familiarity with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, ELK, Datadog) Experience working in regulated environments such as banking, fintech, or insurance Prior experience working in or contributing to a Centre of Excellence team Strong scripting skills (e.g., Bash, Python) and More ❯
and other relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at More ❯
Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience with high-performance and More ❯
Gloucester, Gloucestershire, United Kingdom Hybrid / WFH Options
Navtech, Inc
Liquibase) and Git for version control. Scripting & Troubleshooting: Strong scripting skills (Python/Bash) for automation and ability to analyze logs and monitor performance using tools like AWS CloudWatch, Datadog, Prometheus, Grafana, or pgBadger. Solid understanding of DevOps practices, including CI/CD pipelines (e.g., GitLab CI, CloudBees, Jenkins, GitHub Actions), containerization with Docker, and monitoring/logging tools. Demonstrated More ❯
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Navtech, Inc
Liquibase) and Git for version control. Scripting & Troubleshooting: Strong scripting skills (Python/Bash) for automation and ability to analyze logs and monitor performance using tools like AWS CloudWatch, Datadog, Prometheus, Grafana, or pgBadger. Solid understanding of DevOps practices, including CI/CD pipelines (e.g., GitLab CI, CloudBees, Jenkins, GitHub Actions), containerization with Docker, and monitoring/logging tools. Demonstrated More ❯
tools and container orchestration (Docker, ECS, or Kubernetes) Solid understanding of system/network security, IAM, VPC, and secure cloud configurations Familiarity with monitoring and logging tools (e.g., CloudWatch, Datadog, Prometheus, Sentry) Experience with Postgres, Redis, and scalable backend systems Bonus: Exposure to fintech or regulated environments, GDPR/data compliance, or SOC2 setup A little about us Our founders More ❯
Edinburgh, Midlothian, United Kingdom Hybrid / WFH Options
Aberdeen
tech talks to share knowledge and promote adoption of tools and practices. About the Candidate The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python or similar. Experience with CI/CD More ❯
tech talks to share knowledge and promote adoption of tools and practices. About the Candidate The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python or similar. Experience with CI/CD tools More ❯
Washington, Washington DC, United States Hybrid / WFH Options
Epsilon Inc
Kubernetes), and cloud-native development practices Advanced knowledge of configuration management tools (Ansible, Puppet, Chef), version control systems (Git), and infrastructure automation frameworks Experience with monitoring and logging tools (DataDog, Splunk, ELK Stack), application performance monitoring solutions, and security scanning tools for vulnerability management Proficiency in programming and scripting languages including Java, Python, PowerShell, Bash, and experience with API development More ❯
needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive experience with cloud automation and infrastructure-as-code More ❯
/CD tools such as GitLab CI, CircleCI, GitHub Actions, and GitOps using ArgoCD, FluxCD Troubleshooting and debugging applications using Observability tooling across microservices and serverless applications such as Splunk, DataDog Managing ephemeral secrets and credentials using HashiCorp Vault Managing least privileged access to cloud resources using TPAM solutions such as HashiCorp Boundary Bonus Points for experience with: Production experience architecting More ❯
a production environment to support, operate and maintain applications. Experience working with incident management processes and tools (we use OpsGenie). Experience working with logging tools (e.g. Log Analytics, Datadog) and monitoring (e.g. Azure Monitor) and application performance management tools (we use New Relic). Good understanding of information and data security principles (e.g. GDPR, penetration testing). Experience designing More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
Harnham - Data & Analytics Recruitment
optimisation Nice to Have Experience with ML tooling (MLflow, Kubeflow) Knowledge of FastAPI , Databricks, or Snowflake Exposure to SRE practices or cloud security certifications Familiarity with Prometheus , Grafana , or Datadog Interested? If you want to be part of a world-class AI team at an early stage-where your infrastructure decisions will directly shape the company's success-apply today More ❯
skills — and a passion for building better together Nice to Have (We’ll Support Learning Too) Frontend development experience (especially with Angular) Experience with Kubernetes, Docker, GitHub Actions, or Datadog Familiarity with BDD (Gherkin, SpecFlow), observability tooling, and secure development practices Experience working in highly regulated or enterprise-scale environments What’s In It for You Be at the forefront More ❯
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Uniting Ambition
skills — and a passion for building better together Nice to Have (We’ll Support Learning Too) Frontend development experience (especially with Angular) Experience with Kubernetes, Docker, GitHub Actions, or Datadog Familiarity with BDD (Gherkin, SpecFlow), observability tooling, and secure development practices Experience working in highly regulated or enterprise-scale environments What’s In It for You Be at the forefront More ❯
roles 3+ years' experience with an object-oriented language (preferably Java, .NET or C++) Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive experience with cloud automation and infrastructure-as-code More ❯
roads to help teams get their apps up and running quickly in a consistent manner Event-Driven: We share data through an event-driven system powered by MSK Observability: Datadog is used for comprehensive logging and monitoring Databases: We use a combination of MongoDB and AWS Relational Databases Automation and CI/CD: Deployments are highly automated using Jenkins pipelines and GitHub More ❯
TypeScript for Frontend. Our backend services are written in TypeScript and Kotlin. Frameworks and Libraries: We use React/Redux and WebAssembly. Monitoring and Logging: We are currently using Datadog for monitoring and logging. Metrics are collected across our agents, taken from the logs using metric filters, and updated directly from lambda function or the application. Infrastructure-as-Code: Most More ❯
Software Practices - Champion SOLID principles, automated testing, and CI/CD best practices, ensuring deployments are fast, safe, and seamless. Observability & Performance - Help teams build highly observable systems using DataDog and other tools, making sure we can see and fix issues before they impact users. Collaboration - Work closely with Product Managers, Senior Product Directors, Delivery Managers, and Engineering Managers to More ❯
Reigate, Surrey, England, United Kingdom Hybrid / WFH Options
Client Server Ltd
in Azure (will also consider AWS or GCP experience) You have a deep understanding of cloud infrastructure and services including best practices around monitoring, scaling and security tools e.g. DataDog You have strong scripting skills with PowerShell (or Python) You have a good knowledge of basic networking, TCP/IP You have a good understanding of IaC, they use Pulumi More ❯
stage environments preferred. Nice to Have: Experience scaling engineering orgs across multiple geographies or domains (e.g., front-end, back-end, infrastructure). Familiarity with tools like Linear, Asana, GitHub, Datadog, DORA metrics, or similar performance/observability platforms. Background in organisational change management or engineering program management. What you can expect from us Competitive salary with substantial incentive schemes Generous More ❯