such as Azure, AWS or GCP. Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as Datadog, Prometheus, Grafana, or similar. Proven track record of maintaining highly-available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills …
and other relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at …
Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience with high-performance and …
Gloucester, Gloucestershire, United Kingdom Hybrid / WFH Options
Navtech, Inc
Liquibase) and Git for version control. Scripting & Troubleshooting: Strong scripting skills (Python/Bash) for automation and ability to analyze logs and monitor performance using tools like AWS CloudWatch, Datadog, Prometheus, Grafana, or pgBadger. Solid understanding of DevOps practices, including CI/CD pipelines (e.g., GitLab CI, CloudBees, Jenkins, GitHub Actions), containerization with Docker, and monitoring/logging tools. Demonstrated …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Navtech, Inc
Liquibase) and Git for version control. Scripting & Troubleshooting: Strong scripting skills (Python/Bash) for automation and ability to analyze logs and monitor performance using tools like AWS CloudWatch, Datadog, Prometheus, Grafana, or pgBadger. Solid understanding of DevOps practices, including CI/CD pipelines (e.g., GitLab CI, CloudBees, Jenkins, GitHub Actions), containerization with Docker, and monitoring/logging tools. Demonstrated …
tools and container orchestration (Docker, ECS, or Kubernetes). Solid understanding of system/network security, IAM, VPC, and secure cloud configurations. Familiarity with monitoring and logging tools (e.g., CloudWatch, Datadog, Prometheus, Sentry). Experience with Postgres, Redis, and scalable backend systems. Bonus: Exposure to fintech or regulated environments, GDPR/data compliance, or SOC2 setup. A little about us: Our founders …
needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation and infrastructure-as-code …
/CD tools such as GitLab CI, CircleCI, GitHub Actions, and GitOps using ArgoCD, FluxCD. Troubleshooting and debugging applications using Observability tooling across microservices and serverless applications such as Splunk, Datadog. Managing ephemeral secrets and credentials using HashiCorp Vault. Managing least privileged access to cloud resources using TPAM solutions such as HashiCorp Boundary. Bonus Points for experience with: Production experience architecting …
roles. 3+ years' experience with an object-oriented language (preferably Java, .NET or C++). Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of Observability tools (New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation and infrastructure-as-code …
testing. Tech Stack: Backend: Python, FastAPI, SQLAlchemy, Postgres, Snowflake. Frontend: React (Next.js), TypeScript (approx. 25% of the work). Cloud: AWS, Kubernetes (EKS), Terraform, GitLab CI/CD, ArgoCD. Monitoring: Datadog. Experience: 5+ years of experience as a backend, full-stack, or DevOps engineer. Strong backend development skills in Python or similar languages. Experience designing APIs and distributed systems. Proficiency with …
Watford, Hertfordshire, United Kingdom Hybrid / WFH Options
Wickes
You'll have a deep understanding of modern cloud ecosystems, with extensive hands-on experience in Amazon Web Services (AWS). Familiarity with modern observability concepts and tools, including Datadog, and proven experience with the "platform as a product" model and driving adoption of internal tools. Strong familiarity with CI/CD principles and pipelines (e.g., Jenkins, GitLab CI, CircleCI …
PowerShell, with other scripting languages like Python or Bash a bonus. Awareness of configuration tools like Flux and Terraform. Experience monitoring large distributed systems using technologies such as ELK, Datadog, Prometheus and tooling provided by cloud platform vendors. Awareness and interest in technology trends to adopt new cutting-edge tools. Building, managing, and securing C# ASP.NET web applications. Excellent communication …
as needed. Experience with relational and non-relational databases. Experience delivering high levels of observability and proficiency in improving early warning systems, for example with Grafana, Datadog, or Prometheus. Collaborating with internal/external teams/engineers and fostering an inclusive environment, where all points of view are welcomed and encouraged. Own and lead multiple domains of …
and optimize CI/CD pipelines using Azure DevOps, GitHub Actions, or Jenkins. Automate everything with Terraform, Bicep, and scripting (PowerShell, Bash, Python). Drive observability with tools like Datadog, LogicMonitor, CloudWatch, and Grafana. Champion cloud security, IAM, RBAC, and compliance best practices. Collaborate across teams, mentor peers, and contribute to a culture of continuous improvement. What You Bring: Proven …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Principality Building Society
on-premise infrastructure models. Working knowledge of secure SDLC practices and non-functional testing requirements (e.g. resilience, availability, performance, security). Experience with monitoring, logging, and observability tooling (e.g. Datadog, App Insights). Knowledge of Agile principles and DevOps practices. Experience working in platform or enablement teams and using flow metrics to improve delivery. What You'll Bring: A strong …
of resource allocation, network and/or internals. Experience working with cloud solutions (GCP or AWS). Deep understanding and demonstrable experience with modern monitoring tools such as Prometheus, Datadog, Grafana, Telegraf. Experience with infrastructure as code tools. Experience with complex Terraform deployments is a plus. Solid background with configuration management tools. Experience with SaltStack is a plus. Experience with …
roads to help teams get their apps up and running quickly in a consistent manner. Event-Driven: We share data through an event-driven system powered by MSK. Observability: Datadog is used for comprehensive logging and monitoring. Databases: We use a combination of MongoDB and AWS Relational Databases. Automation and CI/CD: Deployments are highly automated using Jenkins pipelines and GitHub …
of the React framework, related patterns and best practices. Good understanding of UI/UX best practices and considerations. Understanding of front-end observability with tools like Sentry, LogRocket, Datadog, or New Relic. Experience with CI/CD pipelines, like GitHub Actions, ArgoCD. Awareness of common front-end security risks (e.g., XSS, CSRF). Passion for writing clean, modular, scalable …
Who we are: We are a London tech startup on the lookout for bright, motivated and self-driven individuals to join the team. Who you are: You are a DevOps/Site Reliability Engineer with experience managing complex infrastructure and …
for UK National Security Vetting. Desirable Skills: Experience with AWS, Azure or GCP. Knowledge of database systems and models. Familiarity with test automation frameworks and monitoring tools such as Datadog or Prometheus. Exposure to a range of open-source technologies. Benefits: Technical progression opportunities without the need to move into sales or management. Learning allowance and paid overtime. Hybrid working …
years of professional experience, some of which should have a focus on Observability. Excellent knowledge and hands-on experience with monitoring, logging, and tracing tools such as Prometheus, VictoriaMetrics, Grafana, Datadog, New Relic, OpenTelemetry, ELK Stack, or similar. Experience with high volume data storage (structured and unstructured). A strong technical background, with current capabilities and willingness to get hands on …
stack; Experience with AWS Cloud services; Experience with Bash or Python scripting; Experience with CI/CD tools such as GitLab CI; Familiar with application performance monitoring tools like Datadog, New Relic; Familiar with Docker orchestrators such as Amazon ECS or Kubernetes; Familiar with Git; Ability to solve issues with clear methods while knowing when to take intuitive leaps. Nice …