and manage containerized applications using Docker, supporting streamlined deployment and environment consistency across development and production. Implement comprehensive monitoring and alerting solutions with Prometheus, Grafana, and Alertmanager to proactively identify and resolve system performance issues. Champion DevOps best practices in automation, security, and agile delivery to drive continuous improvement, operational
cloud environments. Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong problem-solving and analytical skills. Excellent communication and collaboration skills. Experience with version control systems (e.g., Git). Experience working in
platform. Develop software using technologies such as Docker Compose, Terraform, Kubernetes (K8s), Python, and Go. Provision and orchestrate open-source services including Loki, Redis, Grafana, Authentik, Netbird, among others. Design and implement CI/CD pipelines to streamline deployment processes. Initially focus on AWS environments, with the goal of creating
such as Terraform. Has a strong understanding of networking (VPCs, SGs, etc.), firewalls, VPN, and DNS. Is familiar with monitoring tools such as Sentry, Grafana, and CloudWatch. Has experience with version control systems (e.g., GitLab) and best practices for branching, merging, and versioning. Has extensive coding experience using Python, JavaScript
pipelines with GitLab and GitHub Actions. Containerising with Docker and applying best practices for security and performance. Monitoring and alerting using Datadog, Prometheus, and Grafana. Debugging complex systems using strace, dtrace, and similar tools. Supporting a tech stack that includes Rust, Python, Go, C++, Java, and more. 🧠 What You
a high-scale, distributed environment, this could be a great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch. Helping product teams with tracing and monitoring to improve performance and reliability. Defining and improving SLIs/SLOs, automating tasks, and reducing operational
LLMs, and multimodal systems. Architecture: microservices, RESTful APIs, async programming. Infrastructure: Docker, Terraform, GitHub Actions, GCP (preferred). Datastores: MongoDB, Redis. Monitoring/Tooling: Prometheus, Grafana, Sentry. The role is remote with occasional travel. Ready to lead and build with purpose? If you're excited by the idea of applying your
container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform). Experience with monitoring tools such as Prometheus, Grafana, and DCGM. Understanding of GPU/accelerator toolchains like CUDA or ROCm. A proactive, customer-first mindset with strong communication skills. Ability to work effectively
code (we use Pulumi). Relational databases such as MySQL/PostgreSQL. Proficiency in writing and maintaining test suites. Monitoring and observability tools, for example Grafana/Crashlytics. What we offer: A competitive salary and benefits package (depending on experience). Holidays: 32 days paid leave including public holidays. Pension contribution
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Starling Bank
native, microservice-based architecture. Kubernetes (EKS). TeamCity for CI/CD (lots of teams are releasing code 15-20 times per day!). Terraform and Grafana. Our process: Interviewing is a two-way process and we want you to have the time and opportunity to get to know us, as much
Fargate). Driving SRE best practices: SLIs/SLOs, error budgets, reducing toil, and improving observability. Using (and hopefully enjoying!) tools like Datadog, Prometheus, Grafana, and Nix to support your work. What we’re looking for: Strong experience with AWS, Terraform, Docker, and container orchestration (ECS/Fargate). Good … understanding of CI/CD pipelines and DevOps workflows. Solid grasp of SRE principles: SLIs, SLOs, error budgets, observability, etc. Familiarity with Datadog, Prometheus, Grafana, or similar tools. Experience with Nix is a plus (or curiosity to learn it). Bonus if you’ve worked with Azure, GCP, or have