AWS, GCP, or Azure). Strong understanding of Site Reliability Engineering (SRE) practices and principles. Experience with observability and monitoring tools such as Prometheus, Grafana, ELK, Splunk, or Datadog. Familiarity with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation) is a plus. Excellent problem-solving, debugging, and communication skills. More ❯
native deployment strategies. Hands-on with AWS, GCP, and Azure for compute, networking, and storage configurations. Familiarity with monitoring/logging tools (e.g., Prometheus, Grafana, ELK stack). Trading Systems & Finance: Solid understanding of trading infrastructure, latency optimization, execution systems, and market data feeds. Experience working in or with quantitative More ❯
and manage containerized applications using Docker, supporting streamlined deployment and environment consistency across development and production. Implement comprehensive monitoring and alerting solutions with Prometheus, Grafana, and AlertManager to proactively identify and resolve system performance issues. Champion DevOps best practices in automation, security, and agile delivery to drive continuous improvement, operational More ❯
cloud environments. Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack). Strong problem-solving and analytical skills. Excellent communication and collaboration skills. Experience with version control systems (e.g., Git). Experience working in More ❯
highly regulated sectors Familiarity with Apache Kafka, Spark, or Hadoop Experience with Docker and Kubernetes Use of monitoring/alerting tools such as Prometheus, Grafana, or ELK Understanding of machine learning algorithms and data science workflows Proven ability to deliver end-to-end data solutions Knowledge of Terraform, Ansible, or More ❯
platform. Develop software using technologies such as Docker Compose, Terraform, Kubernetes (K8s), Python, and Go. Provision and orchestrate open-source services including Loki, Redis, Grafana, Authentik, Netbird, among others. Design and implement CI/CD pipelines to streamline deployment processes. Initially focus on AWS environments, with the goal of creating More ❯
such as Terraform Has a strong understanding of networking (VPCs, SGs, etc.), firewalls, VPN, and DNS Is familiar with monitoring tools such as Sentry, Grafana and CloudWatch Has experience with version control systems (e.g. GitLab) and best practices for branching, merging, and versioning Has extensive coding experience using Python, JavaScript More ❯
to navigate through CNI and CSI configurations/issues. Experience with Helm charts and Operators. Hands-on experience in containerizing applications (Elasticsearch, PostgreSQL, Grafana, etc.). Strong Linux skills including basic administration skills. Ansible and Terraform experience. Strong Kubernetes experience; we want to see people who have worked on Kubernetes More ❯
pipelines with GitLab and GitHub Actions Containerising with Docker and applying best practices for security and performance Monitoring and alerting using Datadog, Prometheus, and Grafana Debugging complex systems using tools like strace, dtrace, and beyond Supporting a tech stack that includes Rust, Python, Go, C++, Java, and more 🧠 What You More ❯
for container orchestration. · Strong knowledge of Git and version control practices · Experience with Kafka, RabbitMQ, or similar technologies. · Familiarity with monitoring tools (e.g., Prometheus, Grafana) and logging frameworks · Knowledge of DevOps principles and practices. · Strong analytical and problem-solving skills. · Experience working in Agile/Scrum environments. · Ability to work More ❯
Management and DevOps Pipelines and AWS including EKS, Lambda, and CloudFormation Infrastructure as Code and GitOps: Terraform, Bicep, Pulumi, ArgoCD, and FluxCD Observability: Prometheus, Grafana, OpenTelemetry, and Datadog Security and Compliance: HashiCorp Vault, Azure Key Vault, AWS KMS, OPA Gatekeeper, and Drata or similar AI Coding Tools: GitHub Copilot, Cursor More ❯
RabbitMQ, Kafka). Deep understanding of API design and best practices (REST, gRPC). Experience with CI/CD pipelines, monitoring tools (e.g., Prometheus, Grafana), and logging systems (e.g., ELK stack). Strong problem-solving, organizational, and communication skills. Preferred: Experience with distributed systems, event-driven architectures, and CQRS More ❯
on experience with AWS, Kubernetes, Docker, and modern CI/CD pipelines Familiarity with infrastructure-as-code (e.g., Terraform) and observability tooling (e.g., Prometheus, Grafana) Comfortable working on distributed systems and improving developer workflows A product mindset and a collaborative approach to problem-solving Experience with Kafka, gRPC, or open More ❯
platforms (e.g., AWS, GCP or Azure), an understanding of containerisation (e.g., Docker), infrastructure-as-code software (e.g., Terraform), and observability platforms (e.g., Datadog or Grafana). Curiosity : A hunger to learn and grow your skills. Problem solving: Strong analytical problem-solving skills and attention to detail. You have the ability More ❯
container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform). Experience with monitoring tools such as Prometheus, Grafana, and DCGM. Understanding of GPU/accelerator toolchains like CUDA or ROCm. A proactive, customer-first mindset with strong communication skills. Ability to work effectively More ❯
team environment (3 days a week onsite in London) Experience with Terraform, Kubernetes, or CI/CD pipelines Familiarity with observability tooling (e.g. Prometheus, Grafana, Datadog) Experience mentoring or leading other engineers More ❯
a high-scale, distributed environment - this could be a great next step. What You’ll Be Doing: Managing and improving observability tools like Prometheus, Grafana, and CloudWatch Helping product teams with tracing and monitoring to improve performance and reliability Defining and improving SLIs/SLOs , automating tasks, and reducing operational More ❯
/CD pipelines and GitOps principles Knowledge of container orchestration platforms like Kubernetes (EKS) Experience with monitoring and observability tools including OpenSearch, Prometheus, and Grafana Understanding of security best practices and AWS CIS Benchmark standards Experience with low-latency network design and optimization Strong verbal communication and documentation skills Experience More ❯
LLMs, and multimodal systems Architecture: Microservices, RESTful APIs, async programming Infrastructure: Docker, Terraform, GitHub Actions, GCP (preferred) Datastores: MongoDB, Redis Monitoring/Tooling: Prometheus, Grafana, Sentry The role is remote with occasional travel Ready to lead and build with purpose? If you're excited by the idea of applying your More ❯
code (we use Pulumi) Relational databases such as MySQL/PostgreSQL Proficiency in writing and maintaining test suites Monitoring and observability tools, for example Grafana/Crashlytics What we offer A competitive salary and benefits package (depending on experience). Holidays: 32 days paid leave including public holidays. Pension contribution More ❯
flows and integration processes. Familiarity with containerization and orchestration tools such as Docker and Kubernetes. Experience with monitoring and alerting tools such as Prometheus, Grafana, or ELK for data infrastructure Knowledge of security practices for handling sensitive data, including encryption, anonymization, and access control. Familiarity with data governance, data quality More ❯
hands-on migration work, Kubernetes Security Spark S3 Engine Terraform Ansible CI/CD Hadoop Linux/RHEL – on-prem background/container management Grafana or Elasticsearch for observability Desirable: experience with OpenTelemetry and/or Argo This is an urgent requirement - suitable candidates should apply ASAP for More ❯
Cloud Architect, or similar Must have experience with Kubernetes, CI/CD, and Terraform Database experience with MySQL and NoSQL Monitoring tools like Prometheus, Grafana, New Relic, Zabbix, Datadog, or similar Relevant Certifications like GCP Cloud Architect or similar Beneficial, but not essential: Cassandra, and Google Workspace What's in it More ❯
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Starling Bank
native Microservice based architecture Kubernetes (EKS) TeamCity for CI/CD (lots of teams are releasing code 15-20 times per day!) Terraform and Grafana Our process: Interviewing is a two-way process and we want you to have the time and opportunity to get to know us, as much More ❯