documentation for system behavior, runbooks, and escalation flows. Tech Stack & Tooling: Languages: Python (primary), Bash, T-SQL. OS/Infrastructure: Linux, Windows, Docker, AWS Cloud services. Monitoring & Alerting: Datadog, Grafana, custom tooling. Automation/CI/CD: Git, TeamCity, Ansible, Terraform (optional). Databases: MS SQL Server, Snowflake. General: Any other duties commensurate with the post holder's position and seniority.
Management and Automation. Proficiency with containers and container orchestration, including Helm, Docker and Kubernetes. Observability: experience designing and building monitoring and logging tools such as CloudWatch, ELK, and Grafana. Basic programming skills in at least one language. Experience with CI/CD pipeline development and management. Due to the industries we work in, we require the successful candidate to …
infrastructure. Other duties as needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route 53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation …
language (Python, Bash, etc.). Familiarity with containerization and orchestration tools (Kubernetes). Exposure to infrastructure as code (Terraform) concepts. Familiarity with monitoring, logging, and security tools (e.g., Prometheus, Grafana, Splunk, BQL). Experience supporting either Windows or Linux environments. Cyber Security: Basic understanding of cyber security principles and best practices. Interest in learning about and working with secrets management …
etc.); Database administration; Infrastructure provisioning; Process automation; Respond to change requests. Skills & Experience: Oracle DB, Docker (with Docker Swarm), Elastic Stack, TypeScript/React/Node, Go, Prometheus/Grafana, ESRI Maps, Ansible, Windows & Linux, Jenkins. Automation skills: Automation is a key skill domain for this role. Specific automation skills are: Continuous Integration: skilled in the tooling and principles of …
City of London, London, United Kingdom Hybrid / WFH Options
Pathmere Partners Limited
Financial services or FinTech background is a plus but not essential. Tech Stack Includes: Microsoft Azure; Terraform, Bicep, ARM templates; Docker, Kubernetes (AKS); Azure DevOps, GitHub Actions; Helm, Prometheus, Grafana, App Insights; PowerShell, Bash. Benefits: Competitive base salary (£90,000-£110,000); annual performance bonus; private medical insurance; pension scheme and flexible benefits; clear career path to Head of DevOps.
City of London, London, United Kingdom Hybrid / WFH Options
Searchability NS&D
Kubernetes, Docker, Helm. Proficient in Terraform and CI/CD pipelines (Drone/GitLab). Excellent understanding of Kafka internals, stream processing, and secure Kafka deployments. Strong experience across monitoring (Prometheus, Grafana, CloudWatch). Knowledge of security hardening, IAM, WAF, Shield, Vault. Working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices. UK*C or Enhanced DV (eDV) clearance is a must. To …
AWS, Azure, or GCP, and their services for scalable, resilient systems. Expertise in containerization technologies (e.g., Docker, Kubernetes) and orchestration tools. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) for maintaining system health and performance. Ability to lead and mentor junior engineers in reliability and system optimization best practices. Excellent communication skills for effective collaboration with cross …
provisioning and automation. Deep understanding of CI/CD principles, GitOps, and automation. Experience with containerization and orchestration tools (Docker, etc.). Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, Azure Monitor). Demonstrated experience in managing multiple Azure landing zones with enterprise-scale governance and policies. Strong knowledge of Azure security services (e.g., Azure Security Center, Defender for Cloud, Key …
business requirements. Essential Requirements: Specialist Knowledge: Demonstrable experience in observability engineering, infrastructure monitoring, or event management roles. Experience with traditional and modern observability stacks such as SCOM, SolarWinds, Prometheus, Grafana and Elastic Stack (ELK). Hands-on experience with BMC Helix Operations Manager, TrueSight, or similar enterprise monitoring platforms. Solid understanding of AIOps concepts, including event correlation, noise reduction, anomaly detection …
one of the following cloud providers: AWS, Google, Microsoft. Proficiency with: Terraform, Helm or Chef. Networking basics (routing, firewalls, AWS security groups). Troubleshooting/analysis of applications: Splunk, AppDynamics, Grafana, etc. OS performance troubleshooting and the ability to install and configure operating system packages. Familiarity with OAuth2. Security principles on patching, compliance, and change control processes. Preferred Qualifications: Have expert Build/…
technologies such as Oracle SQL, Mongo, Postgres. Know your way around Linux and Windows command lines, e.g. Bash and PowerShell. Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk. Experience of working in Agile teams, and the tooling that supports it, e.g. Atlassian. Diagnosing and troubleshooting application issues resulting in service outages. Troubleshooting …
Accounts - AWS Control Tower, GCP Resource Manager, etc. Network - AWS Transit Gateway, GCP Shared VPC, AWS Route53, GCP Cloud DNS, etc. Observability - AWS OpenSearch, GCP Monitoring/Traces, OpenTelemetry, Grafana, Prometheus, etc. Automation Prowess: Hands-on experience with modern Infrastructure as Code (IaC) automation tools and frameworks (Terraform, Jenkins, Ansible, etc.). Software Development Acumen: A software development background is …
Node, RabbitMQ. Databases - Postgres, MariaDB, MongoDB, ClickHouse, Redis, JupyterLab, Metabase. Data Engineering & Orchestration - Python, Airflow, Kafka, DataHub. Cloud & Infrastructure - AWS, K8s. DevOps & CI/CD - Git, GitLab CI, DBS, Grafana, ELK, Prometheus, Docker, Docker Compose. Why join us? Shape the future of a data business at the forefront of global payments insights. A chance to work with a vibrant, friendly …
applications and optimizing fleet utilization. Strong understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding) and experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar). Experience scripting operating system tasks in Bash, Python, etc., and with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar). Experience operating services in …
Preferred: Exposure to Infrastructure-as-Code tools (Terraform or CloudFormation). Experience working with CI/CD pipelines and Git workflows. Knowledge of logging and monitoring tools (e.g., CloudWatch, ELK, Grafana). Exposure to container technologies like Docker. Interest in financial markets or experience with trading system infrastructure. If you are interested and looking to be part of a high-impact technology …
London, South East, England, United Kingdom Hybrid / WFH Options
Lorien
to work independently or lead a small team. Nice to Have: Experience with Tyk API Gateway. Exposure to microservices and event-driven architectures. Familiarity with observability tools (e.g., Prometheus, Grafana). Carbon60, Lorien & SRG - The Impellam Group STEM Portfolio are acting as an Employment Business in relation to this vacancy.
London, South East, England, United Kingdom Hybrid / WFH Options
BOSS Professional Services LTD
the customer base and product offering. For the SRE Engineer role we are seeking: Technology stack: Kubernetes, MySQL, PostgreSQL, PHP, Python, Docker, AWS Lambda, AWS, Redis, ELK; monitoring: Prometheus, Grafana or Loki. You have previous experience of working in an SRE capacity, or experience in DevOps and an interest in moving into that field. Be responsible for the production environment. Improve the …
improvement. Take pride in building and operating scalable, reliable, secure systems. Are comfortable with ambiguity and rapid change. Preferred skills and experience: Familiar with monitoring tools such as Prometheus, Grafana, or similar. 5+ years building core infrastructure. Experience running inference clusters at scale. Experience operating orchestration systems such as Kubernetes at scale. Benefits & perks (UK full-time employees): Generous PTO …
IP, VLANs, routing). You will bring some of these skills, but more importantly you're interested in learning these things: • Hardware & physical infrastructure. • Data-driven monitoring and observability (Grafana, InfluxDB, Prometheus, Elastic). • Exposure to configuration management (Puppet, Ansible, Terraform). • Some exposure to scripting (Bash, Python). • Supporting CI/CD delivery pipelines (GitLab, GitHub). 25 days …
MySQL (Aurora DB), Redis (ElastiCache), MongoDB (AWS DocumentDB). Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana). Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. WHAT WE OFFER YOU: A front-row seat in a fast-scaling …