City of London, London, United Kingdom Hybrid / WFH Options
Pathmere Partners Limited
Bash) Financial services or FinTech background is a plus but not essential Tech Stack Includes: Microsoft Azure Terraform, Bicep, ARM templates Docker, Kubernetes (AKS) Azure DevOps, GitHub Actions Helm, Prometheus, Grafana, App Insights PowerShell, Bash Benefits Competitive base salary (£90,000-£110,000) Annual performance bonus Private medical insurance Pension scheme and flexible benefits Clear career path to Head of
and version control systems (e.g., Git). Excellent problem-solving skills and attention to detail. Strong communication and teamwork abilities. Preferred Qualifications: Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack). Familiarity with Agile methodologies and DevOps practices. Enhanced leave - 38 days inclusive of 8 UK Public Holidays Private Health Care including family cover Life Assurance - 5x
London, South East, England, United Kingdom Hybrid / WFH Options
BOSS Professional Services LTD
grow the customer base and product offering. For the SRE Engineer role we are seeking: Technology stack: Kubernetes, MySQL, PostgreSQL, PHP, Python, Docker, AWS Lambda, AWS, Redis, ELK, monitoring: Prometheus, Grafana or Loki. You have previous experience of working in an SRE capacity, or experience in DevOps and an interest in moving into that field. Be responsible for the production environment. Improve
and their implementation via code. Ability to lead and mentor teams on secure coding, threat modelling, and secure architecture patterns. Experience with monitoring, logging, and security telemetry platforms (e.g., Prometheus, Loki, ELK, XDR/SIEM integrations). Please apply should you meet the above criteria. Attenti Consulting is acting as an Employment Business in relation to this vacancy.
Databases - Postgres, MariaDB, MongoDB, ClickHouse, Redis, JupyterLab, Metabase Data Engineering & Orchestration - Python, Airflow, Kafka, DataHub Cloud & Infrastructure - AWS, K8s DevOps & CI/CD - Git, GitLab CI, DBS, Grafana, ELK, Prometheus, Docker, Docker Compose Why join us? Shape the future of a data business at the forefront of global payments insights A chance to work with a vibrant, friendly team in
such as Oracle SQL, Mongo, Postgres o Know your way around Linux and Windows command lines, e.g. Bash and PowerShell o Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk o Experience of working in Agile teams, and the tooling that supports it, e.g. Atlassian o Diagnosing and troubleshooting application issues resulting in service outages o Troubleshooting skills
applications and infrastructure Other duties as needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive experience with
environments for sports. Strong understanding of video streaming protocols, encoding/transcoding workflows. Demonstrated ability to lead technical recovery during high-pressure incidents. Familiarity with observability tools (e.g., Grafana, Prometheus, Datadog) and incident management platforms (e.g., PagerDuty, Opsgenie). Excellent communication and stakeholder management skills. Strong analytical and problem-solving abilities. What's in it For You? Hybrid Work Model
City of London, London, England, United Kingdom Hybrid / WFH Options
TalentTrade Recruitment Limited
and secrets management. Good experience with continuous integration and continuous deployment (CI/CD) pipelines with GitHub Actions. Familiarity with monitoring and logging tools relevant to distributed systems (e.g., Prometheus, Grafana, ELK stack). Experience with scripting languages such as Bash or Python for automation tasks.
EC3N, Tower, Greater London, United Kingdom Hybrid / WFH Options
TalentTrade Recruitment Limited
and secrets management. Good experience with continuous integration and continuous deployment (CI/CD) pipelines with GitHub Actions. Familiarity with monitoring and logging tools relevant to distributed systems (e.g., Prometheus, Grafana, ELK stack). Experience with scripting languages such as Bash or Python for automation tasks.
communication concepts and protocols - TCP/IP, DNS, HTTP Experience of deploying Continuous Integration solutions An awareness of security considerations in web application deployment Monitoring/Logging, e.g. ELK, Prometheus/Grafana etc Strong AWS knowledge - EC2, EKS, RDS, Aurora, networking, cost management If you'd like to discuss this DevOps Engineer in more detail, please send your updated CV
native applications Working in a Continuous Delivery environment Modern observability practices Nice to have Not vital, but you'll have the edge if you also have experience with: Grafana Prometheus Kotlin or at least the willingness to learn it Batch processing data pipelines or have worked in: an eCommerce organisation a shipping/logistics/exports organisation What you bring
operating infrastructure on AWS and other providers Operating MongoDB (or other document database) clusters Operating Redis (or other key-value storage) clusters Administering Linux servers Maintaining distributed software Operating Prometheus and Grafana Operating logging collection and analysis systems Participating in the on-call rotation (04:00 - 16:00 UTC) Skills: Kubernetes & containers (advanced) AWS/EKS (advanced) Linux (advanced) Terraform … and IaC in general (proficient) Helm (proficient) Go and/or Python (familiar) MongoDB (or similar) Redis (or similar) Monitoring - Prometheus, Grafana, Thanos (familiar) Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.) Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP) Proactive, energetic, innovative and change oriented Nice to have: GCP or Azure Bare metal infrastructure
operating infrastructure on AWS and other providers Operating MongoDB (or other document database) clusters Operating Redis (or other key-value storage) clusters Administering Linux servers Maintaining distributed software Operating Prometheus and Grafana Operating logging collection and analysis systems Working hours within 16:00 - 04:00 UTC Skills: Kubernetes & containers (advanced) AWS/EKS (advanced) Linux (advanced) Terraform and IaC in … general (proficient) Helm (proficient) Go and/or Python (familiar) MongoDB (or similar) Redis (or similar) Monitoring - Prometheus, Grafana, Thanos (familiar) Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.) Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP) Proactive, energetic, innovative and change oriented Nice to have: GCP or Azure Bare metal infrastructure engineering API management
LLM Deployment & Fine Tuning: Drive the deployment and fine tuning of large language models (LLMs) while ensuring efficient training pipelines and model hosting. Monitoring & Performance Optimization: Implement monitoring (using Prometheus/Grafana and similar tools) and logging solutions to ensure system reliability and to optimise model throughput. Collaborate Across Teams: Work closely with Machine Learning engineers to enable their delivery … Operations: Hands-on experience with training pipelines, model hosting, and throughput optimisation. Expertise in deploying and fine tuning large language models. Monitoring & Performance: Proficiency with monitoring tools such as Prometheus and Grafana. Programming & Automation: Strong proficiency in Python, with experience in developing production applications. Data Engineering & Streaming: Familiarity with data streaming tools and Elastic to ensure high performance in data
relevant tools. Security Best Practices: IAM, MFA, data encryption, firewall configurations. Programming/Scripting: Python, Terraform, or similar languages. Event-Driven Architectures: Kafka. Monitoring and Logging: Datadog, ELK Stack, Prometheus, etc. Experience in agile methodologies and DevOps practices. Location: Hybrid. Office located in London (Hayes area). Office presence required: Yes. Frequency: 2-3 times a week at the office.
Programming languages, such as C#, Python, Perl, Java, C++ CI/CD tools such as Azure DevOps, GitHub Actions, GitLab, Jenkins, TeamCity Scripting languages such as PowerShell, Bash Observability/Monitoring: Prometheus, Grafana, Splunk Containerisation tools such as Docker, K8S, OpenShift, EC, containers Hosting technologies such as IIS, nginx, Apache, App Service, LightSail Analytical and creative approach to problem solving We encourage
and monitoring tools Triaging production issues Performance tuning of JVM apps Nice to have Not vital, but you'll have the edge if you also have experience with: Kotlin Prometheus Query Language (PromQL) Grafana Prometheus or have worked in: an eCommerce organisation a shipping/logistics/exports organisation What you bring Agile: Test-Driven Development, collaboration and continuous delivery
with our Product & Sales team to make sure they understand any risk that may occur during protocol deployment. Stack: Infrastructure: AWS/GCP + bare metal, Kubernetes, Terraform/Terragrunt, Prometheus/Thanos, Helm, HashiCorp Vault, FluxCD Software: Golang, TypeScript, PostgreSQL Smart-Contract: Solidity, Foundry, OpenZeppelin Requirements: 5+ years of background experience in Software or Infrastructure, within a high standard engineering … or Crypto. Proven experience as a Senior SRE with a very strong focus on Kubernetes. Proficiency with IaC (Terraform/Terragrunt) and infrastructure automation (Helm, GitOps). Familiar with Prometheus and PromQL. Familiar with infrastructure and data security (KMS, HashiCorp Vault). Ability to ship opinionated architectural choices and code, and to share software best practices. All our written and
as Azure, AWS or GCP Proficiency using Infrastructure as Code (IaC) tools such as Terraform (preferred), Ansible, or CloudFormation. Experience with monitoring, observability and logging tools such as DataDog, Prometheus, Grafana, or similar. Proven track record of maintaining highly available and performant production environments. Ability to identify and implement effective mitigation strategies and operational playbooks. Useful/Bonus Skills to
make a move? Get in touch and apply today! Responsibilities: Respond rapidly to critical AWS incidents, identify root causes, and deploy automated hotfixes. Lead the setup and integration of Prometheus-Grafana observability stack. Refactor and modernize deployment pipelines using GitHub Actions and Kubernetes. Maintain robust monitoring, alerting, and CI/CD systems. Skills/Must have: Strong hands-on experience … with AWS (e.g., EC2, EKS, CloudWatch, Lambda). Background in incident, change, and problem management; comfortable with on-call rotations. Expertise in Prometheus, Grafana, and Splunk; solid knowledge of PromQL. Proficient in scripting/programming (Python, Go, Bash, SQL). Salary: £500 per day
healing systems etc.) Database administration Infrastructure provisioning Process automation Respond to change requests Skills & Experience Oracle DB Docker (with Docker Swarm) Elastic Stack TypeScript/React/Node Go Prometheus/Grafana ESRI Maps Ansible Windows & Linux Jenkins Automation skills: Automation is a key skill domain for this role. Specific automation skills are: Continuous Integration - Skilled in the tooling and
technologies provided by GCP/AWS, such as S3, FSX, EKS, SQS, SNS, Kinesis, AmazonMQ, DynamoDB, GKE, CloudStorage, PubSub, Filestore. Knowledge of modern observability technologies such as ELK, Splunk, Prometheus, Grafana, Micrometer. "What-if" thinking while designing or reviewing solutions, to foresee or catch potential problems as early in the development process as possible. Nice to have: Good knowledge