Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to business and IT executives
City of London, London, United Kingdom Hybrid / WFH Options
UST
Network (LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve as a trusted advisor to business and IT executives
and improving operational reliability. 8. Implement monitoring, alerting, and troubleshooting for data workflows: Set up real-time monitoring, logging, and alerting for ETL and AI components using tools like Prometheus; proactively diagnose issues and ensure system health. 9. Ensure data security, privacy, and compliance throughout the ETL process: Apply best practices for secure data handling, including encryption, access control, and
This is an office-based role; you must be able to commute to and work in the City of London as a norm. About Us: Archax is an FCA-regulated exchange, broker and custodian for digital assets, targeted at professional
for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands-on Linux (RHEL … managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).
hands-on support to ensure system reliability and performance. London hybrid working - Contract Opportunity - London Hybrid. Must-haves: Python scripting (they could take someone with Go); automation experience; Prometheus/Grafana/PromQL; CI/CD; AWS; Splunk. Key Responsibilities: Develop and maintain automation scripts, primarily in Python (Go experience also considered). Respond to and resolve incidents … perform problem analysis to maintain system uptime and reliability. Collaborate with internal teams and customers to troubleshoot and resolve infrastructure and application issues. Operate and enhance observability tooling, including Prometheus, Grafana, and Splunk, with a strong focus on PromQL. Participate in an on-call rotation to support critical production systems. Improve and maintain CI/CD pipelines and deployment processes. Work … Strong scripting skills in Python (Go, Bash, or SQL also beneficial). Proven experience with automation and infrastructure-as-code practices. Deep understanding of monitoring and observability, particularly with Prometheus, Grafana, and PromQL. Experience with CI/CD tools and modern deployment strategies. Solid hands-on experience with AWS services in a production environment. Proficiency with Splunk for log analysis
LLM Deployment & Fine-Tuning: Drive the deployment and fine-tuning of large language models (LLMs) while ensuring efficient training pipelines and model hosting. Monitoring & Performance Optimization: Implement monitoring (using Prometheus/Grafana and similar tools) and logging solutions to ensure system reliability and to optimise model throughput. Collaborate Across Teams: Work closely with Machine Learning engineers to enable their delivery … Operations: Hands-on experience with training pipelines, model hosting, and throughput optimisation. Expertise in deploying and fine-tuning large language models. Monitoring & Performance: Proficiency with monitoring tools such as Prometheus and Grafana. Programming & Automation: Strong proficiency in Python, with experience in developing production applications. Data Engineering & Streaming: Familiarity with data streaming tools and Elastic to ensure high performance in data
orchestration. Support multi-tenancy and environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience by enabling actionable monitoring and alerting. Drive cloud cost visibility and optimization efforts across … and operating developer platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire, and develop talented platform engineers
Kubernetes (both on-premises and cloud-native); strong Linux (preferably RHEL) systems administration skills; proven experience with HPC workloads or scientific computing clusters; hands-on experience with observability tools: Prometheus, Grafana, Loki; Infrastructure as Code (IaC) using Terraform, Ansible; CI/CD experience with GitOps tools (e.g., ArgoCD, Flux); prior experience leading engineering teams in distributed environments
frameworks; comfortable working in a hybrid team environment (3 days a week onsite in London); experience with Terraform, Kubernetes, or CI/CD pipelines; familiarity with observability tooling (e.g. Prometheus, Grafana, Datadog); experience mentoring or leading other engineers
Ansible; strong debugging, testing, and performance tuning skills. Nice to Have: experience with event-driven architecture and message queues (e.g., Pub/Sub, Kafka); familiarity with observability tools (e.g., Prometheus, Grafana, Stackdriver); understanding of security best practices in microservices and API development; experience working in Agile/Scrum environments
delivery. The technical landscape: Azure (AKS, Functions, App Services, Event Grid, etc.); Infrastructure as Code (Terraform); CI/CD using Azure DevOps; Monitoring and Observability (Application Insights, Azure Monitor, Prometheus/Grafana); GitHub for version control, and a modern SDLC with automated testing and security baked in. What we’re looking for: Someone who can lead the design and delivery
Ability to write technical documentation. Desirable Experience: Experience with ArgoCD, Istio/Service Mesh, Tekton and Helm Charts will be an added advantage; OpenShift monitoring and writing custom alerting using Prometheus Alertmanager; CheckMK to monitor physical infrastructure; experience with Red Hat Quay Container Registry; experience with Red Hat CEPH Storage; experience with Red Hat OpenStack; experience with maintaining Dell PowerEdge Servers
nodes and IPoE 13. Proven ability to work independently and collaboratively in a fast-paced technical environment. 14. Demonstrable knowledge of the telecommunications industry and technologies. 15. Experience of working with Prometheus and Grafana
with Kubernetes, Docker, Helm; proficient in Terraform, CI/CD Pipelines (Drone/GitLab); excellent understanding of Kafka internals, stream processing, and secure Kafka deployments; strong experience across monitoring (Prometheus, Grafana, CloudWatch); knowledge of security hardening, IAM, WAF, Shield, Vault; working knowledge of Agile, Infrastructure-as-Code, and DevSecOps practices. UK*C or Enhanced DV (eDV) Clearance is a must