and ClickHouse: schema design, indexing, and caching for sub-second reads. Experience deploying microservices in production using Docker and Kubernetes. Skilled in setting up observability and alerting pipelines (Prometheus, Grafana), including model drift detection. Experience with real-time ML inference and model serving frameworks (e.g., TorchServe, Triton, BentoML) for low-latency applications. Experience designing feedback loops, active learning, or user More ❯
variety of CI/CD tools and technologies (e.g., Git, GitLab, Jenkins, GCP, AWS) Knowledge of containerisation and microservice architecture Ability to develop dashboard UIs for publishing performance (e.g., Grafana, Apache Superset, etc.) Exposure to safety certification standards and processes We provide: Competitive salary, benchmarked against the market and reviewed annually Company share programme Hybrid and/or flexible work More ❯
InfluxDB, and ClickHouse: schema design, indexing, and caching for sub-second reads. Experience deploying microservices in production using Docker and Kubernetes. Skilled in setting up observability and alerting pipelines (Prometheus, Grafana), including model drift detection. Experience with real-time ML inference and model serving frameworks (e.g., TorchServe, Triton, BentoML) for low-latency applications. Experience designing feedback loops, active learning, or user More ❯
Support Kubernetes/OpenShift environments and application deployments Enable developers through onboarding and technical support Maintain and improve CI/CD pipelines (Tekton, Argo CD) Monitor systems using Prometheus, Grafana, Splunk, Loki, and EFK Automate infrastructure provisioning using scripting and IaC tools Collaborate with vendors and internal teams for issue resolution What You'll Bring Strong Linux (Red Hat) and More ❯
platforms, including Google Cloud Platform (GCP), AWS, and Azure Strong understanding of networking technologies, such as LAN, WAN, firewalls, and related infrastructure Proficient with observability and monitoring tools, e.g., Grafana, SolarWinds, Prometheus, AWS CloudWatch, Splunk Familiarity with DevOps practices, including CI/CD pipelines, is beneficial If you would be interested in having a further chat then please send your More ❯
London, South East, England, United Kingdom Hybrid / WFH Options
INTEC SELECT LIMITED
Conduct architecture reviews, technical audits, and drive adoption of best practices Partner with infrastructure teams to ensure system reliability and operational efficiency Integrate monitoring and logging solutions (e.g., Prometheus, Grafana, ELK) Define strategies for disaster recovery, scaling, and infrastructure resilience Improve observability by enhancing visibility into performance and error metrics Skills and Experience Required 10+ years of backend development experience More ❯
Desktop environments Proficiency in scripting with Bash, PowerShell, and Ansible; Python experience is a plus Familiarity with virtualisation platforms, containerisation, and orchestration tools Experience with monitoring stacks (e.g., Prometheus, Grafana, ELK/EFK) Ability to troubleshoot complex issues using a structured, methodical approach Excellent written and visual communication skills; able to produce clear documentation and diagrams Highly organised and capable More ❯
with at least 3+ years experience with Java. 5+ years in data engineering and data pipeline development in high-volume production environments. 5+ years experience with monitoring systems (Prometheus, Grafana, Zabbix, Datadog). Experience working in fintech, crypto, or trading industries; familiarity with FIX is a plus. Experience in object-oriented development with strong software engineering foundations. Experience with data More ❯
of technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics, logs, traces and APM. Leadership & Global Operations Proven success leading multi-regional or global technical teams with direct management of managers. More ❯
Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Experience leading/managing junior engineers Significant experience with Control Tower and deploying landing zones. For this More ❯
and scripting languages such as Python, Go, or Bash. Experience with Kubernetes security, including workload isolation, RBAC, and network policies, containerisation, orchestration, and Kubernetes observability tools (e.g., Falco, Prometheus, Grafana). Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Helm, ArgoCD). Eligibility to obtain UK Developed Vetting (DV) security clearance; British Citizenship is required for this More ❯
experience. Working knowledge of HPC container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform). Experience with monitoring tools such as Prometheus, Grafana, and DCGM. Understanding of GPU/accelerator toolchains like CUDA or ROCm. A proactive, customer-first mindset with strong communication skills. Ability to work effectively in both individual and team More ❯
various methods such as unit, integration, contract and E2E testing. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. Proactive in solving problems simply and effectively, with an eye for pragmatic solutions. More ❯
Technologies (Kubernetes, OpenShift) Messaging Technologies (Kafka, Solace, TIBCO) Database/Data Store/Data Query Technologies (SQL Server, Trino, Mongo, S3) Observability Technologies (OpenTelemetry, Elastic Stack/ELK, Grafana) This job description provides a high-level review of the types of work performed. Other job-related duties may be assigned as required. Job Family Group: Technology Job Family: Applications More ❯
various methods such as unit, integration, contract and E2E testing. You have a high degree of experience in observing the performance and health of applications via tools such as Grafana, Prometheus, Datadog, Sentry, etc. You have a strong desire and are an advocate for performant applications. Proactive in solving problems simply and effectively, with an eye for pragmatic solutions. More ❯
for someone with deep expertise in: o Infrastructure as Code: Terraform, CloudFormation o Security best practices: IAM, KMS, encryption in transit/at rest, DevSecOps o Monitoring & observability: Datadog, Prometheus, Grafana, ELK, or similar What You Bring o 6+ years in DevOps or platform engineering, with experience in a technical lead role. o Proven experience designing and operating cloud-native platforms More ❯
Hounslow, London, United Kingdom Hybrid / WFH Options
Sky UK
be skilled in C/C++, Python, and Linux. Ideally you'll also have experience with log management and analysis tools such as Elastic Stack (ELK), Splunk, and Grafana for data visualisation and monitoring. Proven expertise in at least one scripting language, such as Bash, Python, or Go. Ability to make good technical decisions and convince others about the More ❯
distributed systems, and the challenges of running high-performance API gateways. Familiarity with GraphQL Federation is a significant plus. Experience building or managing modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, ClickHouse). A self-starter attitude and a leader's mindset: you are comfortable with ambiguity, can identify and solve ill-defined problems, and don't need hand-holding. Excellent More ❯
pull - you know how to fix and finesse) Cloud deployment experience (DigitalOcean, AWS, GCP) Infrastructure-as-code (Terraform) DevOps & tooling (GitHub workflows and CI/CD pipelines) Monitoring tools: Grafana, Prometheus or Loki Front-end Angular experience ideal; React or Vue also welcome with willingness to learn Angular Good understanding of component-based UI architecture Bonus: UI design sensitivity or More ❯
primary language for our backend codebase AWS & GCP - we're cloud-native Kubernetes (EKS) Microservice based architecture RESTful APIs PostgreSQL, JDBI, Flyway TeamCity for CI/CD Terraform and Grafana The Team: The Core Banking group is seeking passionate engineers ready to tackle complex challenges and contribute to foundational systems, powering modern banking, that process millions of transactions daily, ensuring More ❯
weeks are ever the same. Essential Skills Solid Unix/Linux skills Experience with Bash, SQL, PHP Comfortable with Apache/Nginx, load balancers (HAProxy), and monitoring tools (Nagios, Grafana, Prometheus) Knowledge of log management (Graylog, Elasticsearch) Familiar with Ansible and GitLab CI/CD Experience using Git/SVN What Sets You Apart Passionate self-starter who loves problem More ❯
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
DCS Recruitment
weeks are ever the same. Essential Skills Solid Unix/Linux skills Experience with Bash, SQL, PHP Comfortable with Apache/Nginx, load balancers (HAProxy), and monitoring tools (Nagios, Grafana, Prometheus) Knowledge of log management (Graylog, Elasticsearch) Familiar with Ansible and GitLab CI/CD Experience using Git/SVN What Sets You Apart Passionate self-starter who loves problem More ❯
Chester, Cheshire, England, United Kingdom Hybrid / WFH Options
Robert Walters
APIs, CI/CD pipelines, and test-driven development using tools like Jest, Cypress, Playwright, or Pact. Proficiency with HTML5, CSS3, Redux, Docker, GitHub, and monitoring tools such as Grafana, Dynatrace or ELK. Experience managing and mentoring software engineers in Agile teams. A passion for engineering quality, scalability, and security. Bonus Points If You Have Experience building containerised applications More ❯
or all of the following: configuration management, orchestration, CI/CD, infrastructure monitoring and telemetry Experience using Agile (e.g. Kanban or Scrum) Familiarity with telemetry tools such as Splunk, Grafana Experience with Web frameworks (BENTO, React, Angular, Django) Bloomberg is an equal opportunity employer and we value diversity at our company. We do not discriminate on the basis of age More ❯
multi-account AWS setups. Extensive experience with AWS Organisations. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Working with Control Tower and Landing Zones Why Work For Us? Competitive base salary up to More ❯