East Sussex, England, United Kingdom Hybrid / WFH Options
Areti Group | B Corp™
pressure. Desirable Skills: Experience with cloud platforms (Microsoft Azure or AWS). Familiarity with configuration management tools (Ansible, Puppet, SCCM). Experience in monitoring and alerting systems (e.g., Zabbix, Prometheus, SolarWinds). Scripting experience (Bash, PowerShell, or Python). Knowledge of ITIL processes and service management best practices. Previous experience in utilities, energy, or critical infrastructure sectors.
NeuralMesh distributed AI storage for high-speed data access and resilience • Implementing CI/CD and MLOps pipelines using Argo Workflows, Jenkins and GitHub • Monitoring platform performance using Zabbix, Prometheus and Grafana • Integrating SAN and InfiniBand networking to achieve high throughput and reliability • Creating detailed documentation and performing knowledge transfer to operations teams • Providing ongoing platform support, patching, troubleshooting and
in AI, ML, Computer Science, or a related field. Understanding of Reinforcement Learning algorithms. Experience with cloud services (AWS, Azure, GCP). Familiarity with tools such as Kafka, Kubernetes, Prometheus, and Grafana. Interest or prior experience in financial markets. Why Join Us At Predictiva, you’ll have the chance to work at the intersection of AI research, financial innovation, and
Services, SQL Server (including T-SQL), Angular (with TypeScript), RabbitMQ/Kafka, various Azure features (App Services, VMs, Config, etc.), Git, Snowflake, NuGet (producing and consuming), Azure DevOps (CI), Prometheus & Grafana (monitoring & alerting), ELK Stack/Azure Log Analytics (logging). We are also in the middle of a transformational migration to Azure. This role sits in the IT Development team
ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root … cause analysis and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and … background in production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix
and platform engineering. Tech Stack: Cloud: AWS (EC2, RDS, S3, IAM, CloudWatch, Lambda); Infrastructure as Code: Terraform; Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm; Configuration Management: Ansible; Monitoring & Observability: Grafana, Prometheus; CI/CD: GitHub Actions; Automation & Scripting: Python, Bash, Go or Java. What We’re Looking For: Proven experience running AWS cloud infrastructure in a production or regulated (financial) environment. … Hands-on experience managing Kubernetes clusters (preferably EKS). Strong understanding of Infrastructure as Code using Terraform. Familiarity with monitoring and observability stacks such as Prometheus and Grafana. Experience building and maintaining CI/CD pipelines (GitHub Actions or similar). Strong scripting or automation skills using Python, Bash, Go or Java. A collaborative mindset, comfortable working alongside developers
Build and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, Datadog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. Collaborate closely with Salesforce … and Ansible, along with scripting in PowerShell, Python, or Bash. Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards. Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, Datadog, or Dynatrace. Preferred: experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments. Exposure to AI/ML platforms, real-time data
React on the frontend. Tech & Data Science stack: Kubernetes & Docker on Google Cloud; Python 3: Pandas, RabbitMQ, Celery, Flask, SciPy, NumPy, Dash, Plotly, Matplotlib; JavaScript, React, Redux; PostgreSQL, Redis; Prometheus, Alertmanager, Datadog. If you joined the company in a Data Science role, you would be working on sophisticated pricing algorithms which would enable companies in the entertainment industry to
City of London, London, United Kingdom Hybrid / WFH Options
ARC IT Recruitment Ltd
and contribution to a follow-the-sun model. Key Requirements: TechOps/Production Engineering/SRE experience supporting equities platforms. Tooling exposure: Kubernetes/containers, CI/CD, Terraform, Datadog/Prometheus/Splunk/Geneos. Practical understanding of market microstructure, exchange connectivity, and TCA/controls. Composed, commercially aware communicator with traders and senior leadership. Package & set-up: Competitive base +
PostgreSQL, sharded MySQL). You have software engineering experience. Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs. Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts. Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer. Self-starter with a maker mindset. We're
a big plus. Capable of writing clean, maintainable and well-tested code. Comfortable working in on-prem and cloud-native environments with an interest in observability, using tools like Prometheus and Grafana to keep services healthy and maintainable. Familiarity with AWS services and how to integrate them into modern applications. A keen focus on quality and security, combining testing and
City of London, London, United Kingdom Hybrid / WFH Options
M-XR
models (MongoDB, PostgreSQL). Implement asset storage, retrieval, and management systems (AWS S3). Build job queue management for async ML workflows (SNS, SQS). Set up application monitoring and logging (CloudWatch, Grafana, Prometheus). Implement CI/CD for application deployment (Bitbucket Pipelines). Create API documentation and developer tools. What we are looking for: 5+ years of backend development experience with production applications. Track record
and deployment of these services all the way to production in a controlled and secure way. Tech stack: the Java engineer needs experience with the Spring Boot framework, TDD, Grafana and Prometheus for monitoring and alerting, and an understanding of the CI/CD process. All candidates must pass a BPSS check. Immediate start. End March 2026. Weekly travel to Leeds/Newcastle/Manchester. £400 - £500 per
their aggressive growth plans, they are looking for a pragmatic and commercially oriented SRE to design, implement and maintain scalable and reliable systems. Tech Stack: Python/C++, Terraform, Prometheus, Kubernetes, Cloud Computing The core function of the role is to monitor and maintain uptime for trading systems, pricing engines and risk management tools. The client can offer market leading
rally teams around a plan. A strong preference for user experience and comfort with technical details. Technical experience with containerised platforms using Kubernetes, databases, and observability tools such as Prometheus and OpenTelemetry. This is a chance to shape the future of observability and security, build products people count on, and do it all with curiosity and creativity.
Manchester, England, United Kingdom Hybrid / WFH Options
Lorien
modern technologies, with clear progression routes available. Key Requirements: Strong troubleshooting and fault-resolution experience across infrastructure and applications. Hands-on experience with monitoring tools such as Instana, Splunk, Prometheus, Grafana, or SolarWinds. Confident supporting both Windows and Linux operating systems. Experience working in ITIL-aligned support environments. Understanding of web hosting technologies (DNS, HTTP/S, SSL Certs, and