You'll be responsible for ensuring world-class production environment reliability while implementing sophisticated monitoring solutions through their technology stack, including Splunk, Telegraf/Prometheus, Grafana, and PagerDuty. Role Impact: You'll drive excellence across production and non-production environments, optimizing trading data management, service delivery, and server operations. Your … operations experience in financial technology or similar industry Strong AWS cloud architecture expertise and advanced Linux systems administration Demonstrated success with monitoring solutions (Grafana, Prometheus) Experience optimizing build and release processes in trading environments Networking and troubleshooting capabilities Advanced Python and Bash scripting for automation Extensive experience with Docker and
Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience … GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Fruition Group
supporting scalable platform infrastructure with tools like Docker, Kubernetes, cloud platforms (AWS, Azure, or GCP background is welcome), Infrastructure as Code (Terraform, Pulumi, etc.), Prometheus, and Grafana. Key Skills & Responsibilities: Drive the technical vision and architectural direction within the team Design, implement, and maintain robust CI/CD pipelines … Lead on the use of Infrastructure as Code for environment provisioning and configuration Champion observability best practices using Prometheus and Grafana Collaborate across multiple internal teams and stakeholders Foster a culture of autonomy, innovation, and continuous improvement Lead by example with a hands-on approach and clear technical guidance Salary
At Cloud Bridge, we transform how businesses use AWS cloud services. We specialise in Consultancy, Managed Services, Cloud Governance, FinOps, and AI/ML to unlock AWS's full potential. Recognised as AWS's Rising Star Partner of the Year
production and non-production environments. You will work across real-time incidents and projects, including capacity planning, WAN, and system observability using tools like Prometheus and Grafana. Requirements: Strong experience administering Solace PubSub+ messaging across environments (on-prem and Cloud) Strong knowledge of production support Configure and optimise Solace across … WAN environments, networking and latency Strong knowledge of tools such as Grafana and Prometheus Understanding of DevOps tooling and CI/CD pipelines desirable Proficiency in troubleshooting message delivery, persistence, and topic routing, etc. Good Linux/Unix knowledge and scripting (Bash, Python) Excellent communication and interpersonal skills If
ideally GCP or AWS. Deployment of cloud resources using Infrastructure-as-code such as Terraform. Any experience with dashboards and alerting with Grafana using Prometheus metrics, Loki logging, and Tempo tracing to monitor and debug services would be an advantage. Strong communication skills and team player who can discuss complex … by working with other engineering, product, and support teams. Set up and monitor GCP resources using Infrastructure as code with Terraform. Working with Grafana, Prometheus and Loki to create dashboards and alerting. Be proactive in identifying and making improvements to our existing code base. Actively participate in resolving incidents that
London, South East England, United Kingdom Hybrid / WFH Options
eTeam
to optimize data retrieval, caching, and indexing for fast responses. Design fault-tolerant and resilient distributed systems using Kubernetes and cloud-native technologies. Utilize Prometheus, Grafana, and Kibana for monitoring and observability of backend systems. Optimize API performance and response times for a seamless user experience. Data Analytics & User Insights … driven architectures. Deep understanding of data processing, analytics, and real-time event streaming. Expertise in PostgreSQL, AWS and Kubernetes. Proficiency in monitoring tools like Prometheus, Grafana, and Kibana. Knowledge of security best practices, including OAuth, JWT, and data encryption. Fluent in English with strong communication and collaboration skills. Preferred Qualifications
environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience by enabling actionable monitoring and alerting. Drive cloud cost … enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire
Uxbridge, Middlesex, United Kingdom Hybrid / WFH Options
Avature
You'll join a new team creating a greenfield product that will fundamentally change how giffgaff and our members (that's what we call our lovely customers) interact with each other and in the telecommunications industry, a brand new area
London, South East England, United Kingdom Hybrid / WFH Options
Oliver Bernard
in London. As well as the day to day, you’ll have the opportunity to work on expanding monitoring with projects across Grafana and Prometheus, data centre and hardware projects and even networking and latency. This is the perfect opportunity for an experienced Solace engineer to take the next step … a 24/7 enterprise environment. Experience working with distributed systems over WAN. A good understanding of networking, latency, and failover strategy. Experience of Prometheus and Grafana for monitoring. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux and scripting (Bash, Python, etc.) Any knowledge of DevOps
London, South East England, United Kingdom Hybrid / WFH Options
Oliver Bernard
in London. As well as the day to day, you’ll have the opportunity to work on expanding monitoring with projects across Grafana and Prometheus, data centre and hardware projects and even networking and latency. This is the perfect opportunity for an experienced engineer to take the next step in … a 24/7 enterprise environment. Experience working with distributed systems over WAN. A good understanding of networking, latency, and failover strategy. Experience of Prometheus and Grafana for monitoring. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux and scripting (Bash, Python, etc.) Any knowledge of DevOps
remediation across infrastructure Troubleshoot and resolve complex support issues (L2/L3) Assist in writing operational procedures alongside the Solutions team Monitor systems using Prometheus, Grafana, Alertmanager Manage Active Directory, firewall/switch configuration, and hardware assets Core Tech Stack - You do not need to have experience in all … Windows & Linux admin, AD, GPO, DFS Cloud: AWS (EC2, RDS, S3, IAM, VPC, CloudWatch, etc.) Automation: Terraform, Vagrant, Ansible, Shell, Python, PowerShell Monitoring: Grafana, Prometheus, Node Exporter Security & Tools: Juniper Firewalls, Nexus scanning, Git, Jira, ServiceNow This is an exciting opportunity to be part of a fast-moving team delivering
and monitoring tools Triaging production issues Nice to have Not vital, but you'll have the edge if you also have experience with: Kotlin, Prometheus Query Language (PromQL), Grafana, Prometheus What you bring Agile: Test-Driven Development, collaboration and continuous delivery are your preferred engineering practices? We take the best
issues Performance tuning of JVM apps Nice to have Not vital, but you'll have the edge if you also have experience with: Kotlin, Prometheus Query Language (PromQL), Grafana, Prometheus or have worked in: an eCommerce organisation, a shipping/logistics/exports organisation What you bring Agile: Test-Driven
and collaborating with dev and support teams. Key Responsibilities: Administer Solace PubSub+ brokers across environments Provide production support and incident resolution Monitor systems using Prometheus and Grafana Required Skills: 3+ years with Solace in workplace environments Strong production support and troubleshooting skills Experience with WAN messaging, Prometheus, Grafana Scripting knowledge