years of experience with containerization and orchestration (Docker + Kubernetes) and confidence operating cloud infrastructures Front-end development experience a plus DevOps skills, especially leveraging open source tools (Kibana, Prometheus, Grafana) a plus Sound understanding of agile software development best practices including CI/CD, testing, monitoring, alerting and documentation Being cloud-agnostic means not being able to use any … managed Kubernetes service, and therefore building your own Kubernetes - experience with only managed Kubernetes would not be applicable for the role Kubernetes experience on at least one cloud Prometheus stack (Grafana, Prometheus, Alertmanager) Kubernetes upgrade and maintenance experience Any logging infrastructure experience Terraform Ansible Shell/Python scripting GitLab pipelines (or any other CI/CD) Desirable experience: Kubernetes security Kubernetes More ❯
Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will monitor and optimise system performance with tools such as Grafana, Prometheus, New Relic, and Splunk. Your role will involve identifying and resolving reliability issues, automating processes, and ensuring the seamless operation of the platform. If you have a passion for technology More ❯
Excellent understanding of incident/ticket lifecycle, SLA management, and escalation protocols. Demonstrated ability to lead, develop, and retain engineering talent. Experience with monitoring platforms (e.g. Nagios, Zabbix, SolarWinds, Prometheus) and ticketing tools. Excellent communication, time management, and decision-making skills. Desirable Background in the fintech or high-frequency trading sector. ITIL or other service management certifications. Experience with ISO27001 More ❯
using tools such as Terraform or CloudFormation Monitor and troubleshoot production systems to identify and resolve issues proactively Develop and maintain monitoring and logging systems using tools such as Prometheus and Grafana Implement and maintain security and compliance policies across all systems and environments Research and evaluate new tools and technologies to improve DevOps processes Collaborate with development teams to … Docker Experience with IaC tools such as Terraform and CloudFormation Knowledge of AWS cloud platform (e.g., EKS) Strong Linux systems administration skills Familiarity with monitoring and logging tools like Prometheus and Grafana Experience with scripting languages such as Python, Ruby, or Bash Excellent communication and collaboration skills Working with us is about: Joining a motivated and professional team Working in More ❯
make a move? Get in touch and apply today! Responsibilities: Respond rapidly to critical AWS incidents, identify root causes, and deploy automated hotfixes. Lead the setup and integration of the Prometheus-Grafana observability stack. Refactor and modernize deployment pipelines using GitHub Actions and Kubernetes. Maintain robust monitoring, alerting, and CI/CD systems. Skills/Must have: Strong hands-on experience … with AWS (e.g. EC2, EKS, CloudWatch, Lambda). Background in incident, change, and problem management; comfortable with on-call rotations. Expertise in Prometheus, Grafana, and Splunk; solid knowledge of PromQL. Proficient in scripting/programming (Python, Go, Bash, SQL). Salary: £500 per day More ❯
highly available systems within a technologically diverse stack used for global research and trading of FICCO and Cryptoassets. Leveraging technologies such as Terraform, Docker, Kubernetes, CI/CD, Python, Prometheus and Grafana, you will develop repeatable and supportable infrastructure to meet the demanding needs of our business. What you'll do in this role: Collaborate closely with the US Platform … Skills, Experience & Abilities: Proven experience in supporting mission critical, high performance trading infrastructure across various technology stacks. Experience deploying and supporting applications in Kubernetes Previous infrastructure monitoring experience using Prometheus and Grafana Previous experience maintaining and optimizing cloud infrastructure in AWS environments Experience performing database and database infrastructure support for highly available systems Working knowledge of TLS Demonstrated knowledge of More ❯
operating infrastructure on AWS and other providers Operating MongoDB (or other document database) clusters Operating Redis (or other key-value storage) clusters Administering Linux servers Maintaining distributed software Operating Prometheus and Grafana Operating logging collection and analysis systems Participating in the on-call rotation (04:00 - 16:00 UTC) Skills: Kubernetes & containers (advanced) AWS/EKS (advanced) Linux (advanced) Terraform … and IaC in general (proficient) Helm (proficient) Go and/or Python (familiar) MongoDB (or similar) Redis (or similar) Monitoring - Prometheus, Grafana, Thanos (familiar) Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.) Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP) Proactive, energetic, innovative and change oriented Nice to have: GCP or Azure Bare metal infrastructure More ❯
but is not limited to: Architecting, building, and operating the core cloud-native infrastructure for WunderGraph Cosmo, primarily using Go and Kubernetes. Owning and evolving our observability stack (OpenTelemetry, Prometheus, ClickHouse) and the infrastructure supporting our AI-driven features to ensure deep, actionable insights into our systems. Building and optimizing CI/CD pipelines to improve build times, automate quality … architecture, distributed systems, and the challenges of running high-performance API gateways. Familiarity with GraphQL Federation is a significant plus. Experience building or managing modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana, ClickHouse). A self-starter attitude and a leader's mindset: you are comfortable with ambiguity, can identify and solve ill-defined problems, and don't need hand-holding. More ❯
new functionality Maintaining and evolving our cloud infrastructure (GCP, Kubernetes) to ensure high availability, security, and performance Managing service observability and reliability, including logging, metrics and alerting (we use Prometheus and Grafana) Handling database and service upgrades (e.g. MySQL, Kubernetes), secrets management and security best practices Taking ownership of platform-level concerns such as deployment pipelines, configuration management, and cost … best practices across infrastructure and applications, including secrets management and credential rotation. Familiarity with infrastructure-as-code or automation tools is a plus Experience with observability tools (such as Prometheus and Grafana), service monitoring, and debugging in production environments A demonstrated interest in staying up-to-date with new technology, new frameworks, new languages and other developments like AI. A More ❯
orchestration. Support multi-tenancy and environment rationalization to reduce duplication and inefficiency. Define and implement observability standards, including logging, metrics, tracing, and alerting. Use tools like New Relic, Prometheus, and Grafana, alongside building custom instrumentation for key platform services. Drive incident readiness and operational resilience by enabling actionable monitoring and alerting. Drive cloud cost visibility and optimization efforts across … and operating developer platforms and enablement frameworks. Experience with cloud-native technologies, Kubernetes, and Infrastructure as Code (Terraform, Helm, etc.). Strong understanding of observability tooling (especially New Relic, Prometheus, Grafana) and incident response best practices. Familiarity with FinOps, platform cost tracking, and infrastructure efficiency techniques. Excellent communication, leadership, and stakeholder management skills. Attract, hire, and develop talented platform engineers More ❯
This is an office-based role; you must be able to commute to and work in the City of London as a norm. About Us Archax is an FCA-regulated exchange, broker and custodian for digital assets, targeted at professional More ❯
Position Summary We are looking for an experienced Systems Engineer with strong Linux and Kubernetes experience to join our Group Engineering - Systems team. You will help design, build and operate modern infrastructure platforms that support continually evolving applications and services. More ❯
LSEG (London Stock Exchange Group) is more than a diversified global financial markets infrastructure and data business. We are dedicated, open-access partners with a dedication to excellence in delivering the services our customers expect from us. With extensive experience More ❯
About us We are Orbital, an AI company on a mission to automate the legal segment of every property transaction in the world. We iterate rapidly to build products that utilise bleeding-edge AI models. Products that are powered More ❯
Design and implement core components of the trading engine (execution models, risk controls, etc.). Optimize system latency and performance (Java-based backend). Extend monitoring and analytics tools (Prometheus, Grafana, Python). Guide and mentor a diverse team of developers (junior to senior levels). Collaborate with stakeholders across product, trading, and executive functions. Support production systems and manage … and performance tuning Excellent communication in English A balance of technical depth and emotional intelligence Tech Stack Languages: Java, Python Infrastructure & CI/CD: GitLab, CI pipelines Monitoring & Analytics: Prometheus, Grafana Database: Microsoft SQL Server What the company offers Competitive salary A collaborative, flat-structure environment with minimal bureaucracy Creative freedom and real ownership over the product Exposure to high More ❯
City of London, London, United Kingdom Hybrid / WFH Options
ECS
alerts, and service flow maps tailored to their needs. Write and optimize advanced DQL queries to deliver actionable insights from telemetry data. Support the transition from tools such as Prometheus, Grafana, and CloudWatch to Dynatrace. Recommend RBAC structures and data access models that align with organizational and security requirements. Assist in shaping observability strategies for Kubernetes workloads in hybrid (cloud … Language (DQL) queries. Strong understanding of hybrid infrastructure (cloud + on-prem) and the modern application stack. Solid understanding of Kubernetes, with experience deploying Dynatrace in Kubernetes environments. Exposure to Prometheus, Grafana, and AWS CloudWatch monitoring tools. Further information available upon application. Please note, due to internal capabilities it will be difficult for us to take internal calls regarding your application More ❯
code-fixes. Job Duties • Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. • Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios, and solutions tailored to our systems and … Service or recent relevant qualification. • Previous experience and/or understanding of Windows & Linux OS. • Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. • Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language. • Previous experience and/or understanding of cloud-based infrastructure (ideally AWS). • Operated More ❯
Senior Analytics Engineer Remote type: Hybrid Locations: North London, UK Time type: Full time Posted: 18 days ago Kick-start your career in More ❯