Infrastructure Observability Engineer - Leading Trading Company
Location: London, UK
Contract Type: Permanent
Salary: Competitive + Benefits
About Our Client
Our client is a well-established trading company with a strong presence in the global commodities market. They are committed to leveraging cutting-edge technology solutions to drive operational excellence and maintain their competitive edge in the fast-paced trading environment. …
The Role
We are seeking an experienced Infrastructure Observability Engineer to lead the design, implementation, and continuous improvement of our client's enterprise observability platform. This role focuses on delivering comprehensive monitoring, event correlation, and impact analysis, demonstrating AIOps capabilities and tools such as BMC Helix Operations Manager. The ideal candidate will be passionate about improving access to infrastructure performance … automating operational intelligence, and reducing mean time to resolution (MTTR) through intelligent alerting and root cause analysis.
Key Responsibilities
• Own and evolve the enterprise observability strategy across all infrastructure tracks
• Design, implement, and support event management and impact analysis workflows using platforms such as BMC Helix Operations Manager
• Integrate and correlate data from multiple sources (e.g., 20+ monitoring systems) into
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Opus Recruitment Solutions Ltd
based environments Troubleshooting complex issues across infrastructure, code, and security layers Collaborating with engineers, architects, and operational teams Maintaining systems in live environments and responding to incidents Ensuring security, observability, and scalability are embedded in solutions Key Skills & Experience Required Automation & Tooling CI/CD tools and pipeline development (e.g. Jenkins, GitLab, Bamboo) Infrastructure as Code (Terraform, Ansible, etc.) Orchestration … deployment Experience managing databases (SQL, NoSQL) Exposure to legacy infrastructure and modernisation approaches Performance tuning and workload scaling Ways of Working Agile and Scrum delivery methodologies SRE principles and observability practices Experience in mission-critical or highly regulated environments is advantageous Excellent problem-solving, communication, and collaboration skills What’s on Offer Highly flexible hybrid working model 25 days annual
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Strive Gaming
in between - ensuring our platform is resilient, efficient, secure and developer-friendly. Key Responsibilities: Design, build, and maintain platform services and infrastructure used by product engineering teams. Improve reliability, observability, and scalability of existing systems. Develop and maintain CI/CD pipelines to support software delivery. Build tooling and automation that supports self-service infrastructure and deployment. Ensure security best
technical leader responsible for the reliability, scalability, and security of the entire GEEIQ platform. You'll tackle our biggest infrastructure challenges, from scaling our Kubernetes clusters to maturing our observability stack and refining our deployment pipelines. We are looking for an experienced and pragmatic engineer who is passionate about building robust, automated, and secure systems. You will work alongside our … CD pipelines in GitHub Actions to make them faster, more reliable, and more secure. Champion developer productivity by building tools, automating workflows, and reducing friction in the development lifecycle. Observability & Reliability (SRE) Lead the charge on improving our observability strategy. Design and implement a robust monitoring, logging, and alerting framework using tools like Grafana, Prometheus, and native AWS services. Enhance … and security. Demonstrated ability to design, build, and significantly improve CI/CD pipelines, with specific experience in GitHub Actions. A strong track record of building out and improving observability stacks (monitoring, logging, tracing). Experience implementing security controls and working within compliance frameworks (experience with SOC2 is a major plus). Proven ability to mentor and collaborate with other
architectures across Azure, AWS, and Google Cloud Leading platform engineering squads using DevSecOps, Kubernetes, and automation tooling Enabling edge and private cloud capabilities (e.g., Azure Stack, AWS Outposts) Implementing observability and governance tooling to support modern operations Supporting Agile and product-based delivery using SRE, CI/CD, and Infrastructure as Code Advising clients on architecture optimisation, security, cost control
new infrastructure and services in line with internal security, operational, and performance standards Automate recurring tasks and develop tooling that improves visibility and consistency across environments Manage monitoring and observability tooling to ensure proactive incident response Participate in an on-call rota to support incident handling and resolution Produce high-quality documentation and technical diagrams Requirements: Strong experience administering Linux
automation, scalability, and high reliability. A strong working knowledge of Microsoft Azure is essential. The role involves daily coding and technical leadership across orchestration, CI/CD pipelines, cloud services, observability, and security, working alongside site reliability, onboarding, architecture, and delivery functions. You're expected to scale impact through others by upskilling team members, hiring where needed, and championing platform engineering
through coaching, recruitment, and career development aligned with DDaT frameworks. Excellent development skills, with a depth of experience including C#, Java (Spring Boot, JPA/Hibernate), REST APIs, observability and monitoring, queue technologies and security. Detailed knowledge of best practices such as SOLID principles. Experience of building new and evolving microservices with emphasis on high availability and data integrity.
workflows. Implement robust monitoring, alerting, and incident response processes to maintain high levels of system reliability and uptime. Continuously assess and integrate new tools and technologies to enhance automation, observability, and scalability. Drive platform automation across provisioning, deployments, security controls, and operational workflows Proven experience in a DevOps or platform engineering role, ideally within a fast-paced or regulated environment.
Fi authentication systems, CRMs and partnered PropTech tools Continually hone and perfect our homegrown DevOps and CI/CD processes by further developing GitHub Actions pipelines, Terraform definitions and observability integrations. Ensure quality & reliability: establish testing best practices (unit, integration, end-to-end), conduct code reviews and demand high quality standards Shape and refine our cloud-native platform to optimise
such as Docker, ECS, or Kubernetes Solid programming skills in one or more languages (e.g., Java, Python, TypeScript) Experience in designing and implementing CI/CD pipelines Familiar with observability tools, logging frameworks, and performance monitoring Background in serverless technologies (e.g., Lambda, Step Functions, API Gateway) Experience with data tools like EMR, Glue, or Apache Spark Understanding of event-driven
Caldecotte, Milton Keynes, Buckinghamshire, England, United Kingdom
Connells Group HQ
day-to-day and strategic decision making. You will be a hands-on and customer-focused engineering servant-leader. You will be comfortable moving across orchestration, automation, pipelines, cloud services, observability and security domains (even if you are not an expert in them all). A non-negotiable is experience and familiarity with Microsoft Azure. You will play your part in operating
technical considerations related to the rapid developments in tech Ensure high-quality code and best practices. Write clean, maintainable and efficient code and ensure code quality through TDD and observability practices Develop RESTful APIs using FastAPI and Pydantic Work with SQL and NoSQL databases, as well as ORM tools like SQLAlchemy and SQLModel Participate in Agile XP methodologies like pair
/IP, VLANs, routing). You will bring some of these skills, but more importantly you're interested in learning these things: • Hardware & physical infrastructure. • Data-driven monitoring and observability (Grafana, InfluxDB, Prometheus, Elastic). • Exposure to configuration management (Puppet, Ansible, Terraform). • Some exposure to scripting (Bash, Python). • Supporting CI/CD delivery pipelines (GitLab, GitHub).
well-funded start-up that is disrupting the global credit insurance industry, growing and developing with the company. Responsibilities: Maintain and improve existing infrastructure built on AWS ECS with observability tools in place Improve automation for deployments and infrastructure management Collaborate with development teams to streamline the CI/CD pipeline Maintain and enhance monitoring and alerting systems Monitor system
with Engineering Managers and Product Management to support the goals and objectives on your team. You will have a focus on end-to-end responsibility for the development, quality, observability, and testing of the software you build. Everyone is welcome. We have a culture of creativity. We approach our work passionately, improve constantly and celebrate our wins at every turn.
using modern, agile development practices like code review, TDD, CI/CD and pairing using tools like Git and GitHub. Experience of operationally managing software components once live, including: observability, logging, metrics, error reporting, debugging and live incident management. Experience of working with sensitive personal data. Competitive salary starting from £85,000 Generous Pension Scheme - We invest in your future
influencing at all levels. A mindset focused on long-term sustainability and strategic technical thinking. Bonus Points For Fintech or regulated environment experience, particularly investment platforms. Familiarity with modern observability stacks and incident response processes. Experience with security-first architecture and data protection best practices. Why Join? Well-Backed & Ambitious: Backed by a globally recognised financial group with significant investment
solutions meet business needs. Experience with data ingestion tools, like Fivetran. Advantageous Exposure to deploying applications with Kubernetes. Experience with Data Orchestrator tools (Airflow, Prefect, etc.) Experience with Data Observability tools (Monte Carlo, Great Expectations, etc.) Experience with Data Catalog tools (Amundsen, OpenMetadata, etc.) Interview Process Call with the talent team Take home task Tech interview CPTO interview Life at Lendable
Exposure to site reliability engineering: root cause analysis, in-production troubleshooting, on-call rotations. • Exposure to infrastructure management: CI/CD, containerization, orchestration, infra-as-code, monitoring, logging, alerting, observability. • Technical product mindset (e.g. understanding how to debug poor adoption). • Excellent problem-solving and communication skills (ability to contextualize, gauge risks and get buy-in for high stakes
with cross-functional stakeholders including the Data Platform team and Engineering teams. Design and maintain reliable, scalable cloud infrastructure (primarily AWS). Drive key initiatives involving container orchestration (Kubernetes), observability, security, and CI/CD. Establish best practices in platform engineering and foster a servant-leadership culture focused on empathy, empowerment, and collaboration. Work with your peers and colleagues at