Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Gamma Communications plc
position will align to a discipline where you will be expected to build and support solutions aligned with SDLC principles, providing technical excellence with a focus on scripting and observability coupled with a security mindset. What will you be doing day-to-day? Automation and Orchestration: Streamline the delivery and support processes by leveraging automation and IaC principles. Support and More ❯
a bias for Infrastructure (Python, Go, C#) • IAM Policy and Authentication/Authorization schemes • Web Services and REST API • Databases and Storage Systems • Development Build, Test, and Deployment Pipelines • Observability and Monitoring (OpenTelemetry, TIG and ELK stacks) Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and More ❯
focus on goals and the ability to balance multiple priorities in a fast-paced environment. DESIRED SKILLS AND EXPERIENCE: Real-time and low latency market data experience Service orchestration, observability and monitoring platform experience Solid understanding of a Programming Language (preferably Python) Agile tools (Jira, GIT among other DevOps principles) LSEG is a leading global financial markets infrastructure and data More ❯
patching, upgrades, security baselines, and hardware refresh planning. Maintain solutions and performance-tuned enterprise systems. DevOps, SRE & Automation Embrace an SRE mindset: treat infrastructure as code, prioritize availability and observability, and automate toil. Automate provisioning, compliance checks, and config enforcement Use GitHub for source control, peer-reviewed automation pipelines, change tracking, and documentation versioning. Contribute to CI/CD workflows More ❯
a developer-first environment through self-service tools, and lead initiatives that ensure platform reliability and performance. You'll play a critical part in empowering engineering teams with automation, observability, and innovation. If you're passionate about technical leadership, growth, and building systems that support global scale, this is a fantastic opportunity to shape the projects, and optimize cloud spend More ❯
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Hargreaves Lansdown
Excited to grow your career? Our purpose is to empower people to save and invest with confidence. We are looking for great people to join us, so please come and invest in YOUR future at HL. We know that sometimes More ❯
Complexio is Foundational AI works to automate business activities by ingesting whole company data, both structured and unstructured, and making sense of it. Using proprietary models and algorithms Complexio forms a deep understanding of how humans are interacting and using it. Automation can then replicate and improve these More ❯
thousands of restaurant, grocery and convenience partners across the globe. About this role We are seeking a seasoned Principal Engineer to lead the design, development, and evolution of our Observability Platform , ensuring it meets the needs of our rapidly scaling systems and engineering teams. This role will also focus on leveraging Machine Learning (ML) and Artificial Intelligence (AI) to deliver … system health and drive down Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR) . The ideal candidate will be a visionary technologist with deep expertise in observability, monitoring, and distributed systems, capable of driving strategy, architecture, and execution for a world-class platform. These are some of the key ingredients to the role: Platform Leadership Architect, design … and implement a cutting-edge Observability Platform to support metrics, logs, traces, and events at scale. Integrate ML/AI-driven solutions to enhance anomaly detection, root cause analysis, and predictive insights. Lead the development and adoption of platform capabilities to ensure system health, reliability, and performance. Establish and evolve platform standards and best practices to align with the company More ❯
Chantilly, Virginia, United States Hybrid / WFH Options
Edgesource
opportunity to do meaningful, interesting, and impactful work. Position Overview: Edgesource is actively hiring for Software/DevOps Engineers who will be at the forefront of ensuring the reliability, observability, and operational excellence of our software systems. You will be on a team of other DevOps and Software Engineers whose goal is to build, manage, and maintain large-scale … Development: Design and Implementation of large scale software services. Preferred is the ability to do this in Python. Software Testing: Ability to write Unit/Integration/Load Testing. Observability and Monitoring: Oversee the design and implementation of observability solutions using technologies like Grafana and Prometheus. Ensure comprehensive monitoring, logging, and alerting for all services. Reliability and Performance: Ensure high … practices and conduct regular audits and assessments. Strategic Planning: Participate in strategic planning to define technical direction and achieve business objectives. Drive initiatives to enhance the reliability, scalability, and observability of our systems. Required Qualifications: Possess an active TS/SCI security clearance 5+ years of experience in software development or site reliability engineering 2+ years of experience as the More ❯
SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for automation and observability, ensuring the quality and availability of our services. Location - We are flexible on remote working from home, if you are based in the UK or Germany. This is a fully … our 24x7 on-call rotation, SCRUM, and deployment planning Perform Root Cause Analysis (RCA) and provide recommendations for application teams Improve availability and reduce customer impact using industry-best observability tools Ensure best-practice and security-minded architecture by influencing design decisions Create and maintain technical documentation and SOPs Develop software, scripts, or tooling to improve efficiency and reduce … time of applications and infrastructure Other duties as needed About You 5+ years' experience in Site Reliability Engineer roles Expert+ level Linux administration, scripting, and troubleshooting Demonstrable knowledge of Observability tools (Prometheus/Grafana, New Relic, Splunk, DataDog) Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc) Extensive More ❯
service mesh solutions across our distributed systems. In this role, you will lead the design and operation of Kong Mesh (based on Kuma) for managing microservices communication, security, and observability at scale. You’ll play a crucial role in defining service-to-service architecture and ensuring platform reliability, scalability, and security. Key Responsibilities: • Lead the design and deployment of Kong … Mesh across our environments (on-prem and cloud). • Define and enforce best practices for service mesh architecture, traffic routing, zero-trust security, observability, and policy enforcement. • Collaborate with infrastructure, security, and development teams to integrate Kong Mesh with CI/CD, monitoring, and logging solutions. • Develop custom policies, plugins, and automation scripts to enhance Kong Mesh capabilities. • Monitor mesh More ❯
a key member of the Dynatrace sales engine and will be responsible for providing excellent technical support to the sales team. You will be the expert on Dynatrace and observability, with a specialization in Log Management and Analytics. Within this exciting role, you will be responsible for executing great demos which demonstrate the Dynatrace unique approach in solving the customer … be filled at a higher level based on candidate experience. What will help you succeed Preferred Requirements: Experience with query languages such as SQL, SPL, or KQL. Experience with observability and log collectors/pipelines such as FluentBit, OpenTelemetry, Cribl, and Logstash. Experience with web technologies such as HTML, CSS, and JavaScript. Experience with programming/scripting side technologies such … OpenShift, Serverless functions, and CI/CD pipelines. Experience with automation like Ansible, Puppet, Terraform, etc. Why you will love being a Dynatracer Dynatrace is a leader in unified observability and security. We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance. Our employees work with the largest cloud providers, including AWS, Microsoft, and More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability, demonstrating performance and capacity improvements and proactively identifying and mitigating reliability risks. Key Responsibilities of the Site Reliability Engineer: Collaborate with Software Engineers to improve reliability and performance in their … subsystems Partner with System Administrators in automating toil and eliminating alerts Evolve observability and monitoring capabilities to identify and solve problems before they impact the business Support development environments to help us achieve our delivery and quality goals Research and evaluate technologies, tools and services to influence buy-vs-build decisions Develop expertise in diverse technical and business domains Expand … in one of our platform languages (Java, Go, Python or similar) Knowledge of cross domain principles & technologies Experience of working in a service management environment Practical applications of using observability patterns in previous systems Creating and monitoring system availability metrics and using those to drive work that reduces downtime There are many great reasons to join our team! Pension Plan More ❯
functions, championing a culture of proactive readiness, efficient release pipelines, robust incident response, and continuous infrastructure improvement. This role ensures maximum uptime, enables safe and frequent deployments, establishes comprehensive observability, and drives effective postmortem practices. They will work closely with Engineering, QA, and Security leadership to embed operational excellence across the software development lifecycle and support the platform’s growth … distributed team of DevOps engineers, SREs, and incident responders; Foster a culture of ownership, continuous improvement, and operational excellence; Define and execute the long-term strategy for system reliability, observability, performance, and incident management; Champion the adoption of modern tooling, technologies, and best practices to enhance resilience and agility; Own and continuously evolve incident response processes, including SLOs, SLAs, and More ❯
strategy across FIC Technology, aligning reliability goals with business priorities and regulatory expectations Lead the transformation of production support into a proactive, data-driven engineering discipline focused on automation, observability, and continuous improvement Stay close to the technology—reviewing architecture, contributing to tooling, and leading by example in incident response and root cause analysis Act as a trusted advisor to … proficiency in Linux/Unix systems, SQL, and programming languages such as C++, Java or Python. Strong understanding of distributed systems and low-latency architectures Hands-on experience with observability stacks (e.g., Prometheus, Grafana, Splunk, Geneos, OpenTelemetry) and infrastructure automation (e.g., Ansible, Terraform, CI/CD pipelines) Strong understanding of the trade lifecycle, market data, and fixed income products, FX More ❯
Arlington, Virginia, United States Hybrid / WFH Options
Boeing
closely with software engineers, cloud and infrastructure operations, cybersecurity teams, and various tool vendors to build and maintain scalable, secure, and efficient runtime environments, CI/CD pipelines, and observability tooling that empower Boeing engineering teams throughout all phases of the certified Software Development Lifecycle (SDLC). Our teams are currently hiring for a broad range of experience levels including … critical software development. Automates deployment processes, monitors, troubleshoots, and optimizes the performance of infrastructure, ensuring seamless integration of all tools into the CI/CD pipelines. Implements monitoring and observability solutions to ensure the reliability and performance of environments. Collaborates with cybersecurity and infrastructure teams to ensure adherence to security best practices and compliance requirements. Analyzes, plans, and executes mitigation More ❯
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
Birmingham, West Midlands, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
Manchester, North West, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
AWS and Azure. Build and optimize CI/CD pipelines using Azure DevOps, GitHub Actions, or Jenkins. Automate everything with Terraform, Bicep, and scripting (PowerShell, Bash, Python). Drive observability with tools like Datadog, LogicMonitor, CloudWatch, and Grafana. Champion cloud security, IAM, RBAC, and compliance best practices. Collaborate across teams, mentor peers, and contribute to a culture of continuous improvement. More ❯