You’ll partner closely with engineering and support teams to evolve infrastructure, automate processes, and proactively eliminate downtime. Expect to: Improve system performance and observability across cloud and on-prem environments Automate repetitive tasks and reduce alert fatigue Strengthen our CI/CD pipelines using modern tooling Research and implement … Skills & Experience: Configuration management (Ansible, Chef, etc.) Terraform & Infrastructure-as-Code Docker containers and orchestration (Kubernetes, OpenShift, etc.) CI/CD tools like Jenkins Observability tools (Grafana, Prometheus, InfluxDB) MQ messaging systems (RabbitMQ or similar) Strong Linux skills, shell scripting, and networking fundamentals Experience with AWS (EC2, S3, RDS, Lambda More ❯
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
You'll partner closely with engineering and support teams to evolve infrastructure, automate processes, and proactively eliminate downtime. Expect to: Improve system performance and observability across cloud and on-prem environments Automate repetitive tasks and reduce alert fatigue Strengthen our CI/CD pipelines using modern tooling Research and implement … Skills & Experience: Configuration management (Ansible, Chef, etc.) Terraform & Infrastructure-as-Code Docker containers and orchestration (Kubernetes, OpenShift, etc.) CI/CD tools like Jenkins Observability tools (Grafana, Prometheus, InfluxDB) MQ messaging systems (RabbitMQ or similar) Strong Linux skills, shell scripting, and networking fundamentals Experience with AWS (EC2, S3, RDS, Lambda More ❯
and mitigate technical debt, ensuring a balance between quick wins and long-term stability. Oversee the refactoring and modernization of legacy applications, ensuring observability, performance, and security. Collaborate with Chief Engineers and engineering teams to define best practices for ETL, data integration, and cloud adoption. Ensure engineering … related tools. Strategic Thinking – Ability to drive long-term engineering strategy while delivering incremental value. Technical Debt Management – Experience identifying and remediating inefficient architectures. Observability & Performance Optimization – Familiarity with monitoring and logging tools (e.g., Datadog, Splunk, Prometheus, New Relic). Stakeholder Management – Ability to engage with senior leadership, product managers More ❯
London, South East England, United Kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability Location: Stratford, London (Hybrid – 2 days per week onsite) Contract Length: 6 months Rate: £450–£500 per day (Inside IR35) Industry: Financial Services A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their … hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. Provide thought leadership from the Cognizant delivery team on all things SRE. Leverage hands-on experience … with Datadog to implement and enhance observability capabilities. Guide and oversee the day-to-day operation and maintenance of observability tools. Partner directly with engineering teams to support delivery of observability backlogs. Collaborate with product teams to create monitoring and alerting blueprints, patterns, and automation. Capture, analyse, and report on More ❯
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Stott & May Professional Search Limited
Senior DevOps Engineer (SC Cleared) *candidates must hold an active SC Clearance* Start: ASAP Duration: 6-12 months Pay: negotiable, INSIDE IR35 Location: Hybrid (Leeds) Overview As a Senior DevOps Engineer, you will: - Work collaboratively in agile teams delivering services More ❯
Bradford, Yorkshire and the Humber, United Kingdom Hybrid / WFH Options
Stott & May Professional Search Limited
Senior DevOps Engineer (SC Cleared) *candidates must hold an active SC Clearance* Start: ASAP Duration: 6-12 months Pay: negotiable, INSIDE IR35 Location: Hybrid (Leeds) Overview As a Senior DevOps Engineer, you will: - Work collaboratively in agile teams delivering services More ❯
Senior MLOps (Full Stack) Engineer London Foundation Models Job details Posted 30 April 2025 Salary £80,000 - £110,000 per annum Benefits: Equity Location: London Job type: Permanent Discipline: AI/Machine Learning Reference: BK-45-1 What you'll More ❯
East Riding, Yorkshire, United Kingdom Hybrid / WFH Options
Cathcart Technology
System Administrator Location: York (Hybrid - 1 day in the office per week) Hours: 37.5 per week, flexitime available Salary: Up to 55k - depending on experience About the Role: I'm looking for an experienced System Administrator to join a well More ❯
Platform Engineer Location: Leeds Salary: Competitive salary & Package (Depending on level of experience) Please Note: Any offer of employment is subject to satisfactory BPSS and SC security clearance which requires 5 years continuous UK address history at the point of More ❯
A level of network understanding Awareness of different monitoring tools and protocols used for monitoring (e.g., OpenTelemetry, SNMP, NETCONF) An understanding of what observability is, and how a company can utilise it. Previous experience of using IT service management (ITSM) tools like Remedy or ServiceNow. Understands how various business More ❯
Buckfastleigh, Devon, South West, United Kingdom Hybrid / WFH Options
Riverford Organic Farmers
and write pragmatic, maintainable code. Think pragmatically about trade-offs between speed, technical quality, and user outcomes. Understand and apply good engineering practices (testing, observability, maintainability). Take ownership of your work, share knowledge, and help raise standards across the team. Enjoy balancing incremental improvements with bigger-picture product and More ❯
delivery of the Mastek ServiceNow team’s backlog, and 2) leading hands-on innovation initiatives to improve service operations tooling. Your expertise in ServiceNow, observability, alerting, and AWS-related tools will help drive operational excellence and continuous improvement. Key Responsibilities Platform Development & Configuration: Design, develop, and implement innovative solutions across … ServiceNow, observability, alerting, and AWS-related tooling, including custom applications, integrations, AI & flows. Develop applications and integrations across platforms such as ITSM, ITOM, PA, CSM, SPM, CSDM, CMDB, Employee Centre, Integration Hub, and observability tools (e.g., Datadog, Splunk, AWS CloudWatch, Prometheus, etc.). Ensure seamless interoperability between service operations tooling … ensuring integration and automation opportunities are maximised. Innovation & Continuous Improvement: Stay current with industry best practices and emerging technologies in service operations tooling, including observability and alerting. Drive innovation by integrating and optimising monitoring solutions within service management workflows. Develop proof-of-concept (POC) solutions for new features and capabilities More ❯
the heart of a high-impact engineering transformation. You’ll combine core software engineering skills with Site Reliability Engineering (SRE) principles to deliver automation, observability, and resilience across our systems. What You’ll Do 🚢 Lead migration efforts for services running in Kubernetes, ensuring smooth rollouts with zero-downtime strategies. 🧠 Design … environments (AWS + Kubernetes). 🔧 Automate processes to reduce toil and accelerate delivery using Infrastructure-as-Code and CI/CD best practices. 📊 Implement observability through enhanced logging, metrics, and alerts to maintain service health throughout migration. 🔍 Troubleshoot complex systems and lead incident response, root cause analysis, and iterative improvements. More ❯
risks. Proactively reducing Mean Time to Resolution (MTTR), constantly striving for efficiency gains. Championing an anti-fragility mindset across our architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the … cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. About Convera Convera is the largest non-bank B2B cross-border payments company in the world. Formerly Western Union More ❯
risks. Proactively reducing Mean Time to Resolution (MTTR), constantly striving for efficiency gains. Championing an anti-fragility mindset across our architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the … cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. Your expertise will be instrumental in fortifying our infrastructure and delivering exceptional reliability to our customers. About Convera Convera More ❯
and help us maintain our hosting platform Creating and improving routes to live with automation including blue/green & canary strategies Configure and improve observability controls Proving scalability/resilience and security controls What you'll need To display broad and deep technical experience with a passion for engineering excellence … experience configuring & running production workloads in Kubernetes CI/CD & IaC tools like Jenkins, Terraform, Sonar, Nexus, Git, Spinnaker, Harness Strong understanding & experience of Observability, SRE, DevSecOps & FinOps Good understanding of cloud networking & connectivity patterns Good understanding of key data tooling such as Kafka, BigTable, DataProc, BigQuery etc It would More ❯
Northern Ireland, United Kingdom Hybrid / WFH Options
Ocho
Serverless, and S3 for cloud-native data solutions • Collaborate with frontend (Vue.js) and data platform engineers (Snowflake, Airflow) • Contribute to CI/CD pipelines, observability, and day-2 operational tooling • Partner with product and architecture teams to translate customer needs into working systems • Engage in cross-regional collaboration (US, EMEA … S3 • Experience with event-driven design using SQS, SNS, EventBridge • Comfortable working in containerized and serverless contexts (12-factor apps) • Hands-on experience with observability stacks: metrics, traces, logs • Strong communicator able to interface confidently with both technical and non-technical audiences Bonus Experience • Familiarity with IaC frameworks (CloudFormation, Terraform More ❯
internal associates and preferred third party vendors) in applying Site Reliability Engineering principles to in-house developed applications. Optimise and reduce operational overheads through observability and service automation. Identify growth opportunities for your manager level reportees on how to achieve their technical, business and personal goals. Work closely with peer … passion for software engineering and operational processes. Strong background in software/system engineering and architecture within the cloud. Strong background/appreciation in observability principles, techniques and toolsets. Demonstrable knowledge in the software development lifecycle within a cloud based environment. Demonstrable knowledge of developing and managing RESTful API services More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Duel
executed effectively. Run effective planning rituals including regular sprint planning, standups, and retrospectives. Build and evolve the platform foundations, from shared services to security, observability, and tooling, that enable product teams to move fast and stay safe. Drive platform improvements to support scale, availability, and performance as our business grows. … engineering experience, ideally with TypeScript in production environments. You're comfortable working with datastores like MongoDB (Atlas), Elasticsearch, and Snowflake. You understand distributed systems, observability best practices, and modern CI/CD workflows. You've built internal platforms or reusable services that support multiple teams or squads. You have a More ❯
Please Note: Any offer of employment is subject to satisfactory BPSS and SC security clearance which requires 5 years continuous UK address history (typically including no periods of 30 consecutive days or more spent outside of the UK). Accenture More ❯
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom
Accenture
Please Note: Any offer of employment is subject to SC clearance eligibility which requires 5 years continuous UK address history (typically including no periods of 30 consecutive days or more spent outside of the UK). Accenture is a leading global More ❯
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
Our team is building the next-generation Sports Betting platform that optimizes flexibility, performance, responsiveness and resiliency. The technologies we like to use include Java, Spring Boot, Kafka, Cassandra, Postgres, Kubernetes, AWS, etc. We are looking for an experienced Java More ❯
Airflow, gRPC, New Relic, Databricks, and more. This role requires expertise in distributed systems, microservices, and data pipelines, combined with a strong focus on observability and the ability to leverage vendor technologies to deliver impactful solutions. While this is not an ML development role, familiarity with the machine learning lifecycle … Vendor Integration: Identify and leverage vendor capabilities (e.g., AWS, Databricks, and other cloud services) to deliver high-quality solutions that align with organizational goals. Observability Solutions: Develop monitoring and observability systems to track model performance, detect anomalies, and ensure outputs align with business and ethical standards. Collaboration with Specialists: Work … Complementary skills for this role Technical Expertise: Extensive experience with distributed systems engineering, including designing and implementing Java-based microservices and Python batch jobs. Observability Knowledge: Deep understanding of observability principles, including monitoring, logging, and real-time system insights. Data Engineering Skills: Proficiency in building data pipelines using PySpark and More ❯
Manchester Area, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability … maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles … including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of programming languages including Python, Golang and JavaScript. Knowledge and More ❯
Stoke-On-Trent, England, United Kingdom Hybrid / WFH Options
bet365
Who we are looking for A Site Reliability Engineer, who will enhance system reliability, observability and performance through a strong engineering approach and assist with incident resolution and best practices. You will have software engineering skills, focusing on system reliability and observability. You will monitor the health, performance and availability … maintainability. You will also help engineer tools and automation for effective service management. Collaboration is key, working across multiple functions to integrate reliability and observability best practices into the software development life cycle. By supporting governance standards set by the central teams, you will foster a culture where these principles … including the creation and management of effective Service Level Indicators (SLI) and Service Level Objectives (SLO) for reliability and customer satisfaction. Knowledge of contemporary observability tools, techniques and best practice including Splunk, New Relic, Grafana and PagerDuty. Excellent knowledge of programming languages including Python, Golang and JavaScript. Knowledge and More ❯