by 100x to open bigger enterprise opportunities; ensure fast queries, high availability, and low-latency data processing across the platform; drive best practices around observability, ensuring data integrity and service uptime. 💥 They're seeking someone with: a track record designing and building complex, cloud-native SaaS platforms; confidence developing large …
software deployment and scalability. CI/CD Expertise: Automate software build, test, and deployment pipelines following agile methodologies. Terraform Exposure: Experience with Terraform is beneficial. Observability Tools: Experience with Grafana and Splunk is beneficial, particularly in developing and applying an observability strategy across a large organization.
London, South East England, United Kingdom Hybrid / WFH Options
Harrington Starr
and help build the next generation of scalable, cloud-native infrastructure. This role sits in a high-impact platform engineering team focused on automation, observability, and empowering development teams to ship faster and more securely. Why You Should Apply: Work with a forward-thinking, global financial firm; Hybrid setup … and maintaining AWS-based infrastructure using Terraform; Improving CI/CD pipelines with Python and Git workflows; Supporting containerised environments (Docker/K8s); Driving observability with Grafana and proactive monitoring tools; Enhancing developer experience through smart automation and tooling. What We’re Looking For: 3 years of experience in Platform …
ensuring the platform is stable. To drive and own the Monitoring strategy, defining clear goals, objectives, and deliverables. Optimise and reduce operational overheads through observability and service automation. Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability in combination with service, product and engineering communities. Collaborate … to prioritize and manage multiple tasks in a fast-paced environment. Experience in software development, infrastructure, or operations roles. Strong background in, or appreciation of, observability principles, techniques and toolsets. Demonstrable knowledge of developing and managing RESTful API services written in a modern OO language such as Java or Python. Knowledge … C#. Understanding of, or experience working within, an Incident Management Process (ITSM). Desirable Requirements: AWS; Linux (Debian, CentOS, Alpine and AWS Linux); Terraform, Docker, Kubernetes, Git; Observability/APM Platforms; Jenkins, Nginx, MySQL. Benefits: We are actively committed to promoting a fully diverse and inclusive workforce and we welcome applications for this …
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
You'll partner closely with engineering and support teams to evolve infrastructure, automate processes, and proactively eliminate downtime. Expect to: Improve system performance and observability across cloud and on-prem environments; Automate repetitive tasks and reduce alert fatigue; Strengthen our CI/CD pipelines using modern tooling; Research and implement … Skills & Experience: Configuration management (Ansible, Chef, etc.); Terraform & Infrastructure-as-Code; Docker containers and orchestration (Kubernetes, OpenShift, etc.); CI/CD tools like Jenkins; Observability tools (Grafana, Prometheus, InfluxDB); MQ messaging systems (RabbitMQ or similar); Strong Linux skills, shell scripting, and networking fundamentals; Experience with AWS (EC2, S3, RDS, Lambda …
and mitigate technical debt, ensuring a balance between quick wins and long-term stability. Oversee the refactoring and modernization of legacy applications, ensuring observability, performance, and security. Collaborate with Chief Engineers and engineering teams to define best practices for ETL, data integration, and cloud adoption. Ensure engineering … related tools. Strategic Thinking – Ability to drive long-term engineering strategy while delivering incremental value. Technical Debt Management – Experience identifying and remediating inefficient architectures. Observability & Performance Optimization – Familiarity with monitoring and logging tools (e.g., Datadog, Splunk, Prometheus, New Relic). Stakeholder Management – Ability to engage with senior leadership, product managers …
the DevOps group. Spearhead the development, integration, and maintenance of CI/CD data pipelines for automated deployments. Integrate best practices for monitoring and observability to proactively detect, analyse, and resolve issues. Enforce robust data governance and security protocols through tools like Azure Key Vault, ensuring compliance with standards such … for automation. Strong aptitude for data pipeline monitoring and an understanding of data security practices such as RBAC and encryption. Experience implementing data and pipeline observability dashboards, ensuring high data quality, and improving the efficiency of data workflows. Experience ensuring compliance with regulatory frameworks and implementing robust data governance measures. Demonstrated …
London, South East England, United Kingdom Hybrid / WFH Options
MarkJames Search
Job Title: Site Reliability Engineering (SRE) Lead – Observability. Location: Stratford, London (Hybrid – 2 days per week onsite). Contract Length: 6 months. Rate: £450–£500 per day (Inside IR35). Industry: Financial Services. A leading Financial Services organisation in London is seeking a Site Reliability Engineering (SRE) Lead – Observability to join their … hybrid role requiring two days per week onsite at their Stratford, London offices. The role sits Inside IR35. Key Responsibilities: Lead the SRE Observability team and champion observability practices across multiple product groups. Provide thought leadership from the Cognizant delivery team on all things SRE. Leverage hands-on experience … with Datadog to implement and enhance observability capabilities. Guide and oversee the day-to-day operation and maintenance of observability tools. Partner directly with engineering teams to support delivery of observability backlogs. Collaborate with product teams to create monitoring and alerting blueprints, patterns, and automation. Capture, analyse, and report on …
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Stott & May Professional Search Limited
Senior DevOps Engineer (SC Cleared) *candidates must hold an active SC Clearance* Start: ASAP. Duration: 6-12 months. Pay: negotiable, Inside IR35. Location: Hybrid (Leeds). Overview: As a Senior DevOps Engineer, you will work collaboratively in agile teams delivering services …
East Riding, Yorkshire, United Kingdom Hybrid / WFH Options
Cathcart Technology
System Administrator. Location: York (Hybrid - 1 day in the office per week). Hours: 37.5 per week, flexitime available. Salary: Up to 55k, depending on experience. About the Role: I'm looking for an experienced System Administrator to join a well …
Platform Engineer. Location: Leeds. Salary: Competitive salary & package (depending on level of experience). Please Note: Any offer of employment is subject to satisfactory BPSS and SC security clearance, which requires 5 years continuous UK address history at the point of …
A level of network understanding. Awareness of different monitoring tools and protocols used for monitoring (e.g. OpenTelemetry, SNMP, NETCONF). An understanding of what observability is, and how a company can utilise it. Previous experience of using IT Service Management (ITSM) tools like Remedy or ServiceNow. Understands how various business …
Buckfastleigh, Devon, South West, United Kingdom Hybrid / WFH Options
Riverford Organic Farmers
and write pragmatic, maintainable code. Think pragmatically about trade-offs between speed, technical quality, and user outcomes. Understand and apply good engineering practices (testing, observability, maintainability). Take ownership of your work, share knowledge, and help raise standards across the team. Enjoy balancing incremental improvements with bigger-picture product and …
Leeds, West Yorkshire, Yorkshire and the Humber, United Kingdom
Mastek
delivery of the Mastek ServiceNow team’s backlog, and 2) leading hands-on innovation initiatives to improve service operations tooling. Your expertise in ServiceNow, observability, alerting, and AWS-related tools will help drive operational excellence and continuous improvement. Key Responsibilities Platform Development & Configuration: Design, develop, and implement innovative solutions across … ServiceNow, observability, alerting, and AWS-related tooling, including custom applications, integrations, AI & flows. Develop applications and integrations across platforms such as ITSM, ITOM, PA, CSM, SPM, CSDM, CMDB, Employee Centre, Integration Hub, and observability tools (e.g., Datadog, Splunk, AWS CloudWatch, Prometheus, etc.). Ensure seamless interoperability between service operations tooling … ensuring integration and automation opportunities are maximised. Innovation & Continuous Improvement: Stay current with industry best practices and emerging technologies in service operations tooling, including observability and alerting. Drive innovation by integrating and optimising monitoring solutions within service management workflows. Develop proof-of-concept (POC) solutions for new features and capabilities …
risks. Proactively reducing Mean Time to Resolution (MTTR), constantly striving for efficiency gains. Championing an anti-fragility mindset across our architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the … cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. About Convera: Convera is the largest non-bank B2B cross-border payments company in the world. Formerly Western Union …
risks. Proactively reducing Mean Time to Resolution (MTTR), constantly striving for efficiency gains. Championing an anti-fragility mindset across our architecture, deployment processes, and observability practices. Elevating the customer experience as the ultimate benchmark of our reliability standards. Sharing industry best practices in SRE, ensuring our team remains at the … cloud networking, microservices architecture, and Amazon EKS. Preferred qualifications include: Prior involvement in the Fintech sector or other regulated industries. Familiarity with the Grafana observability stack. Experience in Chaos Engineering methodologies. Your expertise will be instrumental in fortifying our infrastructure and delivering exceptional reliability to our customers. About Convera: Convera …
of proven expertise, the ideal candidate will shape the strategy, design, and transformation of complex infrastructure landscapes, including Wintel, Linux, Network, Voice, Collaboration, Mobility, Observability, End-User Computing, End-User Services, and Service Desk. This role acts as a key advisor to senior leadership and ensures that infrastructure investments align … LAN/WAN/SD-WAN, Wireless, Firewalls); Unified Communication/Voice/Collaboration (Cisco, MS Teams); Mobility & Endpoint Management (Intune, MDM/UEM); Observability and Monitoring (ELK, Prometheus, AppDynamics, etc.); End-User Computing (VDI, physical endpoints, OS lifecycle); End-User Services and Service Desk (ITSM, automation, FCR, CSAT). Serve …
and help us maintain our hosting platform; Creating and improving routes to live with automation including blue/green & canary strategies; Configure and improve observability controls; Proving scalability/resilience and security controls. What you'll need: To display broad and deep technical experience with a passion for engineering excellence … experience configuring & running production workloads in Kubernetes; CI/CD & IaC tools like Jenkins, Terraform, Sonar, Nexus, Git, Spinnaker, Harness; Strong understanding & experience of Observability, SRE, DevSecOps & FinOps; Good understanding of cloud networking & connectivity patterns; Good understanding of key data tooling such as Kafka, BigTable, DataProc, BigQuery etc. It would …
Northern Ireland, United Kingdom Hybrid / WFH Options
Ocho
Serverless, and S3 for cloud-native data solutions • Collaborate with frontend (Vue.js) and data platform engineers (Snowflake, Airflow) • Contribute to CI/CD pipelines, observability, and day-2 operational tooling • Partner with product and architecture teams to translate customer needs into working systems • Engage in cross-regional collaboration (US, EMEA … S3 • Experience with event-driven design using SQS, SNS, EventBridge • Comfortable working in containerized and serverless contexts (12-factor apps) • Hands-on experience with observability stacks: metrics, traces, logs • Strong communicator able to interface confidently with both technical and non-technical audiences Bonus Experience • Familiarity with IaC frameworks (CloudFormation, Terraform …