Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
automation scripts (Python, Bash, Shell) and tools (GitLab, Terraform, Vault, Ansible) to streamline deployment, monitoring, and management processes using Infrastructure as Code (IaC). Implement and integrate monitoring and observability solutions, like AIOps, for proactive system issue detection and response. Participate in on-call rotations to ensure 24/7 system availability. Maintain detailed documentation of infrastructure, processes, and procedures More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and Maintain Backstage - Design, build, and maintain custom and community-backed Backstage plugins to support Arm's engineering teams, including CI/CD pipelines, service scaffolding, documentation, testing, and observability integrations. Collaborate Across Engineering & IT - Partner closely with platform, software and hardware teams to integrate services, tooling, and policies into the portal in a user-centric and automated manner. We More ❯
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting More ❯
preferred Mastery of Git and version control workflows in a collaborative team Comfortable in Agile/Scrum environments using tools like Jira and Confluence Experience supporting production systems, including observability and monitoring Desirable: Experience in regulated industries (e.g., financial services, healthcare) Working knowledge of MongoDB or similar document-oriented databases Familiarity with Golang or Python to support infrastructure tooling Microsoft More ❯
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
end tests. Ability to write and understand design documentation using C4, sequence diagrams and workflows. Excellent problem-solving skills and attention to detail. Solid understanding of logging, monitoring and observability to understand if software is functioning as required. Strong communication and teamwork skills. Preferred Skills: Experience with cloud platforms e.g., AWS, Azure, Google Cloud. Knowledge of DevOps practices and CI More ❯
Bar Hill, Cambridgeshire, United Kingdom Hybrid / WFH Options
Domino Group
work day to day. That could include CI/CD pipelines including builds and testing - particularly automated testing - as well as issue tracking, source code management, binary artifact management, observability, and business continuity measures. It's an agile environment here at Domino: we've adopted Kanban, and most other teams use Scrum. For context, some of the technology we're More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Gearset Limited
changes quickly and safely. We live and breathe this approach ourselves: we release new versions of Gearset multiple times a day and we continually invest in improving our own observability and infrastructure tools. This means we can identify and react to issues quickly and delight our users by getting improvements to them as fast as possible. As a product-driven More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and attitude on automating common repetitive tasks A suitable sense of ownership and responsibility in driving tasks to timely full completion "Nice To Have" Skills and Experience: AIOps and Observability Meaningful experience in a distributed team Working in a sophisticated, multi-geography, engineering services environment! Providing technical support and mentoring to others Accommodations at Arm At Arm, we want to More ❯
Hemel Hempstead, Hertfordshire, United Kingdom Hybrid / WFH Options
Eckoh
DynamoDB, SQS, and EventBridge Develop robust CI/CD pipelines for applications running in EKS and serverless environments Embrace microservices and event-driven architecture patterns Implement logging, tracing, and observability practices from day one Contribute to the design and development of cloud-native data platforms that support real-time and batch processing AI & LLM Enablement: Collaborate with data scientists and More ❯
Hemel Hempstead, Hertfordshire, South East, United Kingdom Hybrid / WFH Options
Eckoh PLC
DynamoDB, SQS, and EventBridge Develop robust CI/CD pipelines for applications running in EKS and serverless environments Embrace microservices and event-driven architecture patterns Implement logging, tracing, and observability practices from day one Contribute to the design and development of cloud-native data platforms that support real-time and batch processing AI & LLM Enablement: Collaborate with data scientists and More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AVEVA Denmark
pipelines. Automate the evaluation of AI system outputs to ensure accuracy, consistency, and safety of responses. Collaborate with developers and data scientists to establish service-level quality metrics and observability hooks. Validate services against AI regulatory frameworks and ensure traceability, fairness, and robustness in outcomes. Participate in threat modelling and security validation of exposed APIs and AI services. Provide feedback More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Gearset Limited
over process and deliberation Great to haves Experience with .NET/C# Experience working in an agile development team with a focus on delivering value early Experience with building observability and alerting into systems Salary and benefits (the stuff you'd expect!) Salary is £78K - £100K (depending on experience) This is a full time opportunity, working Monday to Friday with More ❯
Hemel Hempstead, Hertfordshire, United Kingdom Hybrid / WFH Options
Eckoh
a secure, highly available, PCI-compliant AWS platform that underpins Eckoh's mission-critical services. As a senior member of the team, you will drive improvements in platform reliability, observability, and operational excellence. You will collaborate closely with development teams to enable secure, automated delivery of services while championing DevSecOps principles. This role offers the chance to shape the future … secure PCI-compliant cloud platform on AWS to support enterprise-grade applications and services. Architect and operate production workloads with a focus on high availability, scalability, and resilience. Drive observability and monitoring improvements across infrastructure and services to proactively identify issues. Promote and embed a security-first, DevSecOps culture, ensuring best practices are followed at every stage of the software … Strong knowledge of CI/CD pipelines and automation tooling (GitLab experience preferable). Experience with "infrastructure as code" (Terraform, CloudFormation), containerisation (Docker), and orchestration (Kubernetes). Proficiency with observability and monitoring solutions (e.g., CloudWatch, Prometheus, Grafana, Splunk). Strong understanding of cloud-native development practices and agile ways of working. Confident conducting peer code reviews and providing constructive technical More ❯
Hemel Hempstead, Hertfordshire, South East, United Kingdom Hybrid / WFH Options
Eckoh PLC
a secure, highly available, PCI-compliant AWS platform that underpins Eckoh's mission-critical services. As a senior member of the team, you will drive improvements in platform reliability, observability, and operational excellence. You will collaborate closely with development teams to enable secure, automated delivery of services while championing DevSecOps principles. This role offers the chance to shape the future … secure PCI-compliant cloud platform on AWS to support enterprise-grade applications and services. Architect and operate production workloads with a focus on high availability, scalability, and resilience. Drive observability and monitoring improvements across infrastructure and services to proactively identify issues. Promote and embed a security-first, DevSecOps culture, ensuring best practices are followed at every stage of the software … Strong knowledge of CI/CD pipelines and automation tooling (GitLab experience preferable). Experience with 'infrastructure as code' (Terraform, CloudFormation), containerisation (Docker), and orchestration (Kubernetes). Proficiency with observability and monitoring solutions (e.g., CloudWatch, Prometheus, Grafana, Splunk). Strong understanding of cloud-native development practices and agile ways of working. Confident conducting peer code reviews and providing constructive technical More ❯
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
collaborate across teams to: Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes Design and operate CI/CD pipelines using CloudBees and GitLab Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace Automate cloud deployments (AWS-first) using Terraform and platform tooling Improve security posture across IAM, secrets, and networking Help the team ship faster and … TypeScript, Python). Validated experience operating distributed systems at scale in production. Cloud AWS (primary), Kubernetes (future), Docker (current), Terraform. Excellent debugging skills across network, systems, and data stack. Observability tooling, e.g. custom metrics pipelines, OpenTelemetry tracing, or integrations across telemetry stacks. Security engineering and practical understanding of IAM hardening, zero-trust network principles, and secrets management in data-heavy More ❯
Cambridge, Cambridgeshire, East Anglia, United Kingdom Hybrid / WFH Options
La Fosse
infrastructure platform with AI-operable capabilities Oversee key infrastructure components such as data centre expansion, programmable compute, and software-defined network/storage Enable automation-first delivery models with observability, self-healing, and policy-driven control Implement and mature GitOps workflows, IaC pipelines, and CI/CD processes across engineering teams Lead programme governance, risk management, and stakeholder engagement Partner More ❯
function integrated throughout the software development lifecycle. Partnering closely with product and engineering teams, you will help scope and estimate strategic work, align on tooling, and drive improvements in observability, automation, and testing. Ideal Experience & Skills Demonstrated technical leadership across diverse skillsets, including Site Reliability Engineering (SRE), DevOps, and Quality Assurance (QA) Proven track record of aligning and integrating cross More ❯
new AI/ML methods Deployment and serving of models at scale Infrastructure automation and cloud-native design Responsible AI, LLM safety, and interpretability tooling Data pipelines, versioning, and observability in production A glimpse of roles we recruit for: AI Research Scientist Machine Learning Engineer Data Engineer with ML experience Applied Scientist/Research Engineer DevOps for AI/AI More ❯
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to More ❯
Hemel Hempstead, Hertfordshire, Felden, United Kingdom
Meritus
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to More ❯
Hemel Hempstead, Hertfordshire, South East, United Kingdom
Yolk Recruitment
building, and maintaining secure, high-performance network platforms. Work closely with internal teams and external partners to deliver integrated, end-to-end solutions. Support and enhance existing monitoring and observability frameworks using tools like SNMP and syslog. Deliver against technical roadmaps, ensuring platforms remain aligned with product support lifecycles. Stay up to date with new technologies, expanding your knowledge to More ❯
management for Windows workloads Create tooling and automation around the deployment of a customer-specific Windows-based SaaS product Ensure high availability, reliability, and scalability of Windows services. Integrate observability tooling (metrics, logs, traces) into IIS-hosted services Harden Windows infrastructure for security, compliance, and operational best practices Lead incident response for Windows-related systems Contribute to internal documentation and … Windows internals Proven ability to build infrastructure-as-code and CI/CD for Windows environments Comfort wrapping a Windows software product with the surrounding infrastructure, services, automation, and observability required to run it as a SaaS offering. Hands-on experience administering cloud infrastructure or building cloud-native applications (preferably on AWS) Comfortable using AWS EC2 Proficiency with command-line More ❯
portals, dashboards, internal tools, and web applications. Collaborate closely with DevOps on CI/CD pipelines, deployment workflows, infrastructure, and SecOps compliance. Uphold high standards for code quality, system observability, and technical documentation. Act as the technical lead, setting direction and best practices for the full-stack engineering team. Mentor engineers, providing guidance on architecture, design patterns, and career growth. … cross-functional teams Deep experience with React, TypeScript, .NET Core, SOAP/REST APIs, and MySQL/PostgreSQL, Red Hat OpenShift, Kubernetes Understanding of DevOps, cloud deployments, and service observability Bonus: Interest/experience in AI, digital twins, Nvidia Omniverse SDK & APIs, Universal Scene Description What We Offer : Reimbursement for tuition and professional dues Three weeks of vacation and five More ❯
Solutions consultant are looking for a Dynatrace Consultant to join the team for a 6-month contract. They are looking for a hands-on Dynatrace expert to join our observability team who are rolling out the platform across the company. We are building a center of excellence to support adoption across hundreds of teams, so you will work closely with … effectively. Guide teams in building dashboards, alerts, and service flow mappings aligned to engineering needs. Help teams craft complex DQL queries to extract meaningful insights from telemetry data. Support observability design and migration efforts from Prometheus, Grafana, and CloudWatch to Dynatrace. Advise on RBAC models and data access strategies based on team structure and security requirements. Assist in monitoring strategy … for Kubernetes-based workloads, especially in hybrid environments. Promote adoption of observability-as-code using tools like Terraform, GitLab, or other IaC frameworks. Contribute to reusable patterns, documentation, and internal enablement materials for engineering teams. Contract - 6 months Location - Hybrid - Cambridge More ❯