Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
automation scripts (Python, Bash, Shell) and tools (GitLab, Terraform, Vault, Ansible) to streamline deployment, monitoring, and management processes using Infrastructure as Code (IaC). Implement and integrate monitoring and observability solutions, like AIOps, for proactive system issue detection and response. Participate in on-call rotations to ensure 24/7 system availability. Maintain detailed documentation of infrastructure, processes, and procedures …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and Maintain Backstage - Design, build, and maintain custom and community-backed Backstage plugins to support Arm's engineering teams, including CI/CD pipelines, service scaffolding, documentation, testing, and observability integrations. Collaborate Across Engineering & IT - Partner closely with platform, software and hardware teams to integrate services, tooling, and policies into the portal in a user-centric and automated manner. We …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
strong track record of building and maintaining highly reliable infrastructure and services. Expertise in incident management, including incident response, resolution, and post-mortem analysis. Proficiency in monitoring, alerting, and observability tools such as Prometheus, Grafana, ELK stack or Datadog. Experience with cloud platforms such as AWS, Azure, or GCP, including infrastructure as code tools like Terraform or CloudFormation. Strong scripting …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
end tests. Ability to write and understand design documentation using C4, sequence diagrams and workflows. Excellent problem-solving skills and attention to detail. Solid understanding of logging, monitoring and observability to understand if software is functioning as required. Strong communication and teamwork skills. Preferred Skills: Experience with cloud platforms e.g., AWS, Azure, Google Cloud. Knowledge of DevOps practices and CI …
Bar Hill, Cambridgeshire, United Kingdom Hybrid / WFH Options
Domino Group
work day to day. That could include CI/CD pipelines including builds and testing - particularly automated testing - as well as issue tracking, source code management, binary artifact management, observability, and business continuity measures. It's an agile environment here at Domino: we've adopted Kanban, and most other teams use Scrum. For context, some of the technology we're …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Gearset Limited
changes quickly and safely. We live and breathe this approach ourselves: we release new versions of Gearset multiple times a day and we continually invest in improving our own observability and infrastructure tools. This means we can identify and react to issues quickly and delight our users by getting improvements to them as fast as possible. As a product-driven …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and attitude on automating common repetitive tasks A suitable sense of ownership and responsibility in driving tasks to timely full completion "Nice To Have" Skills and Experience: AIOps and Observability Meaningful experience in a distributed team Working in a sophisticated, multi-geography, engineering services environment! Providing technical support and mentoring to othe Accommodations at Arm At Arm, we want to …
Hemel Hempstead, Hertfordshire, United Kingdom Hybrid / WFH Options
Eckoh
DynamoDB, SQS, and EventBridge Develop robust CI/CD pipelines for applications running in EKS and serverless environments Embrace microservices and event-driven architecture patterns Implement logging, tracing, and observability practices from day one Contribute to the design and development of cloud-native data platforms that support real-time and batch processing AI & LLM Enablement: Collaborate with data scientists and …
Hemel Hempstead, Hertfordshire, South East, United Kingdom Hybrid / WFH Options
Eckoh PLC
DynamoDB, SQS, and EventBridge Develop robust CI/CD pipelines for applications running in EKS and serverless environments Embrace microservices and event-driven architecture patterns Implement logging, tracing, and observability practices from day one Contribute to the design and development of cloud-native data platforms that support real-time and batch processing AI & LLM Enablement: Collaborate with data scientists and …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
AVEVA Denmark
pipelines. Automate the evaluation of AI system outputs to ensure accuracy, consistency, and safety of responses. Collaborate with developers and data scientists to establish service-level quality metrics and observability hooks. Validate services against AI regulatory frameworks and ensure traceability, fairness, and robustness in outcomes. Participate in threat modelling and security validation of exposed APIs and AI services. Provide feedback …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Gearset Limited
over process and deliberation Great to haves Experience with .NET/C# Experience working in an agile development team with a focus on delivering value early Experience with building observability and alerting into systems Salary and benefits (the stuff you'd expect!) Salary is £78K - £100K (depending on experience) This is a full time opportunity, working Monday to Friday with …
Hemel Hempstead, Hertfordshire, United Kingdom Hybrid / WFH Options
Eckoh
a secure, highly available, PCI-compliant AWS platform that underpins Eckoh's mission-critical services. As a senior member of the team, you will drive improvements in platform reliability, observability, and operational excellence. You will collaborate closely with development teams to enable secure, automated delivery of services while championing DevSecOps principles. This role offers the chance to shape the future … secure PCI-compliant cloud platform on AWS to support enterprise-grade applications and services. Architect and operate production workloads with a focus on high availability, scalability, and resilience. Drive observability and monitoring improvements across infrastructure and services to proactively identify issues. Promote and embed a security-first, DevSecOps culture, ensuring best practices are followed at every stage of the software … Strong knowledge of CI/CD pipelines and automation tooling (GitLab experience preferable). Experience with "infrastructure as code" (Terraform, CloudFormation), containerisation (Docker), and orchestration (Kubernetes). Proficiency with observability and monitoring solutions (e.g., CloudWatch, Prometheus, Grafana, Splunk). Strong understanding of cloud-native development practices and agile ways of working. Confident conducting peer code reviews and providing constructive technical …
Hemel Hempstead, Hertfordshire, South East, United Kingdom Hybrid / WFH Options
Eckoh PLC
a secure, highly available, PCI-compliant AWS platform that underpins Eckoh's mission-critical services. As a senior member of the team, you will drive improvements in platform reliability, observability, and operational excellence. You will collaborate closely with development teams to enable secure, automated delivery of services while championing DevSecOps principles. This role offers the chance to shape the future … secure PCI-compliant cloud platform on AWS to support enterprise-grade applications and services. Architect and operate production workloads with a focus on high availability, scalability, and resilience. Drive observability and monitoring improvements across infrastructure and services to proactively identify issues. Promote and embed a security-first, DevSecOps culture, ensuring best practices are followed at every stage of the software … Strong knowledge of CI/CD pipelines and automation tooling (GitLab experience preferable). Experience with 'infrastructure as code' (Terraform, CloudFormation), containerisation (Docker), and orchestration (Kubernetes). Proficiency with observability and monitoring solutions (e.g., CloudWatch, Prometheus, Grafana, Splunk). Strong understanding of cloud-native development practices and agile ways of working. Confident conducting peer code reviews and providing constructive technical …
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
collaborate across teams to: Modernise our infrastructure by leading the migration from Docker Swarm to Kubernetes Design and operate CI/CD pipelines using CloudBees and GitLab Build out observability with Prometheus, Grafana, OpenTelemetry, and Dynatrace Automate cloud deployments (AWS-first) using Terraform and platform tooling Improve security posture across IAM, secrets, and networking Help the team ship faster and … TypeScript, Python). Validated experience operating distributed systems at scale in production. Cloud AWS (primary), Kubernetes (future), Docker (current), Terraform. Excellent debugging skills across network, systems, and data stack. Observability tooling, e.g. custom metrics pipelines, OpenTelemetry tracing, or integrations across telemetry stacks. Security engineering and practical understanding of IAM hardening, zero-trust network principles, and secrets management in data-heavy …
Cambridge, Cambridgeshire, East Anglia, United Kingdom Hybrid / WFH Options
La Fosse
infrastructure platform with AI-operable capabilities Oversee key infrastructure components such as data centre expansion, programmable compute, and software-defined network/storage Enable automation-first delivery models with observability, self-healing, and policy-driven control Implement and mature GitOps workflows, IaC pipelines, and CI/CD processes across engineering teams Lead programme governance, risk management, and stakeholder engagement Partner …
Braintree, Essex, United Kingdom Hybrid / WFH Options
Supercash
dispute representment solutions and fraud tooling. Provide ongoing support to the Risk Operations and Dispute Operations teams for risk rule tuning and monitoring. Enable Subscription Success: Design alerting/observability for internal retry logic and applicable vendors, focusing on key metrics like authorization and recovery rates. Act as First Responder (On-Call): Participate in the TAM on-call rotation, responding … while documenting post-mortems and recommendations. Build for Reliability & Continuity: Document fallback scenarios and potential vendor replacements. Track SLAs, vendor performance, and incidents to support business continuity planning. Champion Observability and AI Adoption: Deploy AI-based anomaly detection and observability tooling. Leverage AI tools like Cursor to interrogate code repositories and surface root causes faster. Collaborate Across Teams and Vendors … partners to escalate, triage, and resolve complex technical issues. Who we're looking for: Proficiency with SQL, Python, and/or other analytics tools to support data-driven troubleshooting, observability, and reporting. Hands-on experience working with or supporting payments orchestration across multiple processors (e.g., Braintree, Adyen). Familiarity with AI tooling for debugging or observability (e.g., Cursor) or experience …