Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Arm Limited
and attitude on automating common repetitive tasks A suitable sense of ownership and responsibility in driving tasks to timely full completion "Nice To Have" Skills and Experience: AIOps and Observability Meaningful experience in a distributed team Working in a sophisticated, multi-geography, engineering services environment! Providing technical support and mentoring to othe Accommodations at Arm At Arm, we want to More ❯
roles. Experience with technical troubleshooting and scripting languages such as Python, Go, or Bash. Experience with Kubernetes security, including workload isolation, RBAC, and network policies, containerisation, orchestration, and Kubernetes observability tools (e.g., Falco, Prometheus, Grafana). Experience with infrastructure-as-code and configuration management tools (e.g., Terraform, Helm, ArgoCD). United Kingdom Security Vetting Developed Vetting (DV) clearance. Preferred qualifications More ❯
platform services that are crucial for accelerating the productivity of all engineering teams. As an Engineering Manager, you will lead and expand two core components: the Feature Flags Service and the Engineering Internal Observability Platform (E360). You'll ensure that these systems serve as the foundation for seamless and efficient engineering workflows, allowing us to deliver top-tier AI-driven products. What You'll Do at More ❯
software development and architecture. Experience influencing technical decisions across the different stakeholder levels of the business including non-technical audiences. Ability to foster a culture around data-driven reliability, observability, monitoring, and automation. Due to the global nature of the team, a degree of flexible working will be required to accommodate different time zones. We are an equal opportunities employer. More ❯
automating our physical server inventory using Infrastructure as Code (IaC). You will work across all layers of infrastructure, including: Networking & Exchange Connectivity Linux Systems & Kubernetes Administration Microservice Orchestration & Observability Disaster Recovery & Security Optimization Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers who drive automation, reduce friction More ❯
QUALIFICATIONS - 4+ years of experience in a technical support or support engineering role. - Experience working with AWS (e.g. EC2, EBS, S3, Route53). - Experience working with SQL. - Experience working with observability platforms (e.g. Grafana, Kibana) for monitoring, troubleshooting, and diagnostics. - Experience working with monitoring and alerting systems (e.g. CloudWatch, Prometheus). Acknowledgement of country: In the spirit of reconciliation Amazon acknowledges More ❯
Experience using managed languages such as Python, Go, C#, Java, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and More ❯
and CI/CD workflows (GitLab CI). Write clean, production-grade code in Python (Scala is a bonus). Build infrastructure using Terraform, AWS CloudFormation, or SAM. Drive observability across the platform using Datadog or CloudWatch. Actively mentor Data Engineers and Associates, and lead technical discussions and design sessions. Key requirements: Must-Have: Strong experience with AWS services: Glue More ❯
product planning, roadmap discussions, and strategic prioritization. Operational Excellence Own key engineering KPIs including system uptime, velocity, tech debt reduction, and deployment frequency. Drive cloud infrastructure cost-efficiency, system observability, and DevSecOps maturity. Lead incident management and escalation processes with customer sensitivity and transparency. Qualifications: 10+ years in software engineering, including 5+ years in engineering leadership roles. Proven experience building More ❯
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
RVU Co UK
Experience of building and designing cost optimised Cloud platforms (preferably Azure) from the ground up, following well architected principles Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting Knowledge and experience with operating containers More ❯
Experience using managed languages such as Python, Go, C#, Java, or similar. Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases. Experience using observability tools such as APM, logging, and metrics to assist with debugging issues. Experience designing tooling to simplify the operational management of SaaS/PaaS systems. Familiarity with building flexible and More ❯
our global customers to innovate with confidence. Operating as part of the broader Infrastructure organization, the Cloud Security team partners closely with key engineering groups including Networking, Compute, and Observability to embed security deeply across Miro's cloud environment. The team also maintains strong alignment with our peers in the Security organization-such as Application Security and Detection & Response-ensuring More ❯
protocols required in a classified environment. Maintenance and ongoing development of automation and continuous build/integration/deployment infrastructure for multiple environments; write, build and deploy scripts. Enable observability and resilience throughout multi-cloud landing zones and Dev/Test/Prod environments. Support implementation of security policies, standards, guidelines, and governance. Requirements: Bachelor's Degree and 4 or More ❯
Technical Expertise: 5+ years of professional experience in backend development (Go Lang). Deep knowledge of Go Lang, with hands-on experience building scalable services. Experience working with an observability stack (logging, metrics, tracing). Expertise in building RESTful APIs following company standards. Understanding of Domain-Driven Design and Modularization concepts. Asynchronous processing with approaches like coroutines, message queuing More ❯
Malvern, Worcestershire, United Kingdom Hybrid / WFH Options
QinetiQ Limited
the evaluation of the performance of LLMs in different contexts. Accountabilities: Understands the technical aspects of the project and the wider customer business model. Solution architecture, including security, availability, observability, scalability, performance, reliability, and cost-efficiency. Ensures team members understand and adhere to project standards for quality, documentation, techniques and tools. Identifies, escalates & manages technical risk with Team Manager and More ❯
Technical Leadership & DevOps Culture Lead by example across delivery teams, offering hands-on technical support and ensuring engineering excellence. Promote a DevOps-first culture by championing continuous delivery, automation, observability, and operational readiness in everything we build. Help teams strike the right balance between shipping value quickly and building with long-term sustainability in mind. Work hand-in-hand with More ❯
a live service for users Experience with understanding network architectures and troubleshooting network-related issues using Linux tools In-depth expertise in at least one of: Kubernetes, Terraform, Networking, Observability Flexibility and mobility are required to deliver this role as there may be requirements to spend time onsite with our clients and partners to enable delivery of the first-class More ❯
Experience of building and designing cost optimised Cloud platforms (preferably Azure) from the ground up, following well architected principles Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting Knowledge and experience with operating containers More ❯
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Confused.com
Experience of building and designing cost optimised Cloud platforms (preferably Azure) from the ground up, following well architected principles Solid understanding of platform and reliability engineering approaches (SRE), including observability, performance optimisation, capturing analytics and security best practices Experience implementing Service Level Objectives and using them to drive error budgets, risk management and alerting Knowledge and experience with operating containers More ❯
services/message buses and other architectural elements Deploy these applications using features such as containers to cloud leveraging CI/CD to support this process backed with good observability when running these in production Ensure quality through the creation of documentation and use of unit/integration/contract testing with a consideration of security/performance requirements We More ❯
distributed service architectures, including how best to test and release them, and how to ensure system stability when making changes independent of other services. You are able to use Observability tooling to understand, diagnose, improve, debug, measure and visualise platform health. You are up-to-date with the latest technologies including AI for example Machine Learning for personalisation or automation More ❯