City, Cardiff, United Kingdom Hybrid / WFH Options
SRT Marine Systems PLC
of the approach must be proportionate to the complexity of the problem. Configure monitoring, logging, and alerting using CloudWatch, CloudTrail, and related services. Alternatively, use our existing monitoring and observability tooling. Work with stakeholders to understand non-prod use cases and tailor infrastructure accordingly. Ensure cost optimization and proper tagging of cloud resources. Implement basic IAM policies, roles, and access …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
SRT Marine Systems PLC
of the approach must be proportionate to the complexity of the problem. Configure monitoring, logging, and alerting using CloudWatch, CloudTrail, and related services. Alternatively, use our existing monitoring and observability tooling. Work with stakeholders to understand non-prod use cases and tailor infrastructure accordingly. Ensure cost optimization and proper tagging of cloud resources. Implement basic IAM policies, roles, and access …
to scale our modular insurance platform. Build and support empowered, high-performing engineering teams across multiple domains. Scale and evolve a cloud-native, distributed architecture with best-in-class observability and automation. Embed agentic and independent-agent paradigms into our platform and product capabilities. Champion engineering excellence through continuous delivery, clean code, and robust governance. Collaborate cross-functionally with Product …
environment and actively participating in regular agile ceremonies. Knowledge and experience with Google Kubernetes Engine (GKE) and related GCP services, or similar cloud container orchestration technologies. Desirable: Proficient in observability tools, covering logging, tracing, event monitoring, and health monitoring. Solid knowledge and experience with Google Kubernetes Engine (GKE) and related GCP services. Demonstrable experience in delivering applications within the financial …
of high-impact integration solutions across services and platforms. Collaborate on reusable API assets such as SDKs, templates, shared schemas, and common middleware. Implement robust error handling, logging, and observability across services and endpoints. Promote automation of API tests, documentation, contract validation, and pipeline integration. Collaboration & Engineering Maturity: Act as a subject matter expert for APIs across squads and tribes …
Remote. Up to £70,000 + annual share scheme + excellent benefits. What You'll Do: You'll take a lead role in driving operational excellence, ensuring the resilience, observability, and performance of web-based systems across a growing digital platform. Working within a collaborative, cross-functional environment, you'll design scalable infrastructure, automate operations, and embed SRE principles to … web applications and distributed systems, including Micro Frontends and BFFs. Hands-on expertise in React and TypeScript development with an eye for performance and resilience. Proven ability to implement observability practices using tools like Prometheus, Grafana, or Azure Monitor. Proficiency in containerisation and orchestration (Docker, Kubernetes - ideally AKS or GKE). Experience building and maintaining CI/CD pipelines for frontend …
a critical role in ensuring the reliability, performance, and scalability of our software systems and infrastructure. You'll leverage your engineering expertise to design and deliver resilient platforms, improve observability, automate operations, and guide squads in applying SRE principles effectively. Working closely with Principal Engineers, Squad Leads, and cross-functional teams, you'll embed a culture of continuous improvement, proactive … and scalability. Implementation & Delivery: Design and implement scalable, secure, and highly available systems on cloud platforms (Azure/GCP). Build and maintain monitoring and alerting solutions to ensure observability and proactive incident response. Contribute to CI/CD pipeline design, infrastructure as code, and deployment automation. Lead incident management activities, including post-incident reviews and improvement plans. Develop automation … and operating scalable frontend systems, including Micro Frontends (MFEs) and Backend-for-Frontend (BFF) architectures. Strong working knowledge of TypeScript and modern React development practices. Experience implementing and supporting observability and monitoring across distributed frontend/backend applications (e.g. using Azure Monitor, Prometheus, Grafana). Hands-on experience with containerisation (Docker) and orchestration using Kubernetes (AKS/GKE). Familiarity …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Cognibox
unstructured data. - Build a modern Data Lakehouse/Warehouse as a single source of truth. - Implement reporting and dashboarding solutions that empower internal teams and customers. - Ensure platform reliability, observability, and security. - Promote reusable, curated datasets and self-service analytics across teams. - Collaborate with BI, Engineering, and business stakeholders to enhance analytics delivery. - Leverage cloud platforms and modern pipeline tooling …
help drive platform engineering maturity by delivering cloud-native infrastructure, scalable CI/CD tooling, and shared services that empower cross-functional teams. Your focus will be on enhancing observability, automating operational processes, and ensuring that systems are well-documented and supportable. Working with tools such as GCP, Kubernetes, Helm, Terraform, and Azure DevOps, you'll create efficient, reliable environments … Terraform for infrastructure-as-code delivery. Experience building and maintaining CI/CD pipelines, preferably with Azure DevOps. Solid grasp of Git version control and GitOps principles. Familiarity with observability tooling such as Prometheus, Grafana, or GCP Operations Suite. Scripting ability with tools like Bash or Python. Understanding of shared service models, access control, and platform support processes. Desirable: experience …
insight, and proactive incident management. Key Responsibilities * Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. * Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. * Provide live support for monitoring technologies and assist with live service support, including key business events …