running on Java 21. We're in the process of moving our backend services to Spring Boot. We've invested heavily in our Datadog integration to bring world-class observability and monitoring to our systems. We've recently moved to GitLab and are currently building out our next generation of automated deployment pipelines. We've incorporated some of the best
Experience with major public cloud platforms, including Google Cloud Platform (GCP), AWS, and Azure Strong understanding of networking technologies, such as LAN, WAN, firewalls, and related infrastructure Proficient with observability and monitoring tools, e.g. Grafana, SolarWinds, Prometheus, AWS CloudWatch, Splunk Familiarity with DevOps practices, including CI/CD pipelines, is beneficial If you would be interested in having a further
using GCP-native tools and technologies. * Develop capabilities which allow Platform Engineering teams to operate with a DevOps ethos. * Collaborate with development teams to optimize application performance, reliability, and observability on GCP. * Implement and enforce Service Level Objectives (SLOs) and Error Budgets to ensure a balance between reliability and feature development. * Develop and maintain a comprehensive monitoring and alerting platform
product teams to define and deliver integration solutions - Troubleshoot and resolve issues such as data inconsistencies, auth errors, and performance bottlenecks - Monitor integration performance and implement logging, alerting, and observability - Document architecture, workflows, and integration processes - Contribute to continuous improvement of integration tools and practices What You Bring: - Proven experience building backend services using Node.js, Python, Java, or similar - Strong
secure, scalable cloud data solutions, aligning with business and compliance needs. Key Responsibilities Design, build, and maintain cloud-native data pipelines (Azure/GCP) Implement scalable data management frameworks: observability, validation, lineage Translate business needs into effective technical prototypes and solutions Collaborate with stakeholders, data teams, and service partners Ensure data security, governance, and regulatory compliance Monitor and optimise cloud
London, South East, England, United Kingdom Hybrid / WFH Options
INTEC SELECT LIMITED
with infrastructure teams to ensure system reliability and operational efficiency Integrate monitoring and logging solutions (e.g., Prometheus, Grafana, ELK) Define strategies for disaster recovery, scaling, and infrastructure resilience Improve observability by enhancing visibility into performance and error metrics Skills and Experience Required 10+ years of backend development experience, including 5+ years in an architectural or engineering leadership role Proven experience
HIPAA). Background in customer-centric or product-driven industries such as digital, eCommerce, or SaaS. Experience with infrastructure-as-code tools like Terraform and expertise in data observability and monitoring practices.
involved across a wide range of technology infrastructure, including: computational hardware and storage, operating systems and virtualization technology, real-time data streaming platforms, CI/CD infrastructure, load balancers, observability tools, container orchestration platforms, and layer 1 networking technologies among others. Required Qualifications: Bachelor's degree in Computer Science, Electrical Engineering, Network Engineering or a similar discipline Up to
identify workflow optimisation potential, design and develop automation tools, using AI-driven tools and custom model integrations and scripts. Write and maintain tests for pipeline reliability. Build and maintain observability tooling in collaboration with other engineers to track data pipeline health and system performance. Collaborate with data scientists, operators, and product teams to deliver data solutions. Debug and resolve complex
Basingstoke, Hampshire, United Kingdom Hybrid / WFH Options
Once For All Limited
shared monorepos and deploying containerised FastAPI microservices on Kubernetes (AKS preferred). - Strong general grounding in software engineering best practices: CI/CD, automated testing, code reviews, observability and performance profiling. - Experience using multimodal AI (document layout, vision-language models, building agents). - Experience building quick prototypes (e.g. Streamlit, Dash) to validate ideas rapidly. - MLOps tooling (e.g.
Familiarity with deploying and scaling ML models in the cloud, particularly with AWS and SageMaker Understanding of DevOps processes and tools: CI/CD, Docker, Terraform, and monitoring/observability Bonus: experience with vector databases, semantic search, or event-driven systems like Kafka Additional Information We're a community here that cares as much about your life outside work as how
Guildford, Surrey, United Kingdom Hybrid / WFH Options
Electronic Arts
Source control management tools (e.g. Perforce, Git) Configuration management tools (e.g. Chef, Ansible, Terraform, Packer) Secrets management tools (e.g. Vault) Virtualization environments and tools (e.g. VMs, vSphere) Data and Observability tools (e.g. Splunk, Grafana, New Relic, OpenTelemetry) Growth-oriented mindset About Electronic Arts We're proud to have an extensive portfolio of games and experiences, locations around the world
West London, London, England, United Kingdom Hybrid / WFH Options
Young's Employment Services Ltd
Fabric, leveraging expertise in Azure Data Factory, Databricks, and other Azure services. Advocate for engineering best practices and ensure long-term sustainability of systems. Integrate principles of data quality, observability, and governance throughout all processes. Participate in recruiting, mentoring, and developing a high-performing data organization. Demonstrate pragmatic leadership by aligning multiple product workstreams to achieve a unified, robust, and
platform services that are crucial for accelerating the productivity of all engineering teams. As an Engineering Manager, you will lead and expand two core components: the Feature Flags Service and the Engineering Internal Observability Platform (E360). You'll ensure that these systems serve as the foundation for seamless and efficient engineering workflows, allowing us to deliver top-tier AI-driven products. What You'll Do at
aligned to Public Cloud Lab/s and will work with the relevant Product Owners and Engineering Leads, using data, to balance product improvements covering aspects such as reliability, observability and performance, with new feature development Key Responsibilities You will help improve the SRE framework and principles to strengthen focus, behaviours, and culture You will support the POs and ELs
Cambridge, Cambridgeshire, United Kingdom Hybrid / WFH Options
Gearset Limited
changes quickly and safely. We live and breathe this approach ourselves: we release new versions of Gearset multiple times a day and we continually invest in improving our own observability and infrastructure tools. This means we can identify and react to issues quickly and delight our users by getting improvements to them as fast as possible. As a product-driven
clusters across a range of cloud platforms (AWS, Azure) and on-premise infrastructure Experience with infrastructure-as-code tools, preferably Crossplane, Terraform, and Helm Understanding of networking, security, and observability in containerised environments Proven track record of working directly with external customers or clients Excellent communication skills with the ability to explain complex technical concepts to diverse audiences, including managing
software development and architecture. Experience influencing technical decisions across the different stakeholder levels of the business including non-technical audiences. Ability to foster a culture around data-driven reliability, observability, monitoring, and automation. Due to the global nature of the team, a degree of flexible working will be required to accommodate different time zones. We are an equal opportunities employer.