Code principles. Design an agile release engineering strategy that delivers value incrementally and continuously. Support a highly available live production system: respond to alerts, diagnose problems using logs and observability tooling, triage and resolve incidents. What we offer: We make sure our team is well looked after with generous salaries and a great benefits package which includes: Enhanced pension with
scalability and reduce manual intervention. Operational Security, SRE & Assurance: Ensure security platforms are resilient, continuously monitored, and designed for 24x7 support and incident response readiness. Embed security telemetry and observability to enable proactive threat detection and automated response. Apply SRE principles to improve reliability, performance, and maintainability of security services. Define service level objectives (SLOs) and key performance indicators (KPIs
orchestration and infrastructure-as-code. * Solid understanding of cloud networking and architecture (AWS, Azure, or GCP). * Experience with CI/CD systems and automated deployment workflows. * Familiarity with observability and performance monitoring tools. * Experience with data pipelines and workflow orchestration. * Excellent communication and documentation skills. * Alignment with SRE principles and a passion for automation and reliability. * Security-first approach … cloud infrastructure to support scalable and secure application deployments. * Develop and maintain CI/CD pipelines to streamline development and release processes. * Monitor and optimize system performance using modern observability tools. * Support and enhance data processing workflows using event-driven orchestration. * Troubleshoot production issues and implement solutions to ensure system stability. * Document infrastructure and promote best practices across teams. * Embed … Workflows, Prometheus, Grafana, Sentry, Python, Java, Next.js, Infrastructure as Code, Monitoring, Logging, Security, SRE, Remote DevOps, UK Tech Jobs, STEM, ISO 27001, SOC2, HIPAA, GDPR, Git, Cloud Security, Automation, Observability, Event-driven Architecture
an initial 6-month contract. You'll be primarily responsible for working in a team that designs, builds, and maintains the organisation's cloud infrastructure, with a focus on automation, observability and scalability. Essential skills/experience required: AWS Infrastructure as code using Terraform Cloudflare Developing CI/CD pipelines Incredibly beneficial: Snowflake MLOps Security best practices The role is confirmed
Milton Keynes, Buckinghamshire, South East, United Kingdom
Interact Consulting Limited
or strong interest in learning) cloud-native tooling: AWS (especially CloudWatch) Artifact Management (e.g., Artifactory, CodeArtifact) Infrastructure as Code with Terraform Monitor test metrics, troubleshoot failures, and improve system observability and debuggability.
communication skills, able to engage both technical and non-technical stakeholders Leadership experience within data teams Desirable DAMA certified (CDMP) Knowledge of Lakehouse and other database architectures Familiarity with observability principles and BI tools (e.g. Power BI) Experience working in Agile environments
a focus on security, data protection, and performance optimization. Experience managing transport and change governance, incident triage, and root cause analysis. Skilled in monitoring tools like SAP Cloud ALM, observability platforms, and incident management platforms such as Jira or Azure DevOps. Adept at documentation using Confluence and following agile methodologies like Scrum and Kanban. Exceptional stakeholder management and communication skills
of student lifecycle processes in Higher Education and relevant data domains. Knowledge of event-driven and message-based architectures (Event Hub, Kafka, or Service Bus). Experience with monitoring and observability tools like Azure Monitor, Application Insights, and Log Analytics. Awareness of data security, GDPR, and compliance in educational or public sector environments. Exposure to OpenAPI/Swagger, API lifecycle management
London, South East, England, United Kingdom Hybrid / WFH Options
Salt Search
production. Deploy, maintain, and optimise machine learning services within a cloud environment (AWS). Recommend and implement prompt management tools and provide expertise in prompt engineering. Introduce and manage observability, monitoring, and evaluation frameworks for ML and AI services. Enable auto-evaluation of prompts and models against domain-specific requirements. Build Python-based microservices, data pipelines, and serverless functions. Collaborate
achievable, ensuring delivery on time to an impeccable standard. Tech-wise, it is a microservices environment running on Kubernetes hosted in Azure. Distributed systems and cloud-native development. IaC, observability, and big bonus points if you have a grasp of an object-oriented programming language. Continuous improvement is key across technology so if there's a better tool, it will be
Leeds, West Yorkshire, United Kingdom Hybrid / WFH Options
Tria
within enterprise systems. Strong understanding of cloud platforms (Azure preferred). Knowledge of Infrastructure-as-Code (IaC), APIs, and automation tools. Familiarity with CI/CD pipelines, monitoring, and observability tools. Knowledge of ITSM, Agile, DevOps, and service-level objectives (SLOs) and indicators (SLIs). Excellent problem-solving skills and ability to work in complex, multi-supplier environments. Desirable: Bachelor
Python Experience with IaC principles and automation tools such as Ansible, Puppet and SaltStack General HPC technical knowledge regarding compute, network, memory, and storage components Experience with monitoring and observability tools such as Grafana Clearance: TS/SCI clearance with polygraph is required. Total Compensation Package We offer a comprehensive compensation package designed to support your well-being and professional
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
StepChange Debt Charity
and governance controls Automation & Orchestration (Essential): Building Infrastructure as Code (IaC) using Terraform. Designing CI/CD pipelines for repeatable, automated deployments Driving operational excellence with monitoring, logging, and observability tools such as CloudWatch and AWS Config. Monitoring (Desirable) - Grafana Strong troubleshooting skills and diagnostic abilities for BAU escalations An aptitude for Security and a keen eye for detail. Ideally
APIs Experience of writing performance critical code Experience of using Git or similar to track changes Experience of both the full .NET Framework and .NET Core Experience of using observability systems such as Elastic APM or DataDog to track and diagnose issues in production A solid understanding of security principles and secure coding including OWASP Top 10 Nice to haves
An Engineer's Product Leader: Your technical credibility is non-negotiable. You have a deep, hands-on command of the modern cloud-native landscape (Kubernetes, AWS, CI/CD, Observability) and a background in software engineering or architecture. You don't just talk the talk; you can hold your own in complex architectural debates, gain the respect of top-tier
Edinburgh, Midlothian, United Kingdom Hybrid / WFH Options
Aberdeen
Implement automated deployment and testing of integration components using Azure DevOps or GitHub Actions. Contribute to Infrastructure as Code (IaC) practices using Bicep or Terraform. Set up and maintain observability for integration components using Azure Monitor, Application Insights, and Log Analytics. Support incident response and root cause analysis for integration-related issues. Apply security best practices across integration solutions, including
Make informed, pragmatic technical decisions with the autonomy to influence architectural direction and the technical roadmap for the team. Champion and apply engineering best practices (clean code, testing, maintainability, observability) and support your team in doing the same through mentorship and collaboration. Collaborate with product managers and designers to translate technical groundwork into product feature delivery, transitioning to a product
teams to align on data architecture and ensure our ML systems meet overarching business objectives. Evolve our MLOps infrastructure, driving the strategy for model versioning, automated deployments, monitoring, and observability using modern tools like Prefect. Mentor and guide other members of the team, fostering a culture of technical excellence and continuous improvement through code reviews, design discussions, and knowledge sharing.
unwarranted access to corporate data. Review outstanding issues daily to ensure that troubleshooting and resolutions are current. Cross-functional collaboration with application engineering, QA, and infrastructure teams to ensure observability and reliability. Perform tool evaluation and selection in support of observability and automation. Qualifications Education Level: Bachelor's Degree Preferred experience includes AWS or Azure certifications. 7+ years of total … or closely related roles. At least 3 years of direct experience with AWS and/or Azure, including infrastructure provisioning, automation, and monitoring. Experience with implementing, managing, and using observability tools, data visualization, and application monitoring platforms such as Dynatrace, AWS CloudWatch, Azure Monitor, Grafana, Prometheus, or Datadog. Familiarity with error budgets and their role in balancing reliability and innovation.
Job Description: The role involves enhancing the company's monitoring capabilities
including Salesforce-specific pipelines. Build and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, DataDog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. … Infrastructure as Code with Terraform and Ansible, along with scripting in PowerShell, Python, or Bash Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, DataDog, or Dynatrace Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments Exposure to AI/