Birmingham, West Midlands, United Kingdom Hybrid/Remote Options
Inspire People
Edinburgh or Belfast. About the Role As a Senior Site Reliability Engineer, you will: - Build and scale DBT's product platform and services in AWS. - Provide development teams with observability, monitoring, CI/CD pipelines and service-level objectives. - Participate in an on-call rota (with allowance), helping to keep DBT services resilient and reliable. - Mentor junior engineers and contribute More ❯
tools such as Airflow ● Experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools (e.g., Docker, Terraform, CloudFormation). ● Familiarity with data quality, data governance, and observability tools (e.g., Great Expectations, Monte Carlo). ● Experience with BI and data visualization tools (e.g., Looker, Tableau, Power BI). ● Experience working with product analytics solutions (Amplitude, Mixpanel) ● Experience More ❯
Warwick, England, United Kingdom Hybrid/Remote Options
Ocho
in Git, SQL optimisation, and async architecture. Excellent communicator who values clarity, documentation, and collaboration. Nice to Have Experience with Supabase, Kubernetes, Docker, Azure, GitHub Actions, vector databases, or observability tools like Prometheus, Grafana, and Langfuse. What Success Looks Like 3 months: You’ve established your 1:1 rhythm, shipped your first automation workflow, and built a trusted partnership More ❯
with React, Vue, or Blazor Integrate LLMs and GenAI features into core product experiences Lead technical decision-making and mentor engineers within your squad Ensure best practices across testing, observability, and code quality What We’re Looking For Proven experience delivering AI/ML-powered production systems (not prototypes) Strong full-stack capability – C# .NET + modern JavaScript frameworks Solid More ❯
Worcester, England, United Kingdom Hybrid/Remote Options
Chapman Tate Associates
Infrastructure as Code (Terraform, Bicep, PowerShell). Solid grasp of Azure security and identity management in line with Zero Trust principles. Experience with CI/CD pipelines, monitoring, and observability tools such as Azure Monitor and Log Analytics. Excellent communication skills, stakeholder engagement, and a proactive approach to problem-solving. Strategic mindset with the ability to balance hands-on delivery More ❯
experience with LLMs/GenAI/ML in production Strong background in C#, .NET, REST APIs, and cloud platforms (Azure, AWS, or GCP) Agile mindset with focus on testing, observability, and secure delivery Excellent communication and cross-functional collaboration skills Nice to have Experience with vector databases, RAG systems, or multi-agent AI Python skills for AI/ML development More ❯
West Midlands (County), Birmingham, United Kingdom Hybrid/Remote Options
Sherborne Talent Solutions
automation, and optimisation of CI/CD pipelines to drive speed, reliability, and consistency. Manage and optimise Azure infrastructure for scalability, security, performance, and cost control. Champion modern monitoring, observability, and incident management practices to maintain high availability. Partner with engineering, architecture, and product leadership to accelerate delivery and reduce operational friction. Drive adoption of FinOps principles to balance technical More ❯
ensure efficient delivery of software updates. Senior DevOps Engineer (in addition to above) Contribute to the architecture and evolution of our cloud infrastructure strategy. Drive best practices for automation, observability, and security within DevOps. Mentor and coach junior team members, supporting their technical growth. Evaluate new technologies and tools to improve operational efficiency. Champion continuous improvement across our delivery pipelines … GitHub Actions, or Jenkins). Experience with Infrastructure as Code (Terraform, Bicep, or ARM templates). Proficiency in scripting languages (PowerShell, Bash, or Python). Experience with monitoring and observability tools (e.g., Application Insights, Grafana, Prometheus). Understanding of containerisation and orchestration (Docker, Kubernetes). Familiarity with security best practices in cloud environments. Desirable Experience within SaaS or FinTech environments. More ❯
for leading and executing the migration of data, dashboards, alerts, and configurations from Splunk systems to Elasticsearch. This role involves deep technical expertise in Splunk architecture, data ingestion, and observability tools, along with strong project management and stakeholder communication skills. Must have skills: -Splunk -ELK Stack -Kibana Nice to have skills: -stakeholder communication skills -strong project management Responsibilities: Minimum number More ❯
Birmingham, England, United Kingdom Hybrid/Remote Options
EML
deployment processes with a focus on minimizing security risks. Site Reliability Engineering (SRE): Ensure system reliability, scalability, and performance through proactive monitoring and secure incident response. Develop and implement observability tools to monitor system health, detect anomalies, and identify security threats. Perform root cause analysis and implement solutions to prevent recurring issues, including security vulnerabilities. Define and measure Service Level More ❯
Herefordshire, West Midlands, United Kingdom Hybrid/Remote Options
itecopeople
complex environments. Key Responsibilities Partner with Software Engineers to enhance system reliability, scalability, and performance. Collaborate with System Administrators to automate repetitive tasks and streamline alerts. Advance monitoring and observability practices to identify and resolve issues before they affect users. Support development and testing environments to help meet delivery and quality objectives. Research, evaluate, and recommend tools and technologies to … as code. Expertise with containerisation and orchestration (Docker, Kubernetes, OpenShift, or Swarm). Skilled in CI/CD pipeline tools (e.g. Jenkins, GitLab CI). Proficient with monitoring and observability tools (Grafana, Prometheus, InfluxDB). Experience integrating event-driven systems using MQ solutions (RabbitMQ or similar). Strong knowledge of SQL and relational databases. Advanced Linux administration and shell … Desirable Skills Programming experience in Java, Go, or Python. Understanding of cross-domain technologies and security models. Background in service management environments and ITIL practices. Proven application of observability patterns and system health metrics. Experience with Microsoft Azure cloud services. For more information, send your CV to Ryan at Services Advertised are those of an Employment Business More ❯
Birmingham, West Midlands, United Kingdom Hybrid/Remote Options
ByteHire
or communicating with robotic automation systems and integrating with physical devices Desktop app development with Electron CI/CD setup, rollback strategies, and deployment automation Sentry, NewRelic, or other observability tooling implementation More ❯
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid/Remote Options
Twinstream Limited
ensuring the availability, performance, and resilience of our secure, high-impact services. You'll work with development and support teams to evolve infrastructure, streamline delivery pipelines, and strengthen system observability — ensuring performance bottlenecks and reliability risks are resolved before they ever reach production. Expect a technically rich environment, diverse challenges, and the opportunity to make a measurable difference. Key Responsibilities … Reliability Engineer: Partner with Software Engineers to enhance reliability and performance across complex systems Collaborate with SysAdmins to automate toil and eliminate manual intervention Build smarter monitoring, logging, and observability pipelines to detect and resolve issues early Support and improve development environments to hit delivery and quality goals Research new tools, services, and architectures to drive scalability and resilience Expand … Ansible, Chef, etc.) Skilled with Docker and Kubernetes/OpenShift/Docker Swarm Hands-on experience building and maintaining CI/CD pipelines (e.g. Jenkins) Deep understanding of monitoring & observability tools (Grafana, Prometheus, InfluxDB) Solid grounding in Linux, network security, SQL, and AWS (EC2, S3, RDS, Lambda) Comfortable with MQ messaging (RabbitMQ or similar) Bonus points for: Experience with Azure More ❯
United Kingdom, Birmingham, West Midlands (County)
Uniting Ambition
with MLOps practices and AI development frameworks (e.g., Azure AI, LangChain, Hugging Face). Relevant certifications in Azure Architecture, Data, or AI disciplines. Knowledge of automation tools, monitoring, and observability platforms. If you have these skills and would like to find out more, please apply now. More ❯
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid/Remote Options
Hays
focused on ensuring service availability, performance, and cost-efficiency across both cloud and on-prem infrastructure. You'll work closely with development and support teams to evolve infrastructure, enhance observability, and proactively mitigate reliability risks. Key Responsibilities: Collaborate with software engineers to improve reliability and performance Automate operational tasks and reduce alert fatigue Enhance monitoring and observability to pre-empt … platforms, ideally AWS (EC2, RDS, S3, Lambda) Desirable: Coding experience in Java, Go, Python or similar Knowledge of cross-domain technologies Experience in service management environments Practical application of observability patterns Experience with Azure Additional Information: Due to the nature of the work, successful candidates will be required to undergo security vetting. We welcome applications from all backgrounds and are More ❯
Birmingham, West Midlands, United Kingdom Hybrid/Remote Options
Robert Walters
to improve performance Develop strategies to improve performance across group technology DevOps Lead: Experience Technical depth across but not limited to: Java, UNIX, Linux, Middleware, WebLogic, Cloud Platforms Observability tools Designing/Developing/Implementing technology advancements Experience of improving resilience of complex production environments The permanent opportunity for a DevOps Lead will pay a salary range of More ❯
next-generation AI products. You’ll join a small, experienced team developing an internal Kubernetes-based platform that enables AI innovation across the organisation, automating everything from deployments to observability, and helping developers build smarter applications with confidence. What you’ll be doing: Designing, deploying, and maintaining Azure Kubernetes (AKS) environments Managing Infrastructure as Code with Terraform and improving GitOps … workflows (ArgoCD/GitHub Actions) Building observability and monitoring stacks using Prometheus, Grafana, and Loki Supporting AI workloads (LLMs, RAG, and document processing applications) running on Kubernetes Automating platform operations with Python, Go, and shell scripting Implementing security guardrails, PII compliance tooling, and best practices for production AI systems What you’ll need: 3+ years’ experience in DevOps or Platform … Engineering Strong background in Azure and Kubernetes Hands-on experience with Terraform, CI/CD, and container orchestration Familiarity with observability tools (Prometheus, Grafana, Loki) Scripting or programming skills in Python or Go Interest in AI infrastructure, LLMOps, or large language model deployment More ❯
Hereford, Herefordshire, England, United Kingdom Hybrid/Remote Options
Hays Specialist Recruitment Limited
role focused on ensuring service availability, performance, and cost-efficiency across both cloud and on-prem infrastructure. You'll work closely with development and support teams to evolve infrastructure, enhance observability, and proactively mitigate reliability risks. Key Responsibilities: Collaborate with software engineers to improve reliability and performance Automate operational tasks and reduce alert fatigue Enhance monitoring and observability to pre-empt issues Support development environments … protocols Experience with cloud platforms, ideally AWS (EC2, RDS, S3, Lambda) Desirable: Coding experience in Java, Go, Python or similar Knowledge of cross-domain technologies Experience in service management environments Practical application of observability patterns Experience with Azure Additional Information: Due to the nature of the work, successful candidates will be required to undergo security vetting. We welcome applications from all backgrounds and are committed to creating More ❯
across the organization. What you’ll be doing: Building and maintaining a Kubernetes-hosted AI platform (AKS) Deploying and managing LLMOps tools such as LiteLLM, Langflow, and Langfuse Implementing observability with Prometheus, Grafana, and Loki Managing infrastructure through Terraform, ArgoCD, and GitHub Actions Supporting internal AI applications including RAG, document processing, and internal AI assistants What you’ll need … years in Platform or DevOps Engineering (Azure preferred) Strong experience with Kubernetes, Docker, and Terraform Programming or scripting skills in Python or Go Familiarity with GitOps, Helm, and observability tools A learning mindset and interest in LLM operations More ❯
A fast-growing technology business is developing advanced software for accounting, payroll, tax, and practice management. With a strong engineering foundation and a clear commercial vision, the company is now expanding its focus on artificial intelligence to transform how professional More ❯
act? This is a chance to design and deliver agentic AI systems on Azure that automate real business workflows through tool use, retrieval, and reasoning, with the reliability and observability of true production engineering. In this position you’ll take ownership of designing and scaling end-to-end agentic solutions on Azure, combining LLMs, APIs, and orchestration frameworks to deliver … Productionise on Azure using AI Foundry/OpenAI, Azure ML, Functions, Event Grid/Service Bus, and Kubernetes. Build LLMOps pipelines for evaluation, monitoring, safety, and cost control. Define observability standards across prompts, tools, and data flows. Establish governance patterns, safety, privacy, and auditability. Stay hands-on with critical code paths while guiding architecture and best practice. 🧠Required Skills/ More ❯
patterns where appropriate Ensure APIs are well-documented using OpenAPI/Swagger standards Build and maintain a developer portal for internal and external API consumers Quality & Operations Implement comprehensive observability including logging, monitoring, and alerting Design for reliability, fault tolerance, and graceful degradation Optimize API performance, scalability, and cost efficiency Write clean, maintainable code with thorough testing and documentation Configure … and modern security patterns Testing mindset - you write unit tests and understand integration testing API documentation experience using OpenAPI/Swagger and maintaining developer portals Production systems mindset covering observability, reliability, and operational excellence Architectural thinking - ability to design systems for scale, security, and evolution Keywords RESTful APIs C# .Net Azure AI LLM ML Machine Learning SaaS Scale Up OAuth More ❯
Telford, Shropshire, West Midlands, United Kingdom
Sanderson Government and Defence
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events … improvement initiatives and tooling exploitation to enhance operational efficiency within immature teams Required Skills and Experience Strong understanding and experience in SRE principles and methodologies Strong understanding of Observability within a complex tech stack Hands-on experience with monitoring tools such as Splunk, Splunk ITSI, Dynatrace, AppDynamics, and synthetic monitoring platforms. Strong understanding and experience with implementing and using More ❯