1 to 25 of 28 Permanent Observability Jobs in Wales

Java Developer

Hiring Organisation: scrumconnect ltd
Location: Swansea, West Glamorgan, United Kingdom
Employment Type: Permanent
Salary: GBP 35,000 - 45,000 Annual

projects). Experience implementing cloud-native architectures and event-driven systems. Familiarity with Infrastructure as Code tools such as Terraform or CloudFormation. Experience with observability and monitoring tools such as ELK, Grafana, Prometheus, or Splunk. Relevant Java, AWS, Azure, GCP, Kubernetes, or architecture certifications. Experience with Domain-Driven Design ...

Remote Software Engineer - Full-Stack

Hiring Organisation: 17918
Location: Bangor, Caernarfonshire, United Kingdom

/MySQL (relational); MongoDB, Redis (NoSQL); Azure Fabric, Data Factory, Azure Event Hubs, Kusto QL. Tooling & Monitoring: Datadog, CircleCI, Prometheus, Grafana. Strong focus on observability and fault tolerance. Why Join? Mission-driven: make a meaningful impact on digital accessibility. Equity potential if converted to permanent. High agency and ownership ...

Documentum Developer

Hiring Organisation: Experis
Location: Newport, Isle of Wight, UK
Employment Type: Full-time

Strong Java development skills. Experience working with REST APIs and system integrations. Understanding of microservices architecture (preferred). DevOps, Databases & Observability: Experience with Git and CI/CD tools. Knowledge of relational databases (Oracle, PostgreSQL). Familiarity with container technologies such as Docker ...

Senior Next.js Developer

Hiring Organisation: scrumconnect ltd
Location: Swansea, West Glamorgan, United Kingdom
Employment Type: Permanent
Salary: GBP 55,000 - 65,000 Annual

Legacy systems (e.g., older Java, AIX, Oracle, or Mainframe environments). Cloud Infrastructure: Familiarity with AWS serverless patterns, infrastructure-as-code principles, and proactive observability monitoring. Diversity & Inclusion: At Scrumconnect Consulting, we believe that diversity drives innovation and better outcomes. We are committed to fostering an inclusive environment where every ...

Remote Senior Software Engineer - Grafana Cloud Observability Provider UK Remote

Hiring Organisation: 17918
Location: Newport, Monmouthshire, United Kingdom

Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. Grafana Cloud, our fully managed observability platform, is flexible and built for scale. With Grafana Cloud's actually useful AI, organizations can see, understand … remote opportunity and we are looking for candidates in the UK, Spain, Germany or Sweden. What is Grafana Cloud? Grafana Cloud is our composable observability platform that integrates metrics, logs, and traces with Grafana. It allows our customers to leverage the best open source observability software including Prometheus, Mimir, Loki ...

Remote Principal Software Engineer Backend Technologies, Platform (UK Remote)

Hiring Organisation: 17918
Location: Swansea, Glamorgan, United Kingdom

deadlines and business goals. Preferred Skills: Experience with frontend technologies such as React, Angular, or Web Components is a plus. Familiarity with monitoring and observability tools (e.g., CloudWatch, New Relic, Datadog). Knowledge of data modeling and working with NoSQL databases. Understanding of agile methodologies, including Scrum ...

CIAM Software Engineer

Hiring Organisation: Sanderson Recruitment
Location: Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £60,000

automation, and deployment processes. Develop tooling and automation to improve delivery consistency and reduce operational overhead. Create monitoring and alerting solutions to improve platform observability and incident response. Collaborate with cross-functional teams to deliver secure and reliable customer-facing solutions. What We're Looking For Proven experience delivering solutions ...

Remote Senior Manager, Engineering, Docker Agents (London)

Hiring Organisation: Docker
Location: Newport, UK
Employment Type: Full-time

related to quality, velocity, and reliability. Remove blockers and improve developer productivity through better tooling and processes. Drive improvements in CI/CD, testing, observability, and system resilience. Culture & Values Model strong leadership behaviors rooted in empathy, clarity, and ownership. Promote inclusion, diversity, and fairness in hiring and team development. ...

Remote Senior Manager, Engineering, Docker Agents (London)

Hiring Organisation: 17918
Location: Bangor, Caernarfonshire, United Kingdom

Lead Engineer (Full Stack)

Hiring Organisation: scrumconnect ltd
Location: Swansea, West Glamorgan, United Kingdom
Employment Type: Permanent
Salary: GBP 70,000 - 75,000 Annual

with code: contributing to critical paths, conducting code reviews, and pairing with engineers. Own the technical roadmap and non-functional requirements (performance, scalability, observability, security) for assigned services. Mentor and coach senior and mid-level engineers, raising the engineering bar through example and feedback. Lead CI/CD, infrastructure ...

Remote Ruby Engineer

Hiring Organisation: 17918
Location: Bangor, Caernarfonshire, United Kingdom

squad to deliver work in small, valuable increments using agile practices and modern tooling. Monitoring, debugging and improving the performance of our systems with observability tools. You'll love this role if you are a Ruby developer (2+ years of experience) with good knowledge of Ruby on Rails ...

AVP, Technology Platforms & Operations

Hiring Organisation: Jobleads-UK
Location: Parrog, Wales, United Kingdom

Drive consistent reporting on uptime, incidents, DR posture, and operational risk. DevOps, Infrastructure & Scale: Lead DevOps, infrastructure, and platform engineering capabilities to improve automation, observability, and scalability. Partner with Product Engineering to ensure platforms enable, rather than constrain, delivery velocity. Optimize cost-to-run while maintaining reliability and growth readiness. Enterprise ...

Remote Senior Platform Engineer

Hiring Organisation: 17918
Location: Wrexham, Denbighshire, United Kingdom

evolve the fully automated CI and CD pipelines. This includes establishing best practices for fast, reliable, and secure build, test, and deployment processes. Observability: Implement and manage robust systems for monitoring (metrics), logging (centralised log aggregation), and distributed tracing to provide deep insights into application and infrastructure health. What … have: Proven experience developing robust, maintainable, and well-tested automation scripts, services and pipelines to manage infrastructure, deployments, and operational tasks. Operational Tooling and Observability Management: Must have: Experience owning, managing, and maintaining mission-critical operational tooling. Desirable: Proven background in implementing and managing centralised logging solutions or similar platforms ...

Lead Security Data Scientist

Hiring Organisation: DWP Digital
Location: Pontypridd, Mid Glamorgan, Wales, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £75,000

strategy and drive delivery within a large, complex organisation. Knowledge of modern AI approaches, including areas such as Retrieval Augmented Generation (RAG), prompt engineering, observability and evaluation. Experience in cyber security, fraud analytics, behavioural analytics or threat detection would be particularly valuable. This role sits at the intersection of data ...

Remote DevOps Engineer

Hiring Organisation: Medica Group
Location: Swansea, UK
Employment Type: Full-time

ensure consistent, automated cloud environments. Working across Medica's cloud‐native platforms—from container orchestration and networking to identity and monitoring—you'll develop observability capabilities and support reliable deployment workflows. You'll troubleshoot infrastructure and pipeline issues, embed security and compliance controls, and contribute to incident response and continual … scripting languages (PowerShell, Bash, Python). Strong experience with containers and orchestration (Docker, Kubernetes/AKS, Serverless platforms, ACA) and image security practices. Observability tooling (e.g., Azure Monitor, Application Insights, Prometheus/Grafana), log aggregation, alert design. Proven track record implementing DevSecOps controls (SAST/DAST, dependency scanning, secrets management ...

Senior Site Reliability Engineer - Python

Hiring Organisation: Inspire People
Location: Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type: Permanent, Part Time, Work From Home
Salary: £80,000

operating and improving cloud-based platforms and services used across DBT and wider government. Working across Python development, cloud infrastructure, CI/CD pipelines, observability and automation, you'll help improve reliability, developer experience and service performance while supporting critical business services used by thousands of users. As a Senior … code approaches. Support teams to adopt Site Reliability Engineering practices, including Service Level Indicators (SLIs), Service Level Objectives (SLOs) and error budgets. Contribute to observability across services, helping teams better understand performance, reliability and user impact. Develop and improve CI/CD pipelines to enable safe, frequent and low-risk ...

Senior Site Reliability Engineer - Python

Hiring Organisation: Inspire People
Location: Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type: Full-Time
Salary: £63,824 - £80,158 per annum, Pro-rata, Inc benefits

Remote Principal Cloud Infrastructure Engineer

Hiring Organisation: 17918
Location: Cardiff, Glamorgan, United Kingdom

load balancing, and network security controls. Build, maintain, and document infrastructure templates and developer enablement tooling to allow teams to deploy independently. Implement observability and monitoring systems using Grafana, Prometheus, and Loki for infrastructure and application metrics. Establish and contribute to CI/CD best practices using GitHub Actions … Docker) and maintaining image registries and artifact repositories. Solid understanding of networking and security fundamentals (VPNs, firewall rules, IAM policies, encryption). Familiarity with observability and alerting stacks (Prometheus, Grafana, Loki). Excellent communicator, able to write and present technical designs and proposals clearly. Experience mentoring others and leading ...

Remote ML Infrastructure Lead

Hiring Organisation: Iproov
Location: Wrexham, Wales, UK
Employment Type: Full-time

versioning, reproducibility, experimentation, feature management and release management. Own and improve the production environment for machine learning systems, ensuring strong standards for availability, performance, observability and resilience. Define and implement monitoring across model and platform layers, including system health, data quality, drift, latency, throughput and cost efficiency. Build or optimise … pipelines, infrastructure-as-code and workflow orchestration. Experience with tools such as Airflow or similar platform and orchestration technologies. Good understanding of model observability, data quality, feature pipelines, lineage and reproducibility. Experience designing scalable infrastructure for ML workloads, including training, batch inference and real-time serving. Strong appreciation of reliability ...

Remote Cloud Engineer (AWS) Full time - Remote EU

Hiring Organisation: Retinai Medical
Location: Wrexham, Wales, UK
Employment Type: Full-time

services. Manage and prioritize tasks in the cloud infrastructure backlog to address immediate needs and plan long-term improvements. Set up infrastructure monitoring and observability solutions, proactively addressing availability, performance or security issues. Learn the current infrastructure and take ownership of day to day cloud operational tasks and activities. Assess … with software version control and Git. Strong understanding of cloud networking concepts, including VPC, VPC Peering, Subnets, and Load Balancing. Familiarity with monitoring and observability tools for cloud environments, such as Grafana, Prometheus, OpenSearch, and the ELK stack. Strong analytical and problem-solving skills, with a proactive approach to challenges. ...

Lead Site Reliability Engineer (SRE Squad Lead)

Hiring Organisation: Inspire People
Location: Cardiff, UK
Employment Type: Full-time

supportive and diverse engineering community. Design, build and maintain reliable, secure and scalable cloud-based infrastructure using infrastructure-as-code approaches. Enable teams to develop effective observability practices, including monitoring, logging, metrics and alerting that support proactive service management. Work with teams to define and embed Service Level Indicators (SLIs), Service Level Objectives … DevOps professionals, helping shape platform strategy, improve service reliability and support the delivery of critical digital services across government. The team is actively investing in observability, service-level management, platform automation, developer experience and cloud engineering. You'll join a culture that values collaboration, continuous learning and the freedom to explore ...

Lead Site Reliability Engineer (SRE Squad Lead)

Hiring Organisation: Inspire People
Location: Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type: Permanent, Part Time, Work From Home
Salary: £80,000

diverse engineering community. Design, build and maintain reliable, secure and scalable cloud-based infrastructure using infrastructure-as-code approaches. Enable teams to develop effective observability practices, including monitoring, logging, metrics and alerting that support proactive service management. Work with teams to define and embed Service Level Indicators (SLIs), Service Level … professionals, helping shape platform strategy, improve service reliability and support the delivery of critical digital services across government. The team is actively investing in observability, service-level management, platform automation, developer experience and cloud engineering. You'll join a culture that values collaboration, continuous learning and the freedom to explore ...

Remote Platform & Cloud Engineer

Hiring Organisation: Ims
Location: Newport, UK
Employment Type: Full-time

/CD pipelines. The key mindset here is thinking about other engineers as your customers, and building tooling that reduces friction and cognitive load. Observability & Operational Maturity: Comfortable with CloudWatch, and ideally a third-party observability tool. They should understand SLOs and error budgets beyond the theory, and know ...

Remote Machine Learning Engineer

Hiring Organisation: 17918
Location: Cardiff, Glamorgan, United Kingdom

Who we are: Sardine is the leading agentic risk platform for fighting financial crime. Our integrated solution unifies data across risk teams to help organizations stop fraud in real time, prevent AI-driven attacks, and ...

Remote Senior GenAI Platform Engineer

Hiring Organisation: 17918
Location: Newport, Monmouthshire, United Kingdom

engineering. You will help design, build, and operate the shared AI infrastructure used by product teams across Pleo, with a strong focus on reliability, observability, security, and developer experience. Who you'll be working with and reporting to: You'll be reporting to the Engineering Manager for the GenAI Platform … GenAI platform components used by product teams at Pleo, including LLM routing gateway, vector search and RAG infrastructure, tool registry and MCP gateway, AI observability and evaluation tooling (tracing LLM calls, supporting human and automated evaluation, detecting drift, and tracking costs) and infrastructure for multi-step, long-running agentic workflows. ...