Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Hays
focused on ensuring service availability, performance, and cost-efficiency across both cloud and on-prem infrastructure. You'll work closely with development and support teams to evolve infrastructure, enhance observability, and proactively mitigate reliability risks. Key Responsibilities: Collaborate with software engineers to improve reliability and performance Automate operational tasks and reduce alert fatigue Enhance monitoring and observability to pre-empt … platforms, ideally AWS (EC2, RDS, S3, Lambda) Desirable: Coding experience in Java, Go, Python or similar Knowledge of cross-domain technologies Experience in service management environments Practical application of observability patterns Experience with Azure Additional Information: Due to the nature of the work, successful candidates will be required to undergo security vetting. We welcome applications from all backgrounds and are More ❯
and non-technical partners to deliver resilient infrastructure, champion data governance, and mentor others in engineering excellence. In this role, you will: Shape the data platform roadmap: Introduce modern observability, quality, and governance frameworks that elevate how teams access and trust data. Build and scale infrastructure: Develop services, APIs, and data pipelines using modern cloud tooling and automation-first principles. … scalable data platforms in production, enabling advanced users such as ML and analytics engineers. Hands-on experience with modern data stack tools - Airflow, DBT, Databricks, and data catalogue/observability solutions like Monte Carlo, Atlan, or DataHub. Solid understanding of cloud environments (AWS or GCP), including IAM, S3, ECS, RDS, or equivalent services. Experience implementing Infrastructure as Code (Terraform) and … e.g., Jenkins, GitHub Actions). A mindset focused on continuous improvement, learning, and staying at the forefront of emerging technologies. Nice to Have Experience rolling out data governance and observability frameworks, including lineage tracking, SLAs, and data quality monitoring. Familiarity with modern data lake table formats such as Delta Lake, Iceberg, or Hudi. Background in stream processing (Kafka, Flink, or More ❯
specialism in vulnerability management Self-starter, able to work in technical detail and motivate a diverse group of stakeholders to build sponsorship for significant and impactful change Desired: Establishing observability platforms Capabilities adjacent to exposure/vulnerability management (i.e. cyber security asset management, attack surface management, etc.) Pragmatic application of zero-trust philosophies Cloud-based security (GCP, AWS and More ❯
looking for an experienced Data Engineer to support an initial 6-month contract engagement. You will own their data platform end to end, from ingestion & modelling to orchestration, observability & governance. You'll be responsible for designing & building robust, reliable pipelines, evolving their lakehouse/warehouse layers & enabling fast, trustworthy analytics for multiple teams. Tech you'll be working with More ❯
AI applications e.g. MCP protocol. Training traditional ML and DL models using tools like Axolotl, LoRA, or QLoRA. Experience with multi-agent orchestration frameworks (LangGraph, AutoGen, CrewAI) Experience with observability and evaluation tools for LLMs such as TruLens or Helicone. Experience with AI safety and reliability frameworks like Guardrails AI. More ❯
experiences. Proven experience as a Business Analyst in an Agile environment Strong knowledge of market data and market data supervision Financial Services experience is mandatory Strong understanding of monitoring, observability, and telemetry (metrics, logs, traces) Ability to translate technical concepts into actionable business requirements Hands-on experience with tools such as Datadog, BigPanda, Grafana would be desirable Excellent stakeholder management More ❯
You will be responsible for developing a high-level blueprint model that aligns with the client's objectives. Additionally, as a subject matter expert, you will engage in a Reliability & Observability Maturity Assessment across selected client products or services, delivering insights into best practices, industry standards, observations, recommendations, and the associated benefits. This role requires facilitation of client-facing workshops and More ❯
Knutsford, Cheshire, England, United Kingdom Hybrid / WFH Options
Tenth Revolution Group
and driven Security Engineer to join our small, focused team building a telemetry pipeline MVP. You'll play a key role in designing and securing our containerized environments, ensuring observability tools and infrastructure are built with security at their core. This role blends deep technical expertise with a hands-on, collaborative approach, ideal for someone who thrives in fast-moving … documentation and response playbooks What You Bring Hands-on experience with Kubernetes, OpenShift, and secure production systems Strong GitLab and CI/CD security expertise Familiarity with telemetry and observability stacks Solid grasp of networking, firewalls, and core security principles Knowledge of container security tools (Aqua, Twistlock, Trivy) Understanding of frameworks like NIST or ISO 27001 Excellent analytical and communication More ❯
best practice * Own, scope and deliver clearly defined work items, reporting progress in agile ceremonies * Develop, support and maintain production-grade cloud services, ensuring performance, security and stability * Utilise observability tooling to monitor live environments and remediate issues * Implement automated testing and contribute to CI/CD pipeline excellence * Participate actively in code reviews and contribute to coding standards and … technical stakeholders Desirable Skills * Integration design and delivery experience * Experience with Infrastructure as Code - AWS CDK preferred (Terraform beneficial) * Experience supporting and monitoring production systems (e.g. Splunk, Datadog, AWS observability tooling More ❯
Observability and Telemetry Specialist Whitehall Resources require an Observability and Telemetry Specialist to work with a key client on a 6-month initial contract. *This role will involve on-site work in Manchester 3 days per week. *Inside IR35. Observability and Telemetry Specialist We are seeking a skilled Observability and Telemetry Specialist to enhance visibility across our IT infrastructure and … applications. The ideal candidate will have a strong background in financial services and deep expertise in monitoring, diagnostics, and performance optimization. Key Responsibilities: Design and implement observability solutions across web applications, servers, and network infrastructure. Monitor and support Apache HTTP Server, Linux/UNIX systems, and web servers. Collaborate with IT operations, support, and security teams to … Server Administration; IT Operations, Support & Security; Network Access Control & Security; System Administration & Software Development; Experience in Financial Services environments. Preferred Qualifications: Proven experience in observability platforms and telemetry tools; Strong understanding of compliance and regulatory requirements in finance; Excellent problem-solving and communication skills. All of our opportunities require that applicants are eligible More ❯
Service Reliability Engineer to join our team. This role is pivotal in helping our clients visualise and implement improvements to their IT landscapes, focusing on system reliability and observability. Click apply for full job details More ❯
modern warehouse and experience with dbt. Building audiences at scale, including identity resolution and edge-case handling. Rigorous testing/QA mindset and production hygiene (version control, code review, observability). Understanding of privacy & consent in the UK/EU (GDPR, PECR) and operationalising suppression rules. Disclaimer: This vacancy is being advertised by either Advanced Resource Managers Limited, Advanced Resource More ❯
interfaces and deliver product features. Working with DevOps/Platform teams on CI/CD, containerisation and deployment (Docker, Kubernetes or managed alternatives). Troubleshooting production issues and improving observability (logging, metrics, tracing). Contributing to technical design discussions and driving improvements to reliability and performance. Tech Stack & Skills Core skills: Strong Python development experience (5+ years preferred) with production … Nice to have: Experience with async frameworks (FastAPI, Celery, or asyncio-based work). Exposure to event-driven architectures, message queues (Kafka, RabbitMQ) or pub/sub. Knowledge of observability tooling (Prometheus, Grafana, Sentry, ELK). Understanding of security best practices for web services (OWASP, authentication/authorization patterns). Experience working in product-led teams and mentoring junior engineers. More ❯
manage and support a customer's AWS and Data platform Be technically hands-on Provide incident and problem management on the AWS IaaS and PaaS Platform Monitoring and observability of system and platform performance Collaboration with development and build teams on application and platform deployments and changes Involvement in the resolution of incidents and problems in an efficient and … timely manner Actively monitor an AWS platform and components for technical issues Implement and improve on existing monitoring and observability solutions Be involved in the resolution of technical incident tickets Assist in the root cause analysis of incidents Assist with improving efficiency and processes within the team Examine traces and logs Escalate incidents and problems to the appropriate teams More ❯
of data science concepts, AI/ML models, automation workflows, and agentic orchestration to enhance business processes. (LEAD) Experience designing and deploying Agentic AI solutions leveraging orchestration, pipelines, and observability on Microsoft Azure AI Foundry. (LEAD) Experience applying Generative AI, NLP, and prompting techniques. Strong understanding of AI governance, observability, and compliance frameworks. Excellent communication and presentation skills. More ❯
Telford, Shropshire, West Midlands, United Kingdom
Sanderson Government and Defence
insight, and proactive incident management. Key Responsibilities Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business events … improvement initiatives and tooling exploitation to enhance operational efficiency within immature teams Required Skills and Experience Strong understanding and experience in SRE principles and methodologies Strong understanding of observability within a complex tech stack Hands-on experience with monitoring tools such as Splunk, Splunk ITSI, Dynatrace, AppDynamics, and synthetic monitoring platforms. Strong understanding and experience with implementing and using More ❯
proactive incident management. Key Responsibilities: Translate high-level monitoring non-functional requirements (NFRs) into actionable configurations across tools such as Splunk, Dynatrace, and AppDynamics. Deliver full-stack observability solutions, including application-aware network performance monitoring (NPM), synthetics, log analytics, and infrastructure metrics. Provide live support for monitoring technologies and assist with live service support, including key business … and tooling exploitation to enhance operational efficiency within immature teams Required Skills and Experience: Strong understanding and experience in SRE principles and methodologies. Strong understanding of observability within a complex tech stack. Hands-on experience with monitoring tools such as Splunk, Splunk ITSI, Dynatrace, AppDynamics, and synthetic monitoring platforms. Strong understanding and experience with implementing More ❯
Per Day Inside IR35 To apply, email: THE OPPORTUNITY We are looking for a Dynatrace Subject Matter Expert (Data Resilience). You'll play a critical role in improving observability, resilience, and performance monitoring across hybrid cloud environments using the Dynatrace platform. THE ROLE Collaborate with Application Stewards & Site Reliability Engineers (SREs) to confirm monitoring requirements for critical assets. Analyse … optimise, and automate monitoring within the Dynatrace toolset. Provide AI-driven insights (via Davis AI) for anomaly detection, root cause analysis, and proactive recommendations. Support operational resilience by embedding observability standards and best practices. Engage in workshops with third-party suppliers to review and improve observability standards. TECH STACK/REQUIREMENTS Strong expertise in Dynatrace (SaaS & On-Premises) including: Application … Anomaly Detection Profiles Alerting Rules & Profiles Synthetic & Log Monitoring Real User Monitoring (RUM) DQL & Grail for advanced data analytics API integrations with complex systems Experience working in resilience/observability engineering. Strong communication skills and ability to work in high-pressure environments. Flexible, pragmatic, and delivery-focused with a can-do attitude. Hands on AI Experience TO BE CONSIDERED... Please More ❯
and quality monitoring, ensuring full compliance with data protection standards. Create resilient data workflows and automation within Airflow, Databricks, and other modern big data ecosystems. Implement and manage data observability and cataloguing tools (e.g., Monte Carlo, Atlan, DataHub) to enhance visibility and reliability. Partner with ML engineers, analysts, and analytics engineers to understand their data needs and enable advanced data … grade data platforms and backend systems. Familiarity with data governance frameworks, privacy compliance, and automated data quality checks. Hands-on experience with big data tools (Airflow, Databricks) and data observability platforms. Collaborative mindset and experience working with cross-functional teams including ML and analytics specialists. Curiosity and enthusiasm for continuous learning - you stay up to date with the latest tools More ❯