4 of 4 Permanent Incident Response Jobs in Cardiff

Senior Platform Engineer

Hiring Organisation
Inspire People
Location
Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£80,000
will receive an additional allowance. Specific projects the team are working on include rolling out an observability tool to enhance system monitoring and incident response, streamlining deployment processes to reduce downtime and speed up feature delivery, and developing a CLI tool to automate tasks and boost developer productivity. ...

3rd Line Service Desk Engineer

Hiring Organisation
Focus Resourcing Group
Location
Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type
Permanent, Work From Home
Salary
£35,000
baselines. Automation: Develop PowerShell scripts to streamline operations and boost efficiency. Backup & Disaster Recovery: Ensure smooth backup operations, conduct regular recovery tests, and manage incident response. Upgrades & Improvements: Proactively monitor and enhance infrastructure, participate in technical projects, and engage in monthly client meetings. Service Desk: Handle tickets, diagnose issues ...

Cyber Security Resilience Manager

Hiring Organisation
Yolk Recruitment Limited
Location
Cardiff, South Glamorgan, Wales, United Kingdom
Employment Type
Permanent, Work From Home
NIST or ISO27001. Working with senior stakeholders to manage cyber risk and resilience planning. Overseeing security architecture and identity strategy across enterprise environments. Supporting incident response, risk management and regulatory engagement. Helping embed a strong security culture across the organisation. What we're looking for: We're keen ...

System Monitoring & Observability Engineer (Prometheus / Grafana)

Hiring Organisation
SRT Marine Systems PLC
Location
Cardiff, South Glamorgan, United Kingdom
Employment Type
Permanent
Salary
£40,000 - £65,000 per annum
maintain Prometheus-based monitoring solutions. Develop and manage metric exporters for application and system-level data. Optimise Prometheus scraping configurations and retention policies. Alerting & Incident Response: Define and maintain alert rules based on SLIs/SLOs and performance baselines. Ensure alerts are actionable, with minimal false positives. Participate … necessarily lead) in on-call rotations and incident postmortems. Observability Dashboards: Design and maintain Grafana dashboards for real-time operational insights. Collaborate with engineering and product teams to create tailored visualisations. Provide self-service dashboard capabilities for end users. System Performance & Reliability: Monitor infrastructure (servers, containers, databases, services ...