101 to 125 of 157 Remote Grafana Jobs

Software Developer

Hiring Organisation: Mustard Systems Ltd
Location: Slough, Berkshire, UK
Employment Type: Full-time

work, and Go for select infrastructure Tools: RabbitMQ and Kafka for messaging, PostgreSQL and Redis for data storage Environment: Linux servers Observability: OpenTelemetry, Prometheus, Grafana and Zabbix Requirements Must-Haves: Strong background in software development, with strong experience with Python A degree in Computer Science or a numerical subject from ...

Site Reliability Engineer

Hiring Organisation: Searchability
Location: Wigan, Lancashire, England, United Kingdom
Employment Type: Full-time
Salary: £65,000 - £70,000 per annum

preferred) * Cloud experience, ideally AWS, and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform) * Experience with monitoring and observability tools such as Grafana, Prometheus or OpenTelemetry * Strong understanding of networking fundamentals and distributed systems * Ability to collaborate effectively with engineering, operations and product teams TO BE CONSIDERED: Please ...

Site Reliability Engineer

Hiring Organisation: Searchability (UK) Ltd
Location: Wigan, Greater Manchester, North West, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £70,000

preferred) * Cloud experience, ideally AWS, and knowledge of container orchestration (Kubernetes) and Infrastructure as Code (Terraform) * Experience with monitoring and observability tools such as Grafana, Prometheus or OpenTelemetry * Strong understanding of networking fundamentals and distributed systems * Ability to collaborate effectively with engineering, operations and product teams TO BE CONSIDERED: Please ...

Tech Lead

Hiring Organisation: Acorn Insurance
Location: Liverpool, Merseyside, North West, United Kingdom
Employment Type: Permanent, Work From Home
Salary: £85,000

Framework, MassTransit, Mediator Frontend: React, Next.js, TypeScript Infrastructure: Azure, Docker, Kubernetes (AKS), Nginx, RabbitMQ Architecture: Microservices, Event-driven patterns, Clean Architecture Observability and Monitoring: Grafana, Loki, Sentry, PostHog Tooling and Practices: Git, CI/CD pipelines, Agile methodologies What We're Looking For Proven experience leading software delivery within ...

Senior Software Engineer, Platform Observability Remote - Ireland

Hiring Organisation: Twilio
Location: Dublin, Ireland
Employment Type: Permanent
Salary: EUR 60,000 - 90,000 Annual

telemetry standards, efficient usage patterns, and scalable platform abstractions. Ability to make forward-looking technical decisions and lead others through ambiguity. Familiarity with ClickHouse, Grafana Loki, Athena, or equivalent systems for log and metrics querying. Contributions to open-source observability tools or communities. Experience building cost visibility or FinOps tooling ...

Full Stack Engineer

Hiring Organisation: Global Fintech Talent
Location: Zuid-Holland, Netherlands
Employment Type: Permanent
Salary: EUR Annual

GitHub Actions, GitLab CI, etc.). Managing cloud infrastructure (GCP/AWS) using Terraform. Working with Docker & Kubernetes (GKE), plus monitoring stacks like Datadog, Grafana, Prometheus. Implementing DevSecOps practices: IAM, secrets management, vulnerability scanning. Building and improving infrastructure in a setting where not every process exists yet - and where your … TypeScript, Python, or Bash. Kubernetes certifications (CKA/CKAD), Terraform Associate. Experience in fintech or other regulated environments. Knowledge of observability tooling (OpenTelemetry, Grafana, Prometheus). What's in it for you: Competitive salary based on experience (€50-80K). Participation in an equity/share certificate program. ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: London, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Nottingham, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Liverpool, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Southampton, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Glasgow, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Leicester, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Leeds, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Birmingham, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Bristol, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Woking, Surrey, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Shrewsbury, Shropshire, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Bedford, Bedfordshire, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Stevenage, Hertfordshire, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Plymouth, Devon, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Norwich, Norfolk, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Gloucester, Gloucestershire, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Wakefield, West Yorkshire, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Newport, Isle of Wight, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...

Site Reliability Engineer

Hiring Organisation: SS&C Technologies
Location: Wolverhampton, West Midlands, UK
Employment Type: Full-time

resolve incidents across services and infrastructure; reduce MTTR and prevent recurrences through high-quality post-incident actions. Observability as a first‐class practice: Use Grafana, Datadog, and Splunk (and related tools like Prometheus/OpenTelemetry) to detect anomalies, root cause issues, and create actionable alerts and dashboards. Run Kubernetes … chaos engineering). What you will bring 5+ years operating production systems as an SRE, DevOps engineer, or software engineer. Observability: Hands‐on with Grafana, Datadog, and Splunk for incident investigation, dashboarding, alerting, tracing/logs/metrics correlation, and performance analysis. Kubernetes: Strong experience running and troubleshooting workloads (controllers ...