Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet …
Southampton, Hampshire, South East, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment Limited
Out If You Have: Practical experience managing large-scale Kubernetes clusters; certifications in Kubernetes are a strong bonus. Hands-on familiarity with the Grafana Observability Suite, including tools like Loki, Mimir, and Tempo. Background in administering or developing with popular monitoring and automation tools such as Splunk, Datadog, PagerDuty, or Rundeck. Experience using configuration management platforms like Ansible, Puppet …
Ilkley office for occasional office attendance. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation. Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry. Proactively identifying and resolving performance bottlenecks and infrastructure issues. Automating infrastructure provisioning, configuration management, and deployments. Implementing effective logging, monitoring, and alerting strategies. Managing incident response and post … error budgets, and service-level objectives (SLOs). Experience designing and implementing robust observability, monitoring and logging solutions. Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki. Strong experience with distributed tracing and telemetry tools such as OpenTelemetry. An understanding of cloud networking architecture and load balancing techniques. Experience with container orchestration platforms like Kubernetes. Proficiency …
on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES: Ensuring system reliability, performance, and scalability through monitoring and automation. Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry. Proactively identifying and resolving performance bottlenecks and infrastructure issues. Automating infrastructure provisioning, configuration management, and deployments. Implementing effective logging, monitoring, and alerting strategies. Managing incident response and post … error budgets, and service-level objectives (SLOs). Experience designing and implementing robust observability, monitoring and logging solutions. Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki. Strong experience with distributed tracing and telemetry tools such as OpenTelemetry. An understanding of cloud networking architecture and load balancing techniques. Experience with container orchestration platforms like Kubernetes. Proficiency …
Operators). Cloud platforms (Azure, AWS). CI/CD pipelines (Azure DevOps, Bitbucket Pipelines, GitHub Actions). GitOps (e.g., ArgoCD, FluxCD). Monitoring, logging, and alerting (Prometheus, Grafana, Loki). Scripting and automation (Python, Bash, etc.). Exposure to the following would be beneficial: Multi-cloud or hybrid-cloud architectures. Security practices for cloud-native platforms. Cost optimisation …
code tools (e.g., Terraform, Helm, Bash, Python). Solid understanding of microservices, zero-trust security, mTLS, RBAC, and network policies. Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana). About us: Ascendion is a global, leading provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe. …
We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people powered platform. At Tripadvisor, we …
language, preferably Python. Practitioner of unit testing, performance testing and BDD/acceptance testing. Understanding of the OAuth 2.0 protocol for secure authorization. Proficiency with OpenTelemetry tools including Grafana, Loki, Prometheus, and Cortex. Demonstrated experience in DevOps, understanding of CI/CD (Jenkins) and GitOps. Ability to articulate technical concepts effectively to diverse audiences. Strong desire and ability to …
Lead Software Engineer (PostgreSQL) role at Tripadvisor. 2 weeks ago. About Tripadvisor: At Tripadvisor, we believe that we are better together and welcome you for who you are. …
Understanding of security protocols, authentication (OAuth, JWT), and data protection best practices. Solid grasp of scalable architecture, SOLID principles, and clean coding standards. Experience with observability tools (e.g. Grafana, Loki) and automated testing frameworks. Comfortable working in agile, cooperative teams with product-first thinking.
and Terraform. Building and managing secure, automated CI/CD pipelines (GitHub Actions, ArgoCD). Automating provisioning and scaling for Redis, Kafka, and PostgreSQL. Implementing observability and monitoring (Prometheus, Grafana, Loki, etc.). Managing identity and access control frameworks (Keycloak or similar). Championing infrastructure security best practices: RBAC, secrets management, hardening. Collaborating with backend and AI engineers to ensure performance and …
engage senior leadership and drive strategic outcomes. Strong architectural abilities towards building a holistic developer experience. Experience with Kubernetes, Istio, and Envoy. Experience with observability tools like Prometheus, Grafana, Loki, Sumo Logic, XSIAM, etc. Experience with AI in automating security processes. Bachelor’s in Computer Science, or equivalent work experience. Benefits: Roku is committed to offering a diverse range …
Are you passionate about creating well-designed solutions in a collaborative and nurturing environment? Do you thrive when your ideas are valued and you can contribute to a team's success …
minimising resolution times and turnaround of code-fixes. Job Duties: • Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. • Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification. • Previous experience and/or understanding of Windows & Linux OS. • Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. • Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language. • Previous experience and/or understanding of cloud-based infrastructure (ideally AWS …
minimising resolution times and turnaround of code-fixes. Job Duties: Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification. Previous experience and/or understanding of Windows & Linux OS. Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language. Previous experience and/or understanding of cloud-based infrastructure (ideally AWS …