ELK stack to our new, more scalable platform that helps engineers easily track and resolve issues, driving faster recovery times and reducing friction. They include Prometheus, Mimir, Grafana, Honeycomb, Loki, and Quickwit. Collaborate closely with engineering teams to level up their observability, and make the most of the platform we offer with a long-term, sustainable focus. Help … to share knowledge and ensure the team is aligned. Experience with Observability Tools: Hands-on experience with monitoring, logging, and alerting systems such as Prometheus, Mimir, Grafana, ELK, Honeycomb, Loki, Quickwit, or similar. Cloud Infrastructure & Orchestration: Extensive experience working with AWS, Terraform, Kubernetes, or other cloud platforms and container orchestration tools. It would be cool if you have: Experience
engage senior leadership and drive strategic outcomes. Strong architectural abilities towards building a holistic developer experience. Experience with Kubernetes, Istio, and Envoy. Experience with observability tools like Prometheus, Grafana, Loki, Sumo Logic, XSIAM, etc. Experience with AI in automating security processes. Bachelor’s in Computer Science, or equivalent work experience. Benefits Roku is committed to offering a diverse range
Experience with agile methodologies and associated tools (e.g. Jira). Experience or knowledge of signal processing systems. Experience with cloud technologies: commercial cloud platforms (AWS), Docker, Kubernetes, large-scale logging (Loki), and metric tracking (Grafana). Knowledge of cloud-native software construction approaches. Software architecture background: building and deploying microservices in the cloud that provide authentication and are adaptable to scaling
monthly worldwide, requiring robust, scalable infrastructure solutions. You will be responsible for developing scalable solutions that handle billions of monthly requests, building and optimizing monitoring systems with Grafana and Loki, maximizing system uptime, and implementing redundancy across all architectural layers. Daily tasks involve configuring and maintaining servers, networks, and applications (Kubernetes, Linux, Docker, S3 & Trino), proactively resolving infrastructure bottlenecks … product owners. Requirements: 4+ years of Linux experience; experience building CI/CD with GitHub/Argo CD or similar tools; Kubernetes experience; experience with monitoring tools like Sentry, Grafana, Loki, Prometheus. Salary up to €6,500. Hybrid working, one day in the office in Heerenveen.
Herndon, Virginia, United States Hybrid / WFH Options
TalentRemedy
with Keycloak, Okta, or other OIDC/SAML-based SSO & Auth services. Experience building and updating secure Docker images to deploy custom applications. Familiarity with metrics & monitoring tooling (Grafana, Mimir, Loki, Tempo, or similar). Experience with Tableau and Snowflake. Base Salary Range: $170,000 - $200,000 annually plus 25% annual bonus
something you are used to doing. Hands-on experience in large enterprise environments (setup, coordination, deployment, operations & troubleshooting). You are experienced in monitoring, logging, and alerting tools (Grafana, Prometheus, Loki, Splunk). You are used to working with Azure DevOps. You are a team player with good communication skills who enjoys big responsibilities and a fast-paced environment. You are curious
London, England, United Kingdom Hybrid / WFH Options
Grafana Labs
experience with o11y. Best practices for both using and deploying an o11y stack (visualization, alerting, metrics, logs, traces) - preferably with Grafana products (Grafana, Prometheus/Cortex/Enterprise Metrics, Loki/Enterprise Logs, Jaeger/Tempo). A proven track record of successful delivery of customer projects, preferably enterprise o11y implementations for large customers. Self-starter, adept at picking
the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo). Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page. Equal Opportunity Employer: At Grafana
Are you passionate about creating well-designed solutions in a collaborative and nurturing environment? Do you thrive when your ideas are valued and you can contribute to a team's success
minimising resolution times and turnaround of code-fixes. Job Duties • Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. • Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification. • Previous experience and/or understanding of Windows & Linux OS. • Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. • Experience with database technologies such as MySQL, MongoDB, or Redis and the relevant query language. • Previous experience and/or understanding of cloud-based infrastructure (ideally AWS
London, England, United Kingdom Hybrid / WFH Options
Northern Data Group
player, especially in remote environments. Ability to understand complex/distributed environments, troubleshoot, and resolve issues systematically. Knowledge of Kubernetes (optional). Experience with modern MLA stacks (Prometheus, Grafana, Loki, Vector, Opsgenie). Knowledge of DPUs (a plus). Python programming skills (a plus). PostgreSQL optimization skills (a plus). WHAT WE OFFER Shape the future of HPC
and recovery with improvements to observability and alerting. We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the Mimir (Prometheus), Grafana, Loki, Tempo ecosystem. Our service mesh (Linkerd) provides uniform observability of all production services at 10s intervals. Performance and scalability are integral to our software and infrastructure development process, achieved
of a problem in a complex/distributed environment, debug and solve it in a structured manner. Knowledge of Kubernetes is optional. Knowledge of modern MLA stacks (Prometheus, Grafana, Loki, Vector, Opsgenie). Knowledge of DPUs is a plus. Python programming skills are a plus. Postgres optimization skills are a plus. WHAT WE OFFER With us, you will work
time to discovery and recovery through enhanced observability and alerting. We focus heavily on observability, continuously evolving our monitoring and alerting stack centered around the Mimir ecosystem (Prometheus, Grafana, Loki, Tempo). Our service mesh (Linkerd) provides uniform observability of all production services at 10-second intervals. Performance and scalability are fundamental to our development process, achieved by combining
London, England, United Kingdom Hybrid / WFH Options
Proton
Linux. Background in performance tuning and problem diagnosis at the OS, database, and application levels. Experience with software configuration management tools (e.g. Puppet) and observability solutions (e.g. Prometheus, Grafana, Loki, PMM, Icinga). Basic knowledge of networking, server hardware, storage, and operating systems relevant for databases. Willingness and ability to learn quickly. Readiness to share knowledge and collaborate with
CD processes and pipelines to help accelerate software delivery. Experience working with on-prem and disconnected environments. Hands-on experience with Kubernetes, Ansible, Vault, Jenkins, GitLab, and the Grafana/Loki stack. Experience working in Agile environments with Business Analysts, Scrum Masters, and sprint cycles, leveraging tools like Jira or Rally. Ability to communicate effectively with stakeholders at different levels (both
London, England, United Kingdom Hybrid / WFH Options
Grafana Labs
with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo). Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page. About Grafana Labs: There are … with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo). Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page. Equal Opportunity Employer: At Grafana