security best practices and the ability to implement security controls at the infrastructure level Experience with monitoring and logging tools like DataDog or Grafana's observability stack (Prometheus, Tempo, Loki, Grafana) Familiarity with the open standard OpenTelemetry Excellent written and verbal communication skills, we're a collaborative team! PLEASE NOTE: Our engineering teams work fully remotely across Europe but More ❯
Seattle, Washington, United States Hybrid / WFH Options
Georgia IT Inc
with Chef and Ansible required Strong leadership, initiative taking, and capacity for decision making Expert knowledge in any or all of these is a huge plus: Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing (and yes, we use ALL of these!) Participate in the on-call rotation for Operations support Bachelor's degree in CS or a More ❯
logging, and metrics that seamlessly track requests across the entire lifecycle, from API Gateway through the runtime engine and sandboxed environments to external APIs, visualized in tools like Grafana, Loki, or Prometheus. Building intuitive self-service tools for internal developers, such as CLI tools, GitHub Actions, and Backstage plugins, enabling them to quickly provision new micro-services or AI More ❯
and environments Understanding of Kubernetes-based development tools such as Terraform, Helm, Python, Go, and Bash Experience working with observability and monitoring software like Grafana, Prometheus, Alertmanager, and Loki Experience working with Poolside environments Constant vigilance when investigating and finding the root cause of distributed system malfunctions Ability to anticipate or adopt new innovations and advancements in the CNCF More ❯
Evanston, Illinois, United States Hybrid / WFH Options
Northwestern University
Job ID 52172 Location Evanston, Illinois Department: NAISE - NU ANL Inst Sci Eng Salary/Grade: ITS/82 Job Summary: This will be an SRE role with a focus More ❯
London, UK We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people powered platform. At Tripadvisor, we want you to bring your unique perspective More ❯
We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people powered platform. At Tripadvisor, we want you to bring your unique perspective and experiences More ❯
on-premises and cloud-native) Strong Linux (preferably RHEL) systems administration skills Proven experience with HPC workloads or scientific computing clusters Hands-on experience with observability tools: Prometheus, Grafana, Loki Infrastructure as Code (IaC) using Terraform, Ansible CI/CD experience with GitOps tools (e.g., ArgoCD, Flux) Prior experience leading engineering teams in distributed environments More ❯
Scottsdale, Arizona, United States Hybrid / WFH Options
Saxon Global
Change Management and Problem Management. 1-2 years of experience in Infrastructure Support, Configuration and Release Management. 2-3 years of hands-on experience with tools including Splunk, Grafana, Loki, AppDynamics or other APM solutions 2+ years of experience with application support built on-prem and native cloud environments Able to code - Java, SQL, PromQL, Shell and Python. Root More ❯
Experience with agile methodologies and associated tools (e.g. Jira) Experience or knowledge of signal processing systems Experience with Cloud technologies: commercial cloud platforms (AWS), Docker, Kubernetes, large-scale logging (Loki) and metric tracking (Grafana) Knowledge of cloud-native software construction approaches Software architecture background: building and deploying microservices in the cloud that provide authentication and are adaptable to scaling More ❯
monthly worldwide, requiring robust, scalable infrastructure solutions. You will be responsible for developing scalable solutions that handle billions of monthly requests, building and optimizing monitoring systems with Grafana and Loki, maximizing system uptime, and implementing redundancy across all architectural layers. Daily tasks involve configuring and maintaining servers, networks and applications (Kubernetes, Linux, Docker, S3 & Trino), proactively resolving infrastructure bottlenecks … product owners. Requirements: 4+ years Linux experience Experience building CI/CD with GitHub/Argo CD or similar tools Kubernetes experience Experience with monitoring tools like Sentry, Grafana, Loki, Prometheus Salary up to €6,500 Hybrid working, one day in the office in Heerenveen. More ❯
Herndon, Virginia, United States Hybrid / WFH Options
TalentRemedy
with Keycloak, Okta or other OIDC/SAML-based SSO & Auth services Experience building and updating secure Docker images to deploy custom applications Familiarity with metrics & monitoring tooling (Grafana, Mimir, Loki, Tempo, or similar) Experience with Tableau and Snowflake Base Salary Range: $170,000 - $200,000 annually plus 25% annual bonus More ❯
something you are used to doing Hands-on experience in large enterprise environments (setup, coordination, deployment, operations & troubleshooting) You are experienced in monitoring, logging, and alerting tools (Grafana, Prometheus, Loki, Splunk) You are used to working with Azure DevOps You are a team player with good communication skills who enjoys big responsibilities and a fast-paced environment You are curious More ❯
the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo). Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page. Equal Opportunity Employer: At Grafana More ❯
minimising resolution times and turnaround of code-fixes. Job Duties • Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. • Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios …/Service or recent relevant qualification. • Previous experience and/or understanding of Windows & Linux OS. • Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. • Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language. • Previous experience and/or understanding of cloud-based infrastructure (ideally AWS More ❯
CD processes and pipelines to help accelerate software delivery. Experience working with on-prem and disconnected environments. Hands on experience with Kubernetes, Ansible, Vault, Jenkins, GitLab and Grafana/Loki stack. Experience working in Agile environments with Business Analysts, Scrum masters, and sprint cycles leveraging tools like Jira or Rally. Ability to effectively communicate with different level stakeholders (both More ❯