RabbitMQ, Kafka). Strong grasp of telemetry, observability, and performance monitoring in distributed systems. Track record of technical leadership and setting engineering standards. Nice to Have: Experience with OpenTelemetry, Prometheus, Grafana, or similar observability tooling. Exposure to hybrid-cloud or cloud migration strategies. Familiarity with performance optimisation in low-latency data pipelines. Contributions to DevOps-related communities, blogs, open source
troubleshooting experience. Working knowledge of HPC container runtimes (e.g., Singularity, Apptainer). Exposure to provisioning and automation tools (e.g., Ansible, PXE, Terraform). Experience with monitoring tools such as Prometheus, Grafana, and DCGM. Understanding of GPU/accelerator toolchains like CUDA or ROCm. A proactive, customer-first mindset with strong communication skills. Ability to work effectively in both individual and
/Linux fundamentals. Curiosity and the confidence to ask questions in a fast-moving team. Nice-to-haves: Exposure to Kubernetes, Docker or Terraform. Experience with observability stacks (Grafana, Prometheus, OpenTelemetry). Familiarity with Postgres. Interest in data privacy, AdTech/MarTech or large-scale data processing. Familiarity with Kafka, gRPC or Apache Spark. As well as working as part
a technical setting (preferably SaaS). Customer support experience, ideally in the monitoring, observability, or data pipeline space. Experience with Kubernetes and Terraform; Prometheus experience is a significant plus. Technical understanding and experience with: coding/SDLC, Linux, cloud providers (AWS, GCP, Azure), networking, shell. Strong written and verbal communication skills. Strong technical, analytic and problem-solving
by several microservices, also written in Python, utilising frameworks and libraries such as Celery, Eventlet, SQLAlchemy, etc. Additionally, GOV.UK Notify utilises AWS RDS (Postgres), AWS SQS, AWS ElastiCache, OpenTelemetry, Prometheus, Grafana and other related services. Concourse CI and Terraform are used to run build pipelines and manage our infrastructure. For the frontend, we follow the GOV.UK Design System, making use of
end: Java, Python, Spring Boot. Database: MongoDB, PL/SQL, NoSQL. API Development: RESTful APIs. Version Control: Git. CI/CD: TeamCity. Docker and Containerization. Monitoring and Logging (e.g., Prometheus, Grafana, ELK Stack). Security and Compliance. Code Quality Tools (e.g., SonarQube). Agile Methodologies (Scrum or Kanban). Soft Skills: Team Collaboration: Ability to work effectively with cross-functional teams, sharing knowledge
Actions). Solid AWS experience and proficiency in at least one programming language (we use Go). Comfortable designing, operating and troubleshooting production platforms at scale. Strong command of observability tooling (Prometheus, Splunk or similar); eager to master Honeycomb. Developer empathy and outstanding communication skills; thrive on coaching and cross-team collaboration. Track record of data-driven decision making and continuous improvement. Familiarity
load (JMeter/Gatling/wrk2 etc.) and JVM profiling to identify and fix performance bottlenecks. Hands-on experience with instrumentation and analysis of production metrics using tools like Prometheus, Grafana, InfluxDB, or the ELK stack to identify performance bottlenecks and ensure system health. As an industry pioneer, our work is constantly evolving and challenging us in new ways that
CD environment. Demonstrate experience with the games industry, TeamCity, and Perforce Helix Core. Proficiency in Kotlin, Ansible, Bash, Python, HCL, PowerShell. Monitor CI/CD effectiveness using tools like Prometheus and Grafana. Apply problem-solving, troubleshooting, and critical thinking skills. Understand VMware, Windows, Linux, and macOS server platforms. Experience with Unity build processes and Apple developer tools. Knowledge of cloud
web applications. Familiarity with infrastructure-as-code tools such as Terraform. Understanding of security best practices in web infrastructure and application delivery. Exposure to observability tooling and techniques (e.g., Prometheus, Grafana, structured logging). Confident in debugging and resolving issues in complex distributed or web-based systems. A product mindset and collaborative approach to improving how teams build and run software
times a week. Experience with Agile and/or DevOps methodologies. Good understanding of Linux operating systems, particularly Ubuntu and Red Hat. Exposure to OSS monitoring systems (e.g., Nagios, Observium, Prometheus). Scripting and automation experience using tools such as NetBox, Ansible, Puppet, Bash, Python, Git. Benefits include 25 days of holiday, bonus, pension contribution, private medical, dental, and vision coverage
this role is for you. Ideally you have several years' experience using Go in production. You'll be comfortable with Docker, and familiar with modern observability tools such as Prometheus, Alertmanager, Grafana and X-Ray/Tempo/Jaeger. We're looking for: 3+ years tackling hard backend problems. Seasoned database experience: we use MySQL, DynamoDB, Elasticsearch and Redis
following a bonus: Java experience; Python experience; Ruby experience; big data technologies (Spark, Trino, Kafka); financial markets experience; SQL (Postgres, Oracle); cloud-native deployments (AWS, Docker, Kubernetes); observability (Splunk, Prometheus, Grafana). For more information about DRW's processing activities and our use of job applicants' data, please view our Privacy Notice at . California residents, please review the California Privacy
technical challenges at scale, then this role is for you. Minimum Requirements: Several years' experience using Go in production. Comfortable with Docker. Familiar with modern observability tools such as Prometheus, Alertmanager, Grafana and X-Ray/Tempo/Jaeger. Seasoned database experience: we use MySQL, DynamoDB, Elasticsearch and Redis. Experience with microservices and distributed systems. Used to developing complex
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Nexgencloud
documentation. Nice to Have: Programming & Scripting: Basic Bash scripting, Python, or Golang knowledge. Familiarity with TypeScript (Next.js, Tailwind frameworks). Tool Experience: Knowledge of monitoring tools and the ELK stack (Prometheus, Elasticsearch). Experience with the Nova hypervisor, Postman, Rundeck, or NetBox. Industry Knowledge: Exposure to virtualization technologies and their impact on hardware performance. What We Offer: A competitive salary and comprehensive
Nottingham, Nottinghamshire, United Kingdom Hybrid / WFH Options
Nexgencloud
Familiarity with TypeScript (Next.js, Tailwind frameworks). Entry-level experience with OpenStack and Kubernetes management is a nice to have. Tool Experience: Knowledge of monitoring tools and the ELK stack (Prometheus, Elasticsearch). Experience with the Nova hypervisor, Postman, Rundeck, or NetBox. Industry Knowledge: Exposure to virtualization technologies and their impact on hardware performance. What We Offer: A competitive salary and comprehensive
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
support our ethos. To apply for this post, you will have: a base in Leeds, with working experience of an incident response model and fluency with observability and monitoring (Prometheus, Grafana). Experience defining alerts and implementing dashboards from existing monitoring and logging data. Relentless focus on customer experience, with a good understanding of security best practice. Fluency in cloud infrastructure (AWS
ability to work effectively across teams and with external stakeholders. Technologies We Use: Golang; AWS, CDK (TypeScript), Lambda, SQS, EventBridge, RDS, DynamoDB, OpenSearch; GitHub, GitHub Actions; Loki, Tempo, Grafana, Prometheus; event-driven architecture and domain-driven design. How we reward our team: Dynamic working environment with a diverse and driven team. Huge opportunity for learning in a high-growth environment
Desktop. Strong proficiency in Bash, PowerShell and Ansible scripting; Python experience is desirable. Expertise in virtualisation platforms, container orchestration and related tooling. Familiarity with monitoring and observability stacks (Prometheus, Grafana, ELK/EFK, or equivalents). Ability to diagnose and resolve complex technical issues with a clear, methodical approach. Ability to manage multiple tasks and prioritise effectively. Is highly
other infrastructure-as-code tools (e.g., Terraform). Ensure secure, stable environments through proper VPC design, IAM governance, and secret management. Build and maintain system metrics and alerts using Prometheus, Grafana, and Loki. Enforce GitHub repo and branching standards across development teams. Ensure cost-effective infrastructure usage through continuous monitoring, resource optimization, and cost control strategies across AWS and containerized … is a bonus. Solid scripting skills (Python, Bash, or equivalent). Hands-on experience with Docker, Kubernetes, Helm, and deployment automation. Familiar with monitoring and logging stacks; experience with Prometheus/Grafana is expected. Security-conscious and experienced in IAM, encryption, and secure system design. Able to monitor and optimize computing resources to maintain performance within budget constraints. Comfortable using
in their Hertfordshire office. In this role, you'll take ownership of the end-to-end monitoring and alerting stack, designing and maintaining infrastructure and alert configurations (e.g., with Prometheus/Grafana or equivalent), and building dashboards that clearly communicate metrics to business stakeholders. You'll drive system automation and integration, crafting scripts and workflows, primarily in Python, to onboard
MSc in Networking. Experience with PTP/PPS platforms (Meinberg, FSMLabs, ADVA/Oscilloscope). Proficiency in using commercial NMS tools (e.g. Zabbix, SolarWinds, Nagios) and open-source tools (e.g. Prometheus, Alertmanager, Grafana). Basic programming skills in Python or Go. Knowledge of Infrastructure as Code (IaC) tools such as Ansible or Terraform. Familiarity with network analysis tools such as Wireshark, Splunk