operations, directly influencing architectural decisions. What You’ll Bring: 8+ years in production engineering or SRE roles. Deep Java/Spring experience. Expertise in monitoring, alerting, and incident tooling (Prometheus, Grafana, OpenTelemetry, ELK, etc.). Experience with Azure, Kubernetes, and scalable systems in high-uptime environments (fintech/crypto preferred). If you're a seasoned engineer who loves clean …
roles. Strong experience with Java (Spring) and cloud platforms (ideally Azure). Proven track record in building and maintaining mission-critical systems. Deep understanding of Kubernetes, observability tooling (Grafana, Prometheus, ELK, etc.), and Infrastructure as Code (Terraform, Bicep). Ability to lead technical conversations across Engineering and Product. Bonus points if you bring: Experience in fintech, crypto, or regulated digital …
times a week. Experience with Agile and/or DevOps methodologies. Good understanding of Linux operating systems, particularly Ubuntu and Red Hat. Exposure to OSS monitoring systems (e.g., Nagios, Observium, Prometheus). Scripting and automation experience using tools such as NetBox, Ansible, Puppet, Bash, Python, Git. Benefits include 25 days of holiday, bonus, pension contribution, private medical, dental, and vision coverage …
based software applications, based on technologies such as Node.js, PostgreSQL or Elasticsearch. Knowledge of modern infrastructure and operational tooling within cloud-based architectures, such as Linux, Python, AWS, Ansible, Prometheus. Qualifications: Bachelor's or Master's degree with a First or 2:1, preferably in a technical subject. If you are excited about this role but your skill or experience …
this role is for you. Ideally you have several years' experience using Go in production. You'll be comfortable with Docker, and familiar with modern observability tools such as Prometheus, Alertmanager, Grafana and X-Ray/Tempo/Jaeger. We're looking for: 3+ years tackling hard backend problems. Seasoned database experience - we use MySQL, DynamoDB, Elasticsearch and Redis …
and participate in sensible, scalable systems design and share responsibility with them in diagnosing, resolving, and preventing production issues. What We Value: Experience with monitoring systems using tools like Prometheus and writing health checks. Interest in learning and managing technologies like Spark, Hadoop, Elasticsearch, and Cassandra. Familiarity with deploying GPUs. Moderate experience with TCP/IP networking. Ability to work …
skills in architecture design. Expertise in one of the programming languages and paradigms - our systems are written in TypeScript, Java, Golang, Rust, Python and others. Desirable: Experience with Kubernetes, Prometheus, Terraform, NoSQL or GCP. Perks of joining us: Company pension contributions at 5%. Individualised training budget for you to learn on the job and level yourself up. Discounted holidays …
AWS). Desirable skills: Automating network configurations and deployments using infrastructure-as-code (IaC) tools, e.g. Ansible, Terraform, or Python scripts. Monitoring and logging network performance using tools like Prometheus, Grafana, or the ELK stack. Experience with developing and maintaining air-gapped networks. Experience with Voice over IP (VoIP) technologies including SIP, RTP protocols, and implementation/management of Cisco Unified …
ISIS, BGP, BMP, ARP, SNMP, CDP/LLDP) and network engineering, management, and operations. Experience with search and analytics engines/big data tools (OpenSearch, Kafka, Kibana, Telegraf, InfluxDB, Prometheus). Our Preferred Qualifications for this role: Basic understanding of AI and ML algorithms, including model training, testing, and deployment. Hands-on project experience in network automation; experience with AWX …
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
support our ethos. To apply to this post, you will have: A base in Leeds with working experience of an incident response model and fluency with observability and monitoring (Prometheus, Grafana). Experience defining alerts and implementing dashboards from existing monitoring and logging data. Relentless focus on customer experience with good understanding of security best practice. Fluency in cloud infrastructure (AWS …
Out in Science, Technology, Engineering, and Mathematics
highly technical, ambiguous domains. Strong knowledge of REST APIs, distributed system design, and performance optimization. Experience with both SQL and NoSQL data stores, caching layers, and observability tooling (e.g., Prometheus, Datadog). Nice to have: Experience deploying or integrating LLMs or NLP models in production systems. Comfortable balancing short-term execution with long-term architectural thinking. Passion for building …
suit a software engineer who cares about clean, testable code and good software practices, but prefers working in the infra/tooling space. What you’ll be doing: Writing Prometheus exporters and integrations for infrastructure systems. Building out dashboards and monitoring pipelines in Grafana and Prometheus. Developing infrastructure-as-code tooling (Terraform, Ansible). Designing well-structured, testable software that improves … system visibility. What they’re looking for: Strong software engineering skills (Go or Python preferred). Experience working in or alongside platform engineering teams. Familiarity with modern observability tools (Grafana, Prometheus, etc.). Comfort working across both code and infrastructure – but this is not a pure ops/SRE role. If you've worked in finance that would be great but not …
code-fixes. Job Duties • Prioritise and provide advanced troubleshooting of incidents escalated via ServiceDesk across a range of technologies: Internal software, MySQL, Instana, Loki, RabbitMQ, Linux & Windows OS, Splunk, Prometheus, Grafana. • Develop clear and concise internal troubleshooting documentation to streamline incident resolution, ensuring each guide includes step-by-step instructions, common error scenarios, and solutions tailored to our systems and … Service or recent relevant qualification. • Previous experience and/or understanding of Windows & Linux OS. • Experience with one or more of the following monitoring tools: Instana, Splunk, Loki, Prometheus, Grafana. • Experience with database technologies such as MySQL, MongoDB or Redis and the relevant query language. • Previous experience and/or understanding of cloud-based infrastructure (ideally AWS). • Operated …
also able to offer 4-day working weeks, and part-time options can also be considered. Requirements: - Active DV Clearance - Kubernetes - Terraform - Strong knowledge of monitoring tools such as Prometheus or Grafana - Python or other scripting language. If you're a DevOps engineer looking for a contract offering £500 - £550 a day outside IR35, then send an updated CV to The …
System Engineer within financial services. Know how to write good code (Go, Python, Bash, etc.). Know how to use virtualization (Docker, KVM, etc.). Familiar with monitoring systems (Prometheus, Grafana, etc.). Know about networking hardware (switches, routers). If this opportunity is of interest, please reach out to Daniel O'Connell directly on LinkedIn or email at daniel.oconnell …
and monitoring tools. Triaging production issues. Performance tuning of JVM apps. Nice to have: Not vital, but you'll have the edge if you also have experience with: Kotlin, Prometheus Query Language (PromQL), Grafana, Prometheus; or have worked in: an eCommerce organisation, a shipping/logistics/exports organisation. What you bring: Agile: Test-Driven Development, collaboration and continuous delivery …
cloud-native tools and scripting (e.g., Terraform, Ansible, AWS RDS/Aurora tools, Azure SQL automation). Monitoring & Health Checks: Utilize tools such as CloudWatch, Azure Monitor, OEM, or Prometheus to monitor performance and availability. Troubleshooting & Root Cause Analysis: Diagnose and resolve database incidents; conduct RCAs for critical incidents and outages. Collaboration: Work closely with DevOps, Application, and Security teams …