alerts, and service flow mappings aligned to engineering needs. Help teams craft complex DQL queries to extract meaningful insights from telemetry data. Support observability design and migration efforts from Prometheus, Grafana, and CloudWatch to Dynatrace. Advise on RBAC models and data access strategies based on team structure and security requirements. Assist in monitoring strategy for Kubernetes-based workloads, especially in…
Cambridge, Cambridgeshire, East Anglia, United Kingdom
Xcede
alerts, and service flow mappings aligned to engineering needs. Help teams craft complex DQL queries to extract meaningful insights from telemetry data. Support observability design and migration efforts from Prometheus, Grafana, and CloudWatch to Dynatrace. Advise on RBAC models and data access strategies based on team structure and security requirements. Assist in monitoring strategy for Kubernetes-based workloads, especially in…
Deep knowledge of Kubernetes, containerized infrastructure, and cloud platforms (e.g. GCP). Database expertise: Production experience with OSS datastores (PostgreSQL, Redis, Kafka). Observability mastery: Hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry or similar). Programming proficiency: Strong hands-on software engineering skills (Python, Go, Rust). Operational mindset: "You build it, you run it, you own it" philosophy with the…
Manchester, North West, United Kingdom Hybrid / WFH Options
Hays
Strong understanding of networking, virtualisation, and cloud security principles. Operate, maintain, and enhance the Azure Virtual Desktop (AVD) environment. Experience with monitoring and logging tools (e.g., Azure Monitor, CloudWatch, Prometheus). Expert in setting up and managing host pools, session hosts, user access, application layers, and FSLogix profiles. Strong knowledge of cloud architecture, design, and implementation principles and practices. Proficiency…
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Hays Specialist Recruitment Limited
Strong understanding of networking, virtualisation, and cloud security principles. Operate, maintain, and enhance the Azure Virtual Desktop (AVD) environment. Experience with monitoring and logging tools (e.g., Azure Monitor, CloudWatch, Prometheus). Expert in setting up and managing host pools, session hosts, user access, application layers, and FSLogix profiles. Strong knowledge of cloud architecture, design, and implementation principles and practices. Proficiency…
stack: Languages: TypeScript, JavaScript. Libraries and frameworks: gRPC, Redux, React Native, React, Next.js. Datastores: Vitess, MySQL, CockroachDB, BigQuery, Redis. Infrastructure: Google Cloud Platform, Kubernetes, Docker, Pub/Sub, Terraform. Monitoring: Grafana, Prometheus, Sentry, Metabase. About you: You are a frontend developer with at least 5 years' experience. You are fast and love to deliver incredible code. You can reduce complex problems to…
stack: Languages: TypeScript, JavaScript. Libraries and frameworks: gRPC, Redux, React Native, React, Next.js. Datastores: Vitess, MySQL, CockroachDB, BigQuery, Redis. Infrastructure: Google Cloud Platform, Kubernetes, Docker, Pub/Sub, Terraform. Monitoring: Grafana, Prometheus, Sentry, Metabase. About you: You are a frontend developer with at least 2 years' experience. You are fast and love to deliver incredible code. You can reduce complex problems to…
engineering, ideally in distributed, real-time systems. Experience with containerisation and orchestration technologies, such as Kubernetes, in production environments. Familiarity with observability tooling and practices, such as VictoriaMetrics, Prometheus, Grafana, OpenTelemetry and SLOs. Well-developed debugging skills, with the ability to navigate unfamiliar systems, identify root causes and deliver effective solutions under time pressure. Proven track record of contributing…
of UNIX, Linux, networking (TCP/IP), and databases (both relational and NoSQL). Experience in Android and iOS application debugging. Experience with observability tools such as Grafana and Prometheus, and skills in documenting procedures for knowledge management. Strong interpersonal and communication skills to thrive in fast-paced, dynamic environments. NOTE: As part of the operation staff members of the…
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Yorkshire Building Society Group
in the following: Continuous Integration/Continuous Delivery pipelines (tools such as Jenkins and GitLab). Scripting and automation capabilities. Modern monitoring skills and best practices using tools such as Grafana, Prometheus, Kibana and Dynatrace. Testing frameworks. Knowledge of networks and routing. Knowledge of integrating services built on different technologies such as PL/SQL, .NET, C#, Java, Spring Boot and Spring Batch. Experience of integrating…
highly technical, ambiguous domains. Strong knowledge of REST APIs, distributed system design, and performance optimization. Experience with both SQL and NoSQL data stores, caching layers, and observability tooling (e.g., Prometheus, Datadog). Nice to have: Experience deploying or integrating LLMs or NLP models in production systems. Comfortable balancing short-term execution with long-term architectural thinking. Passion for building…
native applications. Working in a Continuous Delivery environment. Modern observability practices. Nice to have: Not vital, but you'll have the edge if you also have experience with Grafana, Prometheus, Kotlin (or at least the willingness to learn it), or batch processing data pipelines, or have worked in an eCommerce organisation or a shipping/logistics/exports organisation. What you bring…
availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring of applications. Observability & Monitoring: Develop comprehensive monitoring solutions using Prometheus, Grafana, ELK stack, or similar tools to improve system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks (SOC2, ISO 27001, etc.). Incident Response … clusters). Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience with high…
in the development lifecycle. Observability & Reliability (SRE): Lead the charge on improving our observability strategy. Design and implement a robust monitoring, logging, and alerting framework using tools like Grafana, Prometheus, and native AWS services. Enhance our incident response processes, contribute to on-call rotations, and foster a culture of blameless post-mortems. Security & Governance: Drive infrastructure security best practices across … ability to mentor and collaborate with other engineers. Technical Skills: Cloud: AWS (EKS, RDS, Lambda, etc.). IaC: Terraform (Expert). Containerisation: Kubernetes, Docker. CI/CD: GitHub Actions. Observability: Grafana, Prometheus, AWS CloudWatch, OpenTelemetry/distributed tracing. Scripting: Strong proficiency in at least one scripting language (e.g., Python, Go, Bash). Familiarity with JavaScript/TypeScript is a plus, as it…
React on the Frontend. Tech & Data Science stack: Kubernetes & Docker on Google Cloud; Python 3: pandas, RabbitMQ, Celery, Flask, SciPy, NumPy, Dash, Plotly, Matplotlib; JavaScript, React, Redux; PostgreSQL, Redis; Prometheus, Alertmanager, Datadog. If you joined the company in a Data Science role, you would be working on sophisticated pricing algorithms which would enable companies in the entertainment industry to…
learn new tech quickly. Experience mentoring junior engineers. Experience interacting with multiple stakeholders. Enjoyable to work with. TECHNOLOGY STACK: Python, PostgreSQL, FastAPI, Redis, TypeScript, React, Next.js, Tailwind, AWS, Kubernetes, Prometheus, Pinecone, GPT-4. EXAMPLE PROJECTS: Craft a plan to measure and improve our search engine. Improve and migrate our data model for the content we host. Migrate our NLP algorithms over…
Site Reliability Engineering function they're building from scratch. They talked about production infrastructure, optimisation, automation and focusing on the deployment process rather than the build. We discussed Kubernetes, Prometheus and API Gateways. Most importantly, they spoke like they knew what the hell they were on about. Not just about SRE, but on the whole Engineering process. This is a…
Expertise required for this engagement: Guide operational practices across services built using Java (Spring Boot), Kafka, MongoDB and related technologies. Oversee monitoring, observability, and performance tuning using Datadog, ELK, Prometheus, or similar tooling. Problem Management & Root Cause Elimination required: Lead proactive and reactive problem management efforts. Identify recurring production issues and collaborate with engineering to design permanent solutions. Track and…
are ever the same. Essential Skills: Solid Unix/Linux skills. Experience with Bash, SQL, PHP. Comfortable with Apache/Nginx, load balancers (HAProxy), and monitoring tools (Nagios, Grafana, Prometheus). Knowledge of log management (Graylog, Elasticsearch). Familiar with Ansible and GitLab CI/CD. Experience using Git/SVN. What Sets You Apart: Passionate self-starter who loves problem-solving…
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
DCS Recruitment
are ever the same. Essential Skills: Solid Unix/Linux skills. Experience with Bash, SQL, PHP. Comfortable with Apache/Nginx, load balancers (HAProxy), and monitoring tools (Nagios, Grafana, Prometheus). Knowledge of log management (Graylog, Elasticsearch). Familiar with Ansible and GitLab CI/CD. Experience using Git/SVN. What Sets You Apart: Passionate self-starter who loves problem-solving…
RabbitMQ. Azure DevOps: we are big fans of Azure Pipelines! Some of our services are migrating away from TeamCity and Octopus Deploy. Our observability stack is Splunk, Grafana and Prometheus. You: As a software engineer, you will be part of a cross-functional team working with Product Managers, Testers and DevOps engineers, writing well-tested and maintainable code, and getting involved…
to work effectively with internal teams and customer-facing stakeholders. Technologies we use: Golang; AWS, CDK (TypeScript), Lambda, SQS, EventBridge, RDS, DynamoDB, OpenSearch; GitHub, GitHub Actions; Loki, Tempo, Grafana, Prometheus; event-driven architecture and domain-driven design. How we reward our team: Dynamic working environment with a diverse and driven team. Huge opportunity for learning in a high-growth environment…
reporting. Develop and implement TOC strategy, staffing models, and documentation standards. Participate in systems architecture, new tech evaluation, and vendor selection. Manage operational workflows, reporting systems (e.g., Zabbix, Grafana, Prometheus), and support international broadcast teams. Collaborate with leadership on technical direction and TOC transformation. Skills/Must Have: 5-7+ years in a technical leadership role within a TOC…
real-time data, designing systems that can elastically scale to handle surges in throughput and demand. Hands-on experience with modern technologies such as Kubernetes, Kafka, RocksDB, MongoDB, MemSQL, Prometheus, Tempo, and Snowflake is highly desirable. Exposure to cloud-native tooling and practices, with an emphasis on DevOps, cloud computing, Kubernetes, and stream processing, is a strong advantage. Comfortable working…