and optimize CI/CD pipelines using Azure DevOps, GitHub Actions, or Jenkins. Automate everything with Terraform, Bicep, and scripting (PowerShell, Bash, Python). Drive observability with tools like Datadog, LogicMonitor, CloudWatch, and Grafana. Champion cloud security, IAM, RBAC, and compliance best practices. Collaborate across teams, mentor peers, and contribute to a culture of continuous improvement. What You Bring: Proven More ❯
Leeds, West Yorkshire, England, United Kingdom Hybrid / WFH Options
Anson McCade Ltd - IT and Finance Recruitment
pipelines (e.g., GitHub Actions, CircleCI). Bonus Skills That Impress Delivery into cloud platforms (AWS, Azure, GCP). Familiarity with relational and NoSQL databases. Experience with observability tools (e.g., DataDog, Prometheus). Test automation know-how. Exposure to open-source tools and community practices. Ready to build what matters? Apply now to shape the future of digital engineering in an More ❯
Leeds, England, United Kingdom Hybrid / WFH Options
Anson McCade
frameworks Desirable Experience Delivery of secure software in government, defence, or other regulated sectors Hands-on cloud-native development and deployment Knowledge of logging and monitoring tools such as DataDog, Prometheus, or StackDriver Experience working with product lifecycle tooling and engineering in complex domains If you’re looking to focus on real engineering work that drives meaningful outcomes and want More ❯
Leeds, West Yorkshire, England, United Kingdom Hybrid / WFH Options
Anson McCade Ltd - IT and Finance Recruitment
Nice to Have (But Not Essential) Cloud experience: AWS, Azure or GCP Solid grasp of databases and data modelling Familiarity with open-source tools and monitoring platforms (e.g., Prometheus, DataDog) Experience with test automation frameworks and performance tools More ❯
test: Containerisation (e.g. Docker), Virtualisation and Provisioning, Workload and job scheduling (e.g. Kubernetes, Ray) on high core-count machines and rack-scale installations, Management and Observability (e.g. Prometheus, OpenTelemetry, DataDog, Splunk, etc.). 10+ years of relevant experience related to quality assurance/testing teams. Experience with the Atlassian suite and CI/CD platforms such as Jenkins; GitHub or More ❯
team members • Proficient with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar) • Proficient with operating services in AWS • Experience with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic, or similar) • Experience scripting operating system tasks in Bash, Python, etc. • Proficient in operating 24x7 high-availability, distributed software applications • Desire to dive deep into, and find opportunities More ❯
Portsmouth, Hampshire, United Kingdom Hybrid / WFH Options
Checkatrade
Senior Platform Engineer Experience in Cloud Native technologies? Come join us! Are you looking for a new role? We have an exciting opportunity at Checkatrade for a Senior Platform Engineer to join our mission of making home improvements easy by More ❯
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
Job description RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our client based in the UK with a fully remote work policy. About Client: The client is building the B2B marketplace for diamonds. It's More ❯
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Strive Gaming
in on-call rotations and help troubleshoot production issues. Tech Requirements (must have): IaC - Infrastructure as Code (Terraform) AWS Argo Strong Linux skills ELK/LGTM stack knowledge Prometheus DataDog Grafana Kubernetes Helm Docker Bash/shell scripting Git Strong security mindset Tech (nice to have): CrowdStrike On-prem/ESXi Windows Server Entra ID More ❯
Google Cloud. Familiarity with database systems, data modelling, and SQL/NoSQL technologies. Comfortable working with a range of open-source tools and frameworks. Experience with observability tools like DataDog, Prometheus, or StackDriver. Knowledge of test automation frameworks and practices. Why This Role Stands Out No sales responsibilities or forced management track—grow deeply in your technical craft. Access to More ❯
Bash, or Python. Use Terraform or ARM templates for provisioning and managing infrastructure. Monitoring & Troubleshooting Monitor database health using Azure Monitor, Log Analytics, and third-party tools (e.g., pgAdmin, Datadog). Troubleshoot issues like deadlocks, slow queries, and connection problems. 5) Collaboration & Support Work with DevOps, development, and infrastructure teams. Provide guidance on database design and performance optimization. Participate in More ❯
Watford, Hertfordshire, United Kingdom Hybrid / WFH Options
Wickes
Service Level Indicators (SLIs), driving initiatives to enhance reliability, performance, and scalability. You will design, implement, and manage observability solutions, including monitoring, logging, and tracing, with strong expertise in Datadog for proactive dashboards and alerts. Automate manual operational tasks to reduce toil and improve system resilience. Collaboration is key both with our Platform Engineers, to ensure we manage and improve More ❯
improvement Familiarity with some of our tech stack: PostgreSQL, or a similar RDBMS, particularly in Amazon RDS at scale Docker and Kubernetes, we use Amazon EKS in production Python Datadog, or a similar logging/monitoring tool Messaging queues, event-driven async processing or similar technologies - we use RabbitMQ Terraform, or a similar infrastructure-as-code tool Experience with a More ❯
Reston, Virginia, United States Hybrid / WFH Options
Plus3 IT Systems, LLC
a DevOps, DevSecOps, or cloud operations role Practical experience with cloud platforms (e.g., AWS, Azure, GCP), including configuring and managing basic services Exposure to monitoring tools like Splunk or Datadog for operational insights and basic security monitoring Experience with ticketing systems and change management processes Strong problem-solving skills and a proactive approach to operational challenges Excellent communication and teamwork More ❯
teams to address performance bottlenecks and ensure scalability. Assist engineering teams with implementing and reviewing SLOs. Continually improve observability through monitoring, alerting, and dashboards, using tools such as DataDog or Prometheus. Work with other teams to ensure it is effective and provides full coverage. Ensure the service is highly available and resilient. Champion best practices in design More ❯
technical, ambiguous domains. Strong knowledge of REST APIs, distributed system design, and performance optimization. Experience with both SQL and NoSQL data stores, caching layers, and observability tooling (e.g., Prometheus, Datadog). Nice to have: Experience deploying or integrating LLMs or NLP models in production systems. Comfortable balancing short-term execution with long-term architectural thinking. Passion for building highly More ❯
practices. Hands-on experience building and managing CI/CD pipelines and developer tooling. Deep understanding of distributed systems and debugging complex technical issues. Proficient in observability platforms like Datadog or similar. Knowledge of security principles and integration of security into infrastructure design. Proven experience with event-driven architectures and building highly available (HA) and disaster recovery (DR) compliant systems. More ❯
Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics, logs, traces and APM. Leadership & Global Operations Proven success leading multi-regional or global technical teams with direct management of managers. Demonstrated ability to build More ❯
Northern Ireland, United Kingdom Hybrid / WFH Options
Jobgether
Kubernetes, cloud infrastructure (AWS/GCP), and databases such as MySQL, DynamoDB, or Cassandra. Proficiency with infrastructure-as-code tools like Terraform or Pulumi, and observability tools such as Datadog or CloudWatch. Experience in implementing AI-powered tools for workflow optimization and operational improvements. Proven success in setting up scalable, SLO-driven monitoring strategies in 24/7 environments. Ability More ❯
should have experience with The ability to lead and scale technical teams in multi-faceted governance environments AWS/Azure cloud platforms and enterprise observability tools (Elastic, Grafana, Splunk, DataDog, or similar) SRE/DevOps methodologies with Python proficiency for automation and infrastructure-as-code practices Some other highly valued skills may include AWS or Azure cloud certifications Experience implementing More ❯
Experience with performance and load testing frameworks (e.g., k6, JMeter) Familiarity with cloud-based test environments and infrastructure (AWS preferred) Working knowledge of observability and test reporting tools (e.g., Datadog, Grafana) Experience improving test data strategies and test isolation techniques Contributions to internal tooling or open-source testing frameworks Background in building out quality initiatives at the org level EverQuote More ❯
architecture and responsive web design Experience with CI/CD pipelines and DevOps practices Security implementation and vulnerability management AI coding, integration and automation tools Performance monitoring and optimisation (Datadog, etc.) Proficiency in Cordova framework and mobile app development (iOS & Android) Team Leadership Experience 5+ years leading development teams in a hands-on capacity Experience mentoring developers and conducting code More ❯
years of professional experience, some of which should have focus on Observability. Excellent knowledge and hands-on experience with monitoring, logging, and tracing tools such as Prometheus, VictoriaMetrics, Grafana, Datadog, New Relic, OpenTelemetry, ELK Stack, or similar. Experience with high volume data storage (Structured and unstructured). A strong technical background, with current capabilities and willingness to get hands on More ❯