Manage cloud infrastructure (OCI, AWS, Azure, or GCP) using Infrastructure as Code tools like Terraform or Serverless Functions. Monitor system health and performance using tools like Prometheus, Grafana, Datadog, or NewRelic. Collaborate closely with development teams to automate builds, performance tests, and deployments. Ensure system security, compliance, and best practices are followed in deployment pipelines. Ensure network security with More ❯
development in general, with skills in a high-level language (e.g., Python, JavaScript, TypeScript, Java) and familiarity with modern development practices Understanding of Cloud Observability, Monitoring, and Tracing tools (Datadog, CloudWatch, Jaeger, ELK) and how best to leverage to support effective MTTR and mitigate high CFR Our UK benefits: Stock Options Annual Performance Bonus or Commissions Pension matched up to More ❯
using Kubernetes or similar tools in production deployments Experience with: AWS security best practices including IAM, security groups, encryption, and compliance frameworks Monitoring tools such as CloudWatch, Prometheus, Grafana, DataDog, or NewRelic Infrastructure as Code using Terraform Containerised CI/CD solutions Linux system administration, including shell scripting and system optimisation Desirable Skills Experience with AWS services such as SQS More ❯
GitLab CI). Write clean, production-grade code in Python (Scala is a bonus). Build infrastructure using Terraform, AWS CloudFormation, or SAM. Drive observability across the platform using Datadog or CloudWatch. Actively mentor Data Engineers and Associates, and lead technical discussions and design sessions. Key requirements: Must-Have: Strong experience with AWS services: Glue, Lambda, S3, Athena, Step Functions … operate services in production. Good to Have: Experience with Scala for data applications. Familiarity with serverless/event-driven architectures. Experience designing scalable, low-latency data services. Exposure to Datadog or CloudWatch monitoring tools. Nice to Have: Experience with LLM-powered applications or OpenAI APIs . Professional experience in a similar environment or high-scale system. Key Roles and Responsibilities More ❯
configuration management tools (e.g., Ansible, Puppet, Chef). Knowledge of infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation). Experience with monitoring and logging tools (e.g., Prometheus, ELK Stack, Datadog). Passion for continuous learning and professional development. ABOUT BUSINESS UNIT IBM Consulting is IBM's consulting and global professional services business, with market leading capabilities in business and technology More ❯
Burton-On-Trent, Staffordshire, West Midlands, United Kingdom
Amtis Professional Ltd
scalable, secure infrastructure in AWS and Azure Build and maintain CI/CD pipelines using tools such as Azure DevOps Implement and manage monitoring, alerting and logging systems (e.g. Datadog, Logic Monitor, SolarWinds) Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as Terraform Ensure compliance with security policies; manage IAM, PIM and RBAC access controls Respond to incidents More ❯
in Computer Science, Management Information Systems, or related fields is desirable but not essential. Nice to have but not essential: Service monitoring and graphing tools (Prometheus + Grafana, Nagios, Datadog) Elastic Stack Repository solutions (JFrog Artifactory, JFrog Bintray) OpenVPN SQL Databases (MongoDB, PostgreSQL, MySQL) Our Values: We work together We believe in people We won't accept the "way it More ❯
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Azure DevOps, YAML-based) with security scanning and progressive delivery Supporting AKS clusters and Azure services (SQL, Cosmos DB, ADF, Functions, Logic Apps, etc.) Improving monitoring and alerting with Datadog, Grafana, ELK, and proactive failure detection Participating in the on-call rota and leading incident response workflows and blameless postmortems Coaching engineers, upskilling teams, and contributing to a culture of … DB, etc.) Strong Infrastructure as Code skills with Terraform (v1.7+) Experience with CI/CD pipelines, GitOps, and automation tools (PowerShell, Bash) Familiarity with observability and incident tools like Datadog, ELK, and synthetic monitoring Solid understanding of networking (TCP/IP, Load Balancing, DNS, Routing) Good knowledge of DevSecOps practices - including security scanning, IAM, and RBAC Experience with FinOps - tagging … Familiarity with security scanning tools (Trivy, tfsec) integrated into pipelines A proactive approach to problem-solving, documentation, and coaching Additional bonus skills include experience with Azure governance tools, advanced Datadog capabilities, Kubernetes autoscaling solutions, GitOps workflows, automated cost dashboards, compliance frameworks, and internal platform development. What You Can Expect: Competitive salary: £70,000 - £80,000 depending on experience 25 days More ❯
Jenkins) Enterprise repository management systems (Artifactory) Workflow management and collaboration tools (Jira, Confluence, Google Suite) Cloud infrastructure (AWS) Monitor, debug and improve system performance and reliability using tools like Datadog, Grafana, or OpenSearch Update and maintain the development experience via automated pipelines that include timely feedback and a seamless path to release-level quality. Maintain and administer cloud infrastructure. Troubleshoot … project management tools (Jira or other) Experience with cloud-based infrastructure (AWS or other) Experience with containers (Docker) and container orchestration (Kubernetes) Basic understanding of visualization tools (Tableau and/or Datadog) Basic data analytics skills Excellent written and verbal communication skills This role requires commuting distance to the Glasgow office. Qualified candidates must be able to be in our office at More ❯
and architecture through to production deployment and support. You'll work closely with experienced engineers and domain experts to deliver mission-critical services with a strong focus on scalability, observability (DataDog), and quality. You'll also contribute to architectural design, sequence diagrams, and flow mapping, ensuring robust documentation and testing standards are met. This is a full Agile environment, and you'll More ❯
architecture through to production deployment and support. You'll work closely with experienced engineers and domain experts to deliver mission-critical services with a strong focus on scalability, observability (DataDog), and quality. You'll also contribute to architectural design, sequence diagrams, and flow mapping, ensuring robust documentation and testing standards are met. This is a full Agile environment, and you More ❯
test: Containerisation (e.g. Docker), Virtualisation and Provisioning, Workload and job scheduling (e.g. Kubernetes, Ray) on high core-count machines and rack-scale installations, Management and Observability (e.g. Prometheus, OpenTelemetry, DataDog, Splunk, etc.). 10+ years of relevant experience related to quality assurance/testing teams. Experience with the Atlassian suite and CI/CD platforms such as Jenkins; GitHub or More ❯
Watford, Hertfordshire, United Kingdom Hybrid / WFH Options
Wickes
You'll have a deep understanding of modern cloud ecosystems, with extensive hands-on experience in Amazon Web Services (AWS). Familiarity with modern observability concepts and tools, including Datadog, and proven experience with the "platform as a product" model and driving adoption of internal tools. Strong familiarity with CI/CD principles and pipelines (e.g., Jenkins, GitLab CI, CircleCI More ❯
Leeds, England, United Kingdom Hybrid / WFH Options
Anson McCade
frameworks Desirable Experience Delivery of secure software in government, defence, or other regulated sectors Hands-on cloud-native development and deployment Knowledge of logging and monitoring tools such as DataDog, Prometheus, or StackDriver Experience working with product lifecycle tooling and engineering in complex domains If you’re looking to focus on real engineering work that drives meaningful outcomes and want More ❯
Leeds, West Yorkshire, England, United Kingdom Hybrid / WFH Options
Anson McCade Ltd - IT and Finance Recruitment
Nice to Have (But Not Essential) Cloud experience: AWS, Azure or GCP Solid grasp of databases and data modelling Familiarity with open-source tools and monitoring platforms (e.g., Prometheus, DataDog) Experience with test automation frameworks and performance tools More ❯
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
Job description RemoteStar is looking to hire a Senior Site Reliability Engineering Manager on behalf of our client based in the UK with a fully remote work policy. About Client: The client is building the B2B marketplace for diamonds. It's More ❯
Portsmouth, Hampshire, United Kingdom Hybrid / WFH Options
Checkatrade
Senior Platform Engineer Experience in Cloud Native technologies? Come join us! Are you looking for a new role? We have an exciting opportunity at Checkatrade for a Senior Platform Engineer to join our mission of making home improvements easy by More ❯
Newcastle Upon Tyne, Tyne And Wear, United Kingdom
Strive Gaming
in on-call rotations and help troubleshoot production issues. Tech Requirements (must have): IaC - Infrastructure as Code (Terraform), AWS, Argo, strong Linux skills, ELK/LGTM stack knowledge, Prometheus, DataDog, Grafana, Kubernetes, Helm, Docker, Bash/shell scripting, Git, strong security mindset. Tech (nice to have): CrowdStrike, on-prem/ESXi, Windows Server, Entra ID More ❯
SNS, S3, EventBridge, Step Functions, and more. You'll be well-versed in tools like GitHub for managing repositories, pipelines, artifacts, code control, and deployments. You'll also use Datadog for observability, and have a strong understanding of security scanning, code quality, and general security awareness. Experience with LaunchDarkly is a plus. About You You're not just technically strong More ❯
Google Cloud. Familiarity with database systems, data modelling, and SQL/NoSQL technologies. Comfortable working with a range of open-source tools and frameworks. Experience with observability tools like DataDog, Prometheus, or StackDriver. Knowledge of test automation frameworks and practices. Why This Role Stands Out No sales responsibilities or forced management track—grow deeply in your technical craft. Access to More ❯
practices. Hands-on experience building and managing CI/CD pipelines and developer tooling. Deep understanding of distributed systems and debugging complex technical issues. Proficient in observability platforms like Datadog or similar. Knowledge of security principles and integration of security into infrastructure design. Proven experience with event-driven architectures and building highly available (HA) and disaster recovery (DR) compliant systems. More ❯
should have experience with The ability to lead and scale technical teams in multi-faceted governance environments AWS/Azure cloud platforms and enterprise observability tools (Elastic, Grafana, Splunk, DataDog, or similar) SRE/DevOps methodologies with Python proficiency for automation and infrastructure-as-code practices Some other highly valued skills may include AWS or Azure cloud certifications Experience implementing More ❯