services and technologies, particularly AWS, to optimize the performance and security of our cloud infrastructure. Monitor system health, performance, and availability using monitoring and observability tools, proactively identifying and resolving issues. Collaborate with cross-functional teams to troubleshoot and resolve complex infrastructure issues, minimizing downtime and improving system reliability. Mentor more »
Leeds, West Yorkshire, Yorkshire, United Kingdom Hybrid / WFH Options
Experis
Skills/requirements: AWS, including AWS ECS; Terraform; CI/CD practices; Jenkins; containerisation (Docker); Agile practices and DevOps methodologies. Nice to have: Kubernetes; observability practices/tooling (e.g. Prometheus, Dynatrace); experience supporting Scala/Java applications. All profiles will be reviewed against the required skills and experience. Due to more »
AWS, or Azure. Experience with containerisation, such as Docker. Experience with CI/CD systems, such as Jenkins, GitLab, CircleCI. Experience with monitoring and observability tools. Strong written and verbal communication skills to a variety of technical and non technical audiences. A solid understanding of DevOps principles and methodologies. Extra more »
infrastructure needs, ideally in a self-service environment. WHAT YOU’LL DO: Build and provide platform features for compute, authentication, service discovery and calls, observability and redundancy. Lead architectural approaches with the right tradeoffs between scale, cost and maintenance. Coach junior engineers about design, development, testing and deployment best practices more »
a security-first focus, baked-in not bolted-on. Desirable Criteria: · Previous experience in Python or Golang. · Previous experience of Ansible and Terraform. · Understanding of observability and its implementation. · Understanding and ability to provide quality, automated testing as you develop. · Ability to design solutions independently. · Networking knowledge. · Understanding of data science more »
applications, with a specific emphasis on Linux RedHat and hands-on experience with RedHat Satellite. Familiarity with Grafana, CI/CD monitoring, and the observability stack, particularly Ansible. Strong understanding of Agile, Site Reliability Engineering (SRE), and DevOps principles and practices. Excellent communication and interpersonal skills for effective collaboration within more »
scaling Kubernetes clusters (EKS) for containerised application deployments is essential. Collaborating with development teams to streamline application deployment lifecycles and implementing security, monitoring and observability best practices are also priorities. The ideal candidate will have 5+ years of DevOps experience with extensive Kubernetes expertise, strong programming abilities in Python (Java more »
normal, and that’s where you come in! We are seeking a skilled Site Reliability Engineer (SRE) with experience in AWS, Serverless, Monitoring, and Observability to join our team. Responsibilities: Design, build and maintain scalable and reliable cloud infrastructure in AWS. Monitor and manage the performance, reliability, and security of more »
improve skills. Previous ownership of mission-critical shared infrastructure. Nice to Have: Experience with CI/CD systems, in particular Spinnaker. Experience in implementing SRE principles. Good knowledge of observability stacks and tooling (e.g. Grafana, ELK, Prometheus, Tracing). Good discipline and skill in producing written documentation and diagrams. Good knowledge of cloud networking and security. Experience working more »
expertise throughout the organization, serving as a pivotal support pillar. What will the role involve? Enhancing and developing across DevSecOps domains including security, automation, observability, recovery & resilience, using technologies like Kubernetes and Infrastructure as Code. Delivering high-quality solutions in languages like TypeScript, Golang, and Python across disciplines such as more »
Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Tal.ai
expertise throughout the organization, serving as a pivotal support pillar. What will the role involve? Enhancing and developing across DevSecOps domains including security, automation, observability, recovery & resilience, using technologies like Kubernetes and Infrastructure as Code. Delivering high-quality solutions in languages like TypeScript, Golang, and Python across disciplines such as more »
priorities. You are an expert in monitoring distributed systems while leveraging industry best practices. Unpack monitoring costs and drive cost savings whilst maintaining service observability and engineering productivity. Calculate the potential costs of outages, and plan for these accordingly with error budgets. Identify and facilitate development of automation that significantly reduces more »
and applications in a cloud-first environment. · First-hand deep expertise in engineering ways of working such as CI/CD, release lifecycle, data observability, data testing, continuous model validation with tangible track record of instituting change. · Software development experience in Python, or Scala. Familiarity with all, and expert in more »
other teams, with documentation and examples. In addition, you should have experience working with Kubernetes in production environments at scale, and be familiar with observability tools such as Prometheus and Grafana. Strong Linux Server Administration and Configuration Management skills, as well as some networking experience, are also required. The ideal more »
orchestration systems like Kubernetes. Implement infrastructure as code using tools like Terraform. Monitor and troubleshoot applications and infrastructure. Promote and implement best practices in observability (monitoring, tracing, alerting, logging) and incident response. What We're Looking For: Strong background in Linux/Unix administration. Experience with Azure Cloud Services. Proficiency more »
MySQL, Postgres, Redis, etc.). Experience with DevOps engineering and working with container orchestration, such as with Docker or Kubernetes. Experience with log monitoring and observability via platforms like Sumo Logic or CloudWatch. Experience automating infrastructure, testing, and deployments using tools like CircleCI. Configuration management tooling and infrastructure as code knowledge is more »
MySQL, Postgres, Redis, etc.) • Experience with DevOps engineering and working with container orchestration, such as with Docker or Kubernetes • Experience with log monitoring and observability via platforms like Sumo Logic or CloudWatch • Experience automating infrastructure, testing, and deployments using tools like CircleCI • Configuration management tooling and infrastructure as code knowledge is more »
normal, and that’s where you come in! We are seeking a skilled Site Reliability Engineer (SRE) with experience in AWS, Serverless, Monitoring, and Observability to join our team. Responsibilities: Design, build and maintain scalable and reliable cloud infrastructure in AWS. Monitor and manage the performance, reliability, and security of our systems. Implement more »
pipelines • AWS: S3, RDS, Route 53, IAM, EKS, Secrets Manager, ECR • Kubernetes: Helm, Kops, Ingress/Egress • Terraform: Deployment of AWS Resources, Pipelines, OCI • Observability: ELK, Dynatrace, Prometheus • Others: Vault, RedHat. As an equal opportunities employer, we welcome applications from individuals of all backgrounds. However, for you to be eligible more »
that’s dedicated to creating opportunities for our customers, partners, and employees. We hope you’ll join us. Let’s create something incredible together! Observability Engineer At Anaplan we are looking for a self-motivated Observability Engineer to join our dedicated Observability Infrastructure team. Anaplan is a high-growth company … working people who believe in simplicity, agility and performance and can choose and use the best tools for the job. In the role of Observability Engineer, you will be working on the tools used to collect and analyse Observability telemetry (Logs, Metrics and Traces). You will enable engineers across … What you’ll be doing: In this role, working a minimum of 2 days a week in our London Office, you will be: Administering observability infrastructure. Deploying and configuring OTEL agents to collect telemetry, and to visualise this data in Grafana. Pairing with your colleagues to build everything from rapid more »
stack developer, before shifting your focus to SRE/Platform Engineering (Java preferred); extensive experience with AWS, Kubernetes, Terraform, CI/CD tools; strong observability experience, ideally with more modern approaches like Prometheus, Grafana, OpenTelemetry; comfortable with databases; exposure to Kafka would be ideal more »