IaC tools such as Terraform, Bicep, or ARM templates. Solid understanding of cloud-native architectures, microservices, and container orchestration. Familiarity with monitoring, alerting, and observability tools (Azure Monitor, Application Insights, etc.). Strong grasp of cloud security, identity management, and compliance principles. Confident communicator with experience managing stakeholders and leading More ❯
securely, recover quickly, and move with confidence. You’ll build the systems that make rapid iteration safe — from deployment workflows and rollout strategies to observability, alerting, and incident response. You'll be actively exploring innovative approaches to automation, deployment, and environment management — things that give our client leverage, speed, and More ❯
of working with Kubernetes and Cloud Platforms (AWS, GCP or Azure). Expertise in one or more of the following areas: Database Administration, Networking, Observability Tools, or automation of infrastructure. Ability to tackle design and functionality problems independently with little to no oversight. Excellent debugging and troubleshooting skills. Preferred qualifications More ❯
spanning from foundational OS networking layers to cloud provider configurations. Proven experience in leading projects within security-focused areas, such as runtime scanning, security observability, CSPM, and more. Cloud Expertise: Strong experience with at least one cloud platform (AWS, Azure, GCP), including expertise in IAM, VPC networking, security groups, and More ❯
Peterborough, Cambridgeshire, United Kingdom Hybrid / WFH Options
BGL Group
Management to support the goals and objectives of your team. You will have a focus on end-to-end responsibility for the development, quality, observability, and testing of the software you build. This role will offer you the opportunity to get hands-on with a number of different technologies, however More ❯
Terraform/Terragrunt). Kubernetes expertise in container orchestration and cluster management. Network engineering skills including load balancers, CDN, Istio, and security patterns. Experience with observability platforms (OpenTelemetry) and distributed systems. Nice-to-have skills: Python programming and Linux system debugging. Database administration (SQL, MongoDB, Redis). Message broker and event streaming More ❯
Sunderland, Tyne And Wear, United Kingdom Hybrid / WFH Options
Tombola
other teams to foster collaborative practices, streamline processes, and enhance product quality, ultimately improving delivery timelines. Focus on enhancing our technical ecosystem by emphasizing observability practices, ensuring robustness and reliability. Leverage your technical expertise as an integral part of our development team to tackle intricate challenges and construct reliable, well More ❯
using Infrastructure as Code (IaC). You will work across all layers of infrastructure, including: Networking & Exchange Connectivity; Linux Systems & Kubernetes Administration; Microservice Orchestration & Observability; Disaster Recovery & Security Optimization. Your mission is to improve latency, scalability, and reliability, ensuring GSR remains a best-in-class market maker. We value engineers More ❯
data and AI systems and applications in a cloud-first environment. Skilled in engineering ways of working such as CI/CD, release lifecycle, observability, testing, and continuous model validation with a tangible track record of instituting change. Programming experience - ideally in Python or open to using Python. Familiarity with More ❯
impact on operations. Participate in a support on-call schedule. What We Value Confidence in troubleshooting complex systems issues independently using stack traces and observability & systems tools. Comfort with managing large scale production systems and technologies with configuration management, load balancing, monitoring & alerting infrastructure, and container orchestration. Ability to work More ❯
cloud technologies including OpenShift, Google Anthos, AWS EKS Anywhere, AWS Outposts. A strong background in Go, Python or Java. Experience with Postgres. Experience with observability tools, e.g. Prometheus, Grafana. Benefits: Highly competitive salary. Pension plan (match up to 5%). Life insurance - three times annual salary. Competitive maternity (six months fully More ❯
Cloud Operations. Good working technical knowledge (certificates are very welcome) in different cloud technologies and Azure and AWS Cloud Platforms. Experience managing monitoring, alerting, observability, and dashboarding platforms (such as AWS Monitor, Prometheus, Grafana, and Elasticsearch). Good understanding of NOC and DevOps practices. Experience and in-depth knowledge of More ❯
Northampton, Northamptonshire, East Midlands, United Kingdom Hybrid / WFH Options
City Plumbing
cause analyses. Ability to work effectively with other technical teams such as DevOps to support deployments and troubleshoot issues. Knowledge of DevOps practices (CI/CD, observability, automation) is a bonus. Knowledge of AWS and logging tools such as Kibana and Datadog is an advantage. Continuous learning mindset and a passion for More ❯
Uxbridge, Middlesex, United Kingdom Hybrid / WFH Options
Avature
monthly for our themed culture days. Plus all our giffgaffers come together at our legendary giffgaff summer, birthday and Christmas celebrations. The Must Haves: Observability, "you build it you run it" attitude. Mentoring, good communication, giving and receiving feedback. The Other Stuff We Are Looking For: Event-Driven Architecture; SOLID More ❯
will consult in design phases to architect reliability and scalability requirements. You will design and rollout policies, procedures, and standards that prioritize security, reliability, observability, scalability, and cost optimization. You will engage with our enterprise and/or external partners to oversee cloud infrastructure, covering monitoring, uptime, and issue resolution. More ❯
into a product. Experience being a key person in the designing of a non-trivial solution and working with others to implement. Worked with observability solutions (Kibana, Grafana, Sentry). Experience using further technologies we use (Terraform, AWS RDS, AWS ECS or EKS, AWS EventBridge). Salary range The perks More ❯
in modern digital platforms. Hands-on experience with GitHub and GitHub Actions for implementing CI/CD pipelines and supporting DevOps workflows. Familiarity with observability tools and practices (e.g., logging, tracing, metrics) to support performance, reliability, and incident response. Understanding of content management systems (CMS), digital asset management (DAM), and More ❯
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Freemans Grattan Holdings (fgh)
and implementing website performance monitoring and optimisation strategies to improve page load times, identify, diagnose and resolve issues and enhance customer experience. Enhancing system observability through logging, monitoring, and alerting (Elastic Search, Logstash, Kibana, New Relic, PRTG, ScienceLogic etc). Implementing and managing caching solutions, including Squid Cache, to optimise More ❯
For Delivering robust, fully tested, maintainable software that impacts end users. Designing and implementing production-ready scalable NLP applications and APIs. Developing monitoring and observability solutions and integration testing frameworks. Conducting code reviews and providing constructive feedback to team members. Ensuring the scalability, performance, and reliability of AI applications. Staying More ❯
Monitoring: Proactively manage the health of the Control-M environment by continuously monitoring core components and identifying issues before they become service-impacting. Utilise observability tools (e.g. AppDynamics, Splunk, ThousandEyes) to analyse system performance and optimize operations. Testing and Deployment: Develop and execute comprehensive functional and non-functional test cases for … core Control-M components. In-depth knowledge of Control-M modules, including Workload Change Manager, Workload Archiving, and Workflow Insights. Hands-on experience with observability tools such as AppDynamics, Splunk, and ThousandEyes. Broad IT infrastructure background with proven experience managing Oracle databases. Proficiency in scripting languages (e.g. Python, PowerShell) for More ❯
management team, to deliver industry-leading DevOps and Infrastructure products that provide Infrastructure-as-code abstractions and operating principles, leading cloud computing capability, automation, observability, operability, and developer experience. You will drive the product roadmap, guide product development initiatives, and ensure the successful launch and adoption of DevOps and Infrastructure … be a plus: Strong understanding of modern infrastructure and site reliability engineering practice, including Infrastructure-as-code tools (e.g. Terraform, Ansible ) and metrics and observability tools (e.g. Prometheus, Grafana ). Strong understanding of modern DevOps practice, including DevOps stacks (e.g. Jenkins, GitLab, CircleCI ). Cloud experience (e.g. AWS, Google Cloud More ❯