/Scrum methodologies and SAFe frameworks. Excellent problem-solving, communication, and leadership skills. Hands-on experience with CI/CD pipelines, particularly GitHub Actions or similar frameworks. Knowledge of observability platforms and zero-downtime deployment strategies. Exposure to AI/ML integration and data-driven orchestration frameworks. Experience with ETL processes and SQL Server Integration Services (SSIS) is highly beneficial. …
/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps, Jenkins). Experience with configuration management tools such as Chef/Puppet. Strong proficiency in scripting/programming (Python, Go, or similar). Experience with observability platforms (Datadog, New Relic, Prometheus/Grafana). Knowledge of microservices architecture and service mesh technologies. Understanding of security best practices and compliance frameworks. Comfortable with asynchronous collaboration tools (Slack, Teams …
Swindon, Wiltshire, South West England, United Kingdom Hybrid/Remote Options
Humana
Become a part of our caring community and help us put health first. Why Join Enterprise Observability Engineering? The Enterprise Observability Engineering team is a high-impact, high-autonomy group focused on building intelligent, scalable, and resilient observability solutions. We foster a culture of innovation, agility, and ownership, empowering engineers to solve complex problems, drive strategic initiatives, and shape the … challenges, and leading with technical excellence, this is the team for you. About the Role: We're looking for a Lead Software Engineer with deep expertise in logging and observability engineering. You should be fluent in the principles of OpenTelemetry, log ingestion, and event correlation across distributed systems. While familiarity with platforms like Splunk or Dynatrace is a plus … to design resilient, scalable logging solutions using the best-fit tools for the environment. As a Lead Software Engineer, you will drive the design, implementation, and evolution of our observability and logging platforms. You'll lead enterprise-scale initiatives, mentor engineers, and collaborate across disciplines to ensure our systems are reliable, scalable, and performant. Applying deep technical expertise to solve …
monitoring). Strong experience in distributed system design, development, and deployment using Agile/DevOps practices. Experience with CI/CD pipelines (GitHub Actions or similar). Experience implementing monitoring and observability using Prometheus, Grafana, or Databricks-native solutions. Good communication skills, excellent teamwork experience, and the ability to mentor and develop more junior developers, including participating in constructive code reviews. Preferred Skills: Experience …
Cover on-call rotation for production support (1 week out of 6). As well as making improvements to: • Deployment automation and release management processes • Application and infrastructure monitoring and observability • Security scanning and vulnerability management in pipelines • Performance optimization and capacity planning • Development team productivity through tooling and automation. What we would like from you: • Strong experience with CI/ …
technical direction. Desirable Qualifications: Experience in fintech, payments, or enterprise SaaS platforms. Exposure to event-driven architecture (Kafka, RabbitMQ). Familiarity with infrastructure-as-code tools (Terraform, CloudFormation). Understanding of observability tools (Prometheus, Grafana, ELK stack). Apply now and Vibe with Us! We are looking for new employees who will embrace the Edenred adventure with the …
City Of Westminster, London, United Kingdom Hybrid/Remote Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
Westminster, City of Westminster, Greater London, United Kingdom Hybrid/Remote Options
Additional Resources
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
London, South East, England, United Kingdom Hybrid/Remote Options
Additional Resources Ltd
high-volume processing. Deploying and managing containerised workloads through Kubernetes, Helm, and Docker. Automating infrastructure using Infrastructure-as-Code tools such as Terraform and Ansible. Ensuring system reliability through observability, monitoring, and proactive issue resolution. Collaborating with cross-functional teams to align data solutions with wider business needs. Supporting the continuous improvement of processes, deployment, and data quality standards. What …
and problem-solving skills. Knowledge of security practices (IAM, encryption, secrets management). Experience with incident management frameworks and SRE principles. Knowledge of performance tuning and capacity planning. Exposure to observability tools and log aggregation systems. Understanding of networking and security fundamentals. Design, implement, and maintain monitoring, logging, and alerting systems. Define and track Service Level Indicators (SLIs), Objectives (SLOs), and …
deployment pipelines. Excellent problem-solving skills and ability to debug complex systems. Strong communication and collaboration skills, with a commitment to mentoring and team development. Preferred skills: Understanding of observability practices, including logging, metrics, and tracing. Experience with monitoring tools such as Prometheus and Grafana. Awareness of cloud security best practices, including IAM policies and secret management. Exposure to Agile …
Edinburgh, Midlothian, United Kingdom Hybrid/Remote Options
Aberdeen
internal workshops, brown bags, or tech talks to share knowledge and promote adoption of tools and practices. About the Candidate: The ideal candidate will possess the following: Experience with observability tools (e.g., Grafana, Prometheus, Datadog). Background in DevOps, SRE, or platform engineering with a security-first mindset. Strong programming skills in languages such as .NET, JavaScript, Python, or similar. …
built on Solace PubSub+, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging …
and/or Motion Planning to inform modeling & simulation (M&S) and physical systems. Developing and testing multi-agent autonomous systems and deploying in real-world environments. Familiarity with observability concepts and tools. Knowledge of security best practices for DevOps and MLOps. Note: If you are interested, please share your updated resume and suggest the best number & time to connect …
City of London, London, United Kingdom Hybrid/Remote Options
Advanced Resource Managers
to architect secure, performant, and highly available cloud solutions. Proficiency with monitoring and log analytics tools such as AWS CloudWatch, ELK Stack, Prometheus, Datadog, or New Relic, to maintain observability and ensure operational excellence. Demonstrated leadership skills in managing complex, high-pressure situations and guiding teams through incident resolution. Exceptional communication and presentation skills, with proven experience engaging with senior …
for authentication and federated SSO. Apply Zero Trust principles with least-privilege access, RBAC, and multi-factor authentication. Implement monitoring and logging solutions using CloudWatch, Grafana, and OpenSearch for observability and alerting. Support DevSecOps integration including code quality gates, image scanning, and compliance automation (OPA, Conftest, Checkov). Collaborate with development teams to containerize legacy applications and migrate them to …
streaming platforms. Security & Compliance: Identity and access management (IAM), secure design principles, awareness of regulatory frameworks (e.g., GDPR, HIPAA, SOX, SOC2). Tools & Platforms: Familiarity with enterprise platforms, monitoring and observability tools, API gateways and service meshes. Location: COL Work-at-Home. Language Requirements: English (Required). Time Type: Full time. 2025-10-31. If you are a California resident, by submitting your information, you …
Mansfield, England, United Kingdom Hybrid/Remote Options
Future Talent Group
using Terraform. Implement and optimise CI/CD pipelines using GitHub Actions, Docker, and GitOps practices. Deploy, orchestrate, and manage Kubernetes (AKS/Container Apps) workloads. Configure monitoring and observability with Azure Monitor, Application Insights, Log Analytics, and OpenTelemetry. Partner with software engineering and infrastructure teams to drive DevOps best practices across the organisation. Manage security and compliance in Azure …
FinOps practices. Experience with infrastructure-as-code tools (e.g., Terraform, Helm, Ansible). Familiarity with CI/CD pipelines and automation (e.g., GitHub Actions, ArgoCD, Jenkins). Experience with observability tools like Prometheus and Grafana. Knowledge of Linux systems administration and networking fundamentals, and experience with policy-as-code. Passion for platform engineering, developer experience, and site reliability. UAL is a …
DevOps & SRE Practices: Experience implementing CI/CD pipelines and DevOps methodologies. Knowledge of infrastructure monitoring (Datadog), log aggregation, and incident management. Understanding of SLO/SLA definition and observability best practices. Strategic & Business Acumen: Ability to align technical initiatives with business objectives and articulate ROI. Experience creating technical roadmaps and conducting cost-benefit analyses. Track record presenting to C …
Demonstrated expertise in ICAP implementation and proxy server integration for content adaptation and security enforcement. Hands-on scripting and programming experience for automation (e.g., Python, Java). Exposure to observability practices, including monitoring, logging, metrics, and traces, to ensure operational excellence in deployments. Experience supporting U.K. Government customers, with knowledge of security clearance processes and compliance requirements. Work/Life …
pipelines handling millions of requests with low latency. Deploy and operate services on Kubernetes and Docker, leveraging AWS infrastructure such as EC2, S3, Lambda, and RDS. Implement monitoring and observability using tools like Grafana and Prometheus to track system performance. Collaborate with product, frontend, and analytics teams to deliver features that make a tangible impact on user experience. Contribute to …