complex network solutions (routing, VLANs, firewalls, VPNs). Connectivity between on-premises VMware and cloud environments. Network security best practices and segmentation. Experience with monitoring/logging tools (e.g., Prometheus, Grafana, Splunk). Scripting experience (e.g., PowerShell, Bash, Python). Experience with version control (Git). Experience with automation and orchestration platforms. Experience of working in an Agile environment.
AWS, Azure, or GCP, and their services for scalable, resilient systems. Expertise in containerization technologies (e.g., Docker, Kubernetes) and orchestration tools. Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) for maintaining system health and performance. Ability to lead and mentor junior engineers in reliability and system optimization best practices. Excellent communication skills for effective collaboration with cross
version control systems (e.g., Git). Excellent problem-solving skills and attention to detail. Strong communication and teamwork abilities. Preferred Qualifications: Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack). Familiarity with Agile methodologies and DevOps practices. Enhanced leave: 38 days inclusive of 8 UK public holidays. Private health care including family cover. Life assurance: 5x salary.
in agile environments using Scrum and Kanban. Engaging with high-level stakeholders internally and externally. Technologies such as GitLab, Jenkins, Kubernetes, Docker, Terraform, Packer, Vault, Serverless, Elastic Stack, Prometheus, Grafana, Artifactory, Nexus. Due to the sector's nature, applicants should hold high-level security clearance, which requires being a British passport holder and having lived permanently in the UK for
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
DevOps, YAML-based) with security scanning and progressive delivery. Supporting AKS clusters and Azure services (SQL, Cosmos DB, ADF, Functions, Logic Apps, etc.). Improving monitoring and alerting with Datadog, Grafana, ELK, and proactive failure detection. Participating in the on-call rota and leading incident response workflows and blameless postmortems. Coaching engineers, upskilling teams, and contributing to a culture of continuous
infrastructure. Other duties as needed. About You: 5+ years' experience in Site Reliability Engineer roles. Expert+ level Linux administration, scripting, and troubleshooting. Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog). Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route 53, Fargate, ALB/NLB distributions, etc.). Extensive experience with cloud automation
troubleshooting and problem-solving skills • A passion for learning new technologies and innovation. Desirable: • Certifications in Amazon Web Services (including Solutions Architect and Developer), Google Cloud, or Azure • Amazon Managed Grafana • JetBrains TeamCity • Google Apps Script • Agile Development. Together, as owners, let's turn meaningful insights into action. Life at CGI is rooted in ownership, teamwork, respect and belonging.
pre-sales activities. Requirements (Skills/Experience): Observability and SRE Practices: In-depth understanding of observability and Site Reliability Engineering practices. Familiarity with tools in the LGTM stack (Loki, Grafana, Tempo, Mimir) or equivalent observability platforms. Containerisation: Strong experience building and managing containerised applications, effectively leveraging container orchestration platforms such as Kubernetes. Cloud Expertise: Demonstrable ability to architect and implement
Please apply via the Civil Service Jobs link for your application to be considered. We're looking for outstanding Lead DevOps Engineers to develop, build and maintain our flagship service, Universal Credit and our Working Age Benefits. Universal Credit is
a move? Get in touch and apply today! Responsibilities: Respond rapidly to critical AWS incidents, identify root causes, and deploy automated hotfixes. Lead the setup and integration of a Prometheus/Grafana observability stack. Refactor and modernize deployment pipelines using GitHub Actions and Kubernetes. Maintain robust monitoring, alerting, and CI/CD systems. Skills/Must have: Strong hands-on experience with … AWS (e.g., EC2, EKS, CloudWatch, Lambda). Background in incident, change, and problem management; comfortable with on-call rotations. Expertise in Prometheus, Grafana, and Splunk; solid knowledge of PromQL. Proficient in scripting/programming (Python, Go, Bash, SQL). Salary: £500 per day
software brokers across cloud and on-prem platforms. Responding to production incidents and working on root-cause analysis and long-term fixes. Monitoring system health and performance with Prometheus, Grafana, and custom dashboards. Optimising Solace across WAN environments for secure, low-latency message delivery. Partnering with development and support teams to troubleshoot integration and message flow issues. Driving capacity planning … We're Looking For: 3+ years' hands-on experience with Solace PubSub+ in a production environment. Strong knowledge of WAN-based distributed systems and networking fundamentals. Experience with Prometheus and Grafana for observability and alerting. Confident in Linux/Unix systems and scripting (Bash, Python, etc.). Excellent problem-solving instincts and attention to detail. Strong communicator who works well across technical
Southampton, Hampshire, United Kingdom Hybrid / WFH Options
MediaKind group
other Scrum ceremonies to ensure smooth project execution. Deployment Tools: Implement and manage deployment processes using Docker, Helm, Kubernetes, and VMs. Operational Platforms: Monitor and optimize operational environments using Grafana and Elasticsearch. Cloud Deployment: Leverage tools such as Ansible, Terraform, Cloud API, OpenStack, OpenShift, and public cloud services for cloud deployment. Verification Tools: Use Jenkins and Azure Pipelines for … the technologies below. Education: Bachelor's degree in Computer Science, Software Engineering, or a related field. Deployment Experience: Familiarity with Docker, Helm, Kubernetes, and VMs. Operational Knowledge: Experience with Grafana and Elasticsearch. Cloud Tools: Understanding of Ansible, Terraform, Cloud API, OpenStack, OpenShift, and public cloud environments. Verification Tools: Experience with Jenkins and Azure Pipelines. Configuration Management: Proficiency in Git
City of London, London, United Kingdom Hybrid / WFH Options
Pathmere Partners Limited
Financial services or FinTech background is a plus but not essential. Tech Stack Includes: Microsoft Azure; Terraform, Bicep, ARM templates; Docker, Kubernetes (AKS); Azure DevOps, GitHub Actions; Helm, Prometheus, Grafana, App Insights; PowerShell, Bash. Benefits: Competitive base salary (£90,000 to £110,000). Annual performance bonus. Private medical insurance. Pension scheme and flexible benefits. Clear career path to Head of DevOps
Bash, Python). • Solid understanding of microservices, zero-trust security, mTLS, RBAC, and network policies. • Experience with CI/CD tools, logging (e.g., Fluentd, Loki), and monitoring (e.g., Prometheus, Grafana). About us: Ascendion is a leading global provider of AI-first software engineering services, delivering transformative solutions across North America, APAC, and Europe. We are headquartered in New Jersey.
experience and good understanding of Kubernetes and OpenShift. Hands-on experience deploying, testing, and building CI/CD pipelines. Experience working with monitoring and logging systems, particularly Splunk, Prometheus & Grafana. Excellent analysis, debugging, root-cause identification, and troubleshooting skills. Hands-on experience with Oracle databases and willingness to increase expertise (OCA or OCP certification is a plus). Strong experience in
/CD to these engineers. Identifying and resolving security issues. Automating tests and supporting our engineers in building great software. Minimum qualifications: Strong experience with monitoring/observability tools (Grafana, Prometheus, or similar). Proficiency in Python, Docker, Kubernetes, and CI/CD pipelines. Hands-on cloud experience (AWS or similar). A passion for designing and implementing scalable observability solutions. Minimum
Office 365 platform and applications. Understanding of and experience with the administration of SQL databases. Experience with task automation, leveraging Python, Bash and/or PowerShell. Experience with monitoring tools: PRTG, Grafana, OpenSearch, Prometheus. Beneficial Experience: Hands-on experience with Amazon Web Services. Hands-on experience with Kubernetes/containerised environments. Experience with no-code tools such as Retool or Appsmith. Experience
MySQL (Aurora DB), Redis (ElastiCache), MongoDB (AWS DocumentDB). Cloud & DevOps: AWS (20+ services), Kubernetes (EKS), Docker, Infrastructure as Code (CloudFormation, Terraform), CI/CD (Jenkins, GitHub Actions), Observability (AWS, Grafana). Development tools: GitHub, Jira, Notion, ChatGPT, Gemini, LangChain, AI-native IDEs (Cursor, JetBrains), LLM-powered internal tools. WHAT WE OFFER YOU: A front-row seat in a fast-scaling
up and managing monitoring, metrics, and alerting systems. Experience operating production-grade services at scale. Great to have: Experience with tools such as Terraform, SaltStack, MongoDB, Elasticsearch, Kafka, Prometheus, Grafana or HashiCorp Vault. Experience with securing applications, services, and data, including authentication, authorization, TLS, and encryption. Exposure to Kubernetes (administering, deploying, or developing apps on K8s clusters). Understanding of compliance
Unix systems, SQL, and programming languages such as C++, Java or Python. Strong understanding of distributed systems and low-latency architectures. Hands-on experience with observability stacks (e.g., Prometheus, Grafana, Splunk, Geneos, OpenTelemetry) and infrastructure automation (e.g., Ansible, Terraform, CI/CD pipelines). Strong understanding of the trade lifecycle, market data, and fixed-income products. FX or algorithmic trading experience