APIs Various testing methodologies System design at high scale and commercial experience with: SQL and NoSQL databases Async processing Cloud native applications Working in a Continuous Delivery environment Modern observability practices Nice to have Not vital, but you'll have the edge if you also have experience with: Grafana Prometheus Kotlin or at least the willingness to learn it Batch More ❯
coaching skills Strong problem solving and communication skills Strong understanding of SDLC Expertise with cloud technologies especially AWS Good experience delivering solutions and impact in agile environments Good with Observability, Monitoring and Serverless technology Experience providing data for consumption via API Experience and strong understanding of API First principles Our Mission: Creating trusted open-source intelligence has always been our More ❯
and improve our cloud infrastructure (GCP), containerised workloads (Docker/Kubernetes), and CI/CD pipelines. Monitor, troubleshoot, and optimise system performance in production. Apply best practices for reliability, observability, and security across our stack. Collaborate Work closely with product and design to deliver features that delight our members. Pair with other engineers to solve problems, review code, and raise More ❯
California, with additional locations across the globe. What you'll do: As a Site Reliability Engineer at Zefr, you'll apply your expertise in cloud infrastructure, CI/CD, Observability, and core SRE concepts to deliver high-quality, reliable, and scalable solutions. A significant aspect of this role involves working closely with Zefr's Engineering and Data Science teams ensuring … EKS expected), Helm, Kustomize Service Mesh: Istio CI/CD & Automation: CI/CD Pipelines: GitHub Actions GitOps/Continuous Delivery: Argo CD Primary Scripting/Automation Language: Python Observability & Monitoring: Monitoring & Alerting: Prometheus, Datadog, PagerDuty Telemetry Standards: OpenTelemetry Application & Data Ecosystem (Supporting): Application Languages/Frameworks: Python, FastAPI, Flask, Node.js, React Data Streaming: Apache Kafka Data Processing/Transformation … CircleCI, Argo CD, Flux) Knowledge of IaC and configuration management tools (Terraform, OpenTofu, Crossplane, Pulumi, Ansible, CloudFormation) Strong problem-solving experience, focusing on automation Production experience with Monitoring and Observability tools (Prometheus, Grafana, Datadog, Thanos, New Relic, OpenTelemetry) Understanding of Cloud Networking concepts (Mesh Networking, NAT, Load Balancers, SSL Certificates and TLS termination, API Gateways, proxies, etc.) Strong written More ❯
Oldham, Greater Manchester, North West, United Kingdom
Innovative Technology
consistency, repeatability, and auditability across environments Develop and maintain developer tooling and golden templates (CI/CD pipelines, scaffolds, environments) to standardize best practices across teams Design and implement observability frameworks (metrics, tracing, logging, alerting) that are easy to consume and part of the platform baseline Eliminate repetitive tasks through automation and opinionated defaults, so teams are not blocked by … and orchestration (Docker, Kubernetes) Familiarity with CI/CD systems (GitHub Actions, GitLab CI, Jenkins, etc.) Hands-on experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation) Knowledge of observability tools (Prometheus, Grafana, ELK stack, Datadog, etc.). Solid grasp of Linux systems and networking fundamentals Strong problem-solving and debugging skills Your Package & Perks: A competitive salary Flexible working More ❯
experience. • Implement and evolve CI/CD pipelines, deployment strategies, and GitOps workflows. • Work closely with software engineers to embed infrastructure and operational thinking early in the SDLC. • Champion observability, reliability, and performance through metrics, logging, and alerting best practices. • Lead incident response, postmortems, and ongoing resilience improvements. • Contribute to a 24/7 on-call rotation for critical systems. Minimum … platforms (AWS, GCP) • Solid grasp of networking, Linux internals, and security best practices. • Deep understanding of CI/CD tools and practices (GitHub Actions, Jenkins, ArgoCD, etc.). • Strong observability mindset: experience with tools like Prometheus, Grafana, Loki, etc. • Experience with hybrid service meshes, multi-cluster Kubernetes, or edge computing, preferred. • Knowledge of Kafka, Redis, Elasticsearch, or RDBMS (MySQL/ More ❯
influencing decisions on direction and prioritisation Define, design and lead the implementation of Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs) that truly reflect customer experience, alongside appropriate observability and monitoring Work alongside lead product engineers to design testing for reliability, performance, capacity and DR Lead reliability delivery for the team, assuming accountability while managing risks and dependencies, and … a bonus Strong knowledge of application architectures, messaging middleware, and network protocols Strong coding capabilities (scripting languages Python, Bash, Ansible, Terraform, etc.); Java a bonus Experience with monitoring and observability tools such as OpenTelemetry, Splunk, Prometheus, Grafana, etc Experience automating CI/CD processes and solutions A growth mindset; eagerness to learn and adapt in a fast-paced trading environment. More ❯
design sessions, and architecture governance Mentor team members and maintain a culture of technical excellence, continuous learning, and collaborative delivery Define and uphold engineering best practices, including documentation, testing, observability, and CI/CD integration Collaborate with Product, DevOps, and Security teams to translate business needs into reliable technical solutions Drive infrastructure automation and deployment consistency using Kubernetes and Git … stakeholder engagement skills, able to translate complexity into clarity Experience with Terraform, Helm, or GitOps tooling Familiarity with front-end technologies such as React and TypeScript Exposure to GraphQL, observability stacks (e.g., Prometheus, OpenTelemetry), or large-scale data platforms Prior work in regulated industries (BFSI, telecom, public sector) To succeed in this role, you'll bring more than just technical More ❯
efficiency, and innovation. We're providing our businesses with a competitive edge by leveraging public cloud scale and enabling new infrastructure economics. As the Cloud Engineering Lead - Public Cloud Observability - SVP you will play a pivotal role in shaping and executing our public cloud strategy. You will be part of a team that continues to deliver big! From building cloud … at scale, all the way to enabling payments solutions, this team is at the forefront of innovation. What You'll Do Lead the Charge: own the Public Cloud Foundations - Observability strategy and its execution, enabling Citi's secure and enterprise-scale adoption of public cloud. You will provide technical authority for all foundational services. Build and Inspire: lead and grow … and a passion for engineering best practices. You have: Cloud Engineering Expertise: A deep understanding of public cloud services adoption at scale. Expert-level understanding of AWS/GCP Observability across: Proficiency in working with cloud-native APIs from AWS (e.g. AWS Config, CloudWatch) and GCP (e.g. Cloud Asset Inventory, Cloud Monitoring) Experience with Python to automate API integrations and More ❯
results that matter. By taking advantage of all structured and unstructured data - securing and protecting private information more effectively - Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI. What Is The Role: You will have the opportunity to work with a tremendous services, engineering, product, and sales team and wear … many hats. This is a meaningful role, as a Consulting Engineer (Observability), you have an outstanding chance to create an immediate impact on the success of Elastic and our customers. As an Elastic Consulting Engineer, you will be working closely with our customers to provide technical solutions for their business use cases with the Elastic Stack (which includes Elasticsearch, Kibana … consultant will be focused on excellence, taking the initiative for self-improvement and possess great communication skills. Our customers' use cases extend across all the Elastic Solutions: Enterprise Search, Observability and Security, and beyond, and the scale of data in their environments ranges from gigabytes to petabytes. This diverse mix of a customer base means the challenges they face that More ❯
using modern agentic frameworks, tools and libraries like LangChain, Google ADK. Interface with A2A and MCP protocols Deploy to production using Docker, Terraform, GCP, with proper monitoring and observability Implement clean, tested Python code and maintain CI/CD pipelines Apply core data engineering tools: SQL, DBT, BigQuery What You Bring System thinker, good communicator, and strong work ethic. More ❯
deployment experience (AWS - multi-account, multi-region) - Terraform, Helm, CloudFormation - Kubernetes (EKS/OpenShift) and CI/CD (GitHub Actions/Argo CD) - Strong understanding of IAM, GuardDuty, and observability tooling - Ability to translate research into hardened, production-grade systems If you are interested in the above position, please contact me, James Chapman on or email me at (even if More ❯
and resolve application-level production incidents The Person: 5+ years in SRE, DevOps, or infrastructure engineering Strong experience with AWS, EKS/Kubernetes, and Terraform Familiar with Kafka and observability tools like Datadog or Grafana Able to troubleshoot issues across infrastructure and application layers Reference number: BBBH259300 To apply for this role or to be considered for further roles More ❯
City of London, London, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment
and resolve application-level production incidents The Person: *5+ years in SRE, DevOps, or infrastructure engineering *Strong experience with AWS, EKS/Kubernetes, and Terraform *Familiar with Kafka and observability tools like Datadog or Grafana *Able to troubleshoot issues across infrastructure and application layers Reference number: BBBH(phone number removed) To apply for this role or to be considered More ❯
Employment Type: Permanent
Salary: £80000 - £90000/annum 38 Days Holiday, Healthcare, Pension
London, South East, England, United Kingdom Hybrid / WFH Options
Rise Technical Recruitment Limited
and resolve application-level production incidents The Person: *5+ years in SRE, DevOps, or infrastructure engineering *Strong experience with AWS, EKS/Kubernetes, and Terraform *Familiar with Kafka and observability tools like Datadog or Grafana *Able to troubleshoot issues across infrastructure and application layers Reference number: BBBH259300 To apply for this role or to be considered for further roles More ❯
and ability to drive clarity in ambiguous, complex technical situations. Leadership experience through mentoring, leading initiatives, or shaping engineering practices across teams. Experience in defining and improving DevOps pipelines, observability, and platform reliability. Strong communication skills and a collaborative mindset, able to build alignment across stakeholders. Proactive and pragmatic: able to balance technical excellence with delivery impact. More ❯
etc.) to convert business needs into robust technical solutions. Design and manage enterprise data platforms including data warehouses, data lakes, and semantic models to enable high-quality analytics. Implement observability and monitoring solutions across key data flows and system integrations. Provide technical mentorship and guidance to development teams, data engineers, and IT staff on architecture, technologies, and best practices. Experience More ❯
Nottingham, Nottinghamshire, East Midlands, United Kingdom Hybrid / WFH Options
Rebel Recruitment
resolve incidents, analysing logs, data, and reports from the service desk. Work closely with engineering leadership and product owners to prioritise incidents and drive preventative measures. Take ownership of observability strategies: monitoring standards, alerting practices, and visibility improvements. Engage in Agile ceremonies and collaborate across disciplines to support efficient delivery and operational excellence. Why This Role Stands Out No immediate on More ❯
platforms into our BI warehousing environment. • Model data using Kimball/star schemas and data vault principles to support BI and self service analytics. • Implement data quality, lineage, and observability tooling (e.g., dbt tests, Great Expectations, Azure Purview). • Optimise storage and compute costs through partitioning, incremental loads, and automation. • Collaborate with DevOps to embed CI/CD and Infrastructure More ❯
be eligible for higher level clearance. Individuals without security clearance will be considered for other opportunities at Dynatrace. Why you will love being a Dynatracer Dynatrace leads in unified observability and security. We offer a culture of excellence with competitive compensation. Work with major cloud providers like AWS, Microsoft, and Google Cloud, forming strategic alliances. The platform uses cutting-edge More ❯
optimization, anomaly detection, and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on More ❯
challenges at scale then this role is for you. Ideally you have several years' experience using Go in production. You'll be comfortable with Docker, and familiar with modern observability tools such as Prometheus, Alertmanager, Grafana and X-Ray/Tempo/Jaeger. We're looking for 3+ years tackling hard backend problems Seasoned database experience - we use MySQL More ❯
over process and deliberation Great to haves Experience with .NET/C# Experience working in an agile development team with a focus on delivering value early Experience with building observability and alerting into systems Salary and benefits (the stuff you'd expect!) Salary is £78K - £100K (depending on experience) This is a full-time opportunity, working Monday to Friday remotely More ❯