and help us maintain our hosting platform. Creating and improving routes to live with automation including blue/green & canary strategies. Configure and improve observability controls. Proving scalability/resilience and security controls. Sustain and improve the process of knowledge sharing throughout the engineering teams. About us… Like the modern … experience configuring & running production workloads in Kubernetes CI/CD & IaC tools like Jenkins, Terraform, Sonar, Nexus, Git, Spinnaker, Harness Strong understanding & experience of Observability, SRE, DevSecOps & FinOps Good understanding of cloud networking & connectivity patterns Good understanding of key data tooling such as Kafka, BigTable, DataProc, BigQuery etc. It would more »
/SRE team. The Lead Site Reliability Engineer will lead the charge in selecting, configuring, and supporting Cloud Platform components and tooling. Proficiency in observability tech such as Grafana and Prometheus is essential. An ability to self-manage in both Agile and traditional delivery approaches is a key asset. The … will be paramount for collaborating with stakeholders and mentoring team members. Key Skills Experience with GCP, AWS or Azure Leadership/management experience Terraform Observability tech such as Grafana/Prometheus Background in software engineering is an advantage If you are interested in the role please apply! We are an more »
Winchester, Hampshire, United Kingdom Hybrid / WFH Options
Context Recruitment
work closely with cross-functional teams to build and maintain a robust infrastructure that supports their dynamic needs. Key Responsibilities: Assume responsibility for the observability suite, encompassing tools for monitoring, logging, and alerting, to guarantee a thorough and integrated understanding of system functionality and health. Set up and oversee APM … Experience in a DevOps/Site Reliability Engineer ( SRE ) position, dedicated to ensuring the high availability, reliability, and scalability of live systems. Proficient in observability tools like Prometheus, ELK stack, Grafana, and Azure Monitor, capable of fully managing the suite for optimal system oversight. Skilled in operating APM tools such more »
slicing. Develop and maintain backend services using Node.js. Utilize Linux, AWS, serverless technologies, message queues, relational and NoSQL databases, containers, Infrastructure as Code, and observability tools in development processes. Advocate for and practice Test-Driven Development (TDD) and paired programming. Embrace lean and agile methodologies, ensuring the delivery of small … public-facing APIs and microservice architectures. Familiarity with AWS or GCP, Docker, Linux, and C# is desirable. Experience with CI/CD pipelines, scripting, observability, and data engineering is a plus. Understanding and advocacy of lean software development principles. Personal Attributes: Passionate about software development with a focus on quality. more »
of your data products and work with the QA team to implement automated quality processes to measure data correctness and report outcomes through our observability infrastructure · Implement data governance practices and ensure compliance with ITV data privacy and security regulations. Establish data access controls, encryption mechanisms, and data retention policies. … Execute and comply with ITV architecture governance processes. · Ensure all data products conform to all observability requirements and suitable dashboards are in place. · Work closely with cross-functional teams, including data scientists, analysts, and software engineers, to understand their requirements and provide data engineering support. Document data pipelines, workflows, data more »
all the pieces of the puzzle that fit together to make a secure, repeatable and scalable continuous delivery pipeline. Everything from secrets management to observability, you’ve got a go-to toolchain that you have proven to work all the way through to production. Pragmatic & Versatile Architecture Skills: You’re … looking for specific experience of administrating and provisioning Grafana Loki, and/or the LGTM stack or similar Infrastructure automation with Terraform or CloudFormation. Observability and the associated toolchain and techniques Containers and container orchestrators: Kubernetes, Istio Preferably experienced using OpenTelemetry and tools like Honeycomb History of client facing or more »
Cilium as a robust Container Network Interface (CNI) solution. You'll collaborate closely with DevOps teams, network architects, and security professionals to enhance network observability, security, and performance. Key Responsibilities Deployment and Configuration: Deploy Cilium as a DaemonSet into Kubernetes clusters. Configure Cilium to leverage eBPF for efficient packet processing … network policy enforcement, and observability. Ensure seamless integration with existing CNI solutions. Network Observability: Utilize Hubble (Cilium's observability companion) to monitor network activities in Real Time. Leverage eBPF to gain insights into network flows, security policies, and process behavior within Kubernetes workloads. Performance Tuning: Optimize Cilium's performance by … Cilium and eBPF in Kubernetes environments. Technical Skills: Proficiency in Kubernetes networking concepts. Strong understanding of eBPF technology and its applications. Familiarity with network observability tools (eg, Hubble). Scripting skills (eg, Python, Go) for automation. Certifications: Cilium Certified Engineer (CCE) or equivalent certification is a plus. Collaboration: Excellent communication more »
West London, London, United Kingdom Hybrid / WFH Options
Daniel James Resourcing Ltd
two. Demonstrates a keen understanding of AWS and other cloud costs, attributing them to specific teams and services. Possesses extensive knowledge and experience in observability, including best practices, implementations, and familiarity with observability vendors. Champions diversity and inclusion, fostering a culture of innovation, teamwork, and self-improvement. Leads by example more »
to join their London technology team. This is a new role for the team and they are looking to hire someone with very strong observability and monitoring skills. The technology stack is rapidly evolving across the company and all of the infrastructure and application stack is now being built using … Microservices architecture. The successful Production Engineer will be embedded within some of the core development teams, finding new ways to improve monitoring and observability workflows using tools like Prometheus and Grafana. This role will involve direct interaction with traders, portfolio managers and senior stakeholders across the business so strong leadership … and interpersonal skills are required for this position. Key Requirements: Expert scripting skills with PowerShell, Bash, Python etc. In-depth knowledge of monitoring & observability tools such as Prometheus, Grafana and OpenTelemetry Strong knowledge of CI/CD tooling Experience with metrics and tracing instrumentation, such as LGTM and PromQL Knowledge more »
Reigate, Surrey, South East, United Kingdom Hybrid / WFH Options
Client Server
collaborate across product focussed Agile engineering teams to ensure the reliability, availability and performance of client facing services. Responsibilities will include managing and configuring observability platforms such as DataDog and PagerDuty to provide proactive monitoring of production (and other) environments, design and implementation of automation processes to drive efficiencies, leading … a similar SRE/Site Reliability Engineer position You have experience of running 24x7 services in the public cloud - Azure You have experience with observability tools such as DataDog and PagerDuty You have a good knowledge of Containerisation - Kubernetes, AKS You have strong scripting skills for automation, PowerShell or Python more »
About the job : We are seeking a dynamic and experienced Observability Engineer with expertise in any cloud, Grafana/Prometheus/Datadog Role & Responsibilities * Develop and improve instrumentation for monitoring and logging the health and availability of services. * Proactively monitor systems, networks, and applications to provide input in improving the more »
Principal Engineer - AI/Data/Observability We are currently seeking a Principal Engineer for an exciting startup in the observability space who recently emerged from Stealth mode -> the founder sold their former APM for $500 million a few years ago. Focusing on OpenTelemetry, this business has enough runway for … for their first AI & Data expert to come onboard. You will work as part of an empowered product team focused on making the best observability tool for their end users and as a Principal Engineer, you would be responsible for shipping machine-learning powered technology to production and working across … 5+ Years' exp in Machine Learning - preferably in a Tech startup ->> Strong Engineering background ->> Extensive experience shipping ML technology to production ->> Good understanding of Observability ->> Experience operating Software in production ->> Expert-level Python experience ->> Knowledge of LLMs Please apply ASAP for more info - experience in fast-paced startup is highly more »
Network Consultant Life on the team A fantastic opportunity has arisen to join our dynamic and rapidly expanding Consultancy Practice within Computacenter. The ideal candidate will have experience of working in a consultancy position with focus on traditional, advanced and more »
the system. Consulting/Coaching experience in implementing new ways of working and enabling agile delivery transformation. Enabling continuous delivery while ensuring reliability, quality, observability, and performance. Understanding of build and deployment pipelines, test driven development, automated testing, Test data management, automated Environment provisioning, Version control, Monitoring and alerting and … DevOps enablers. Understanding of observability and monitoring platform; Experience of having collaborated with developers to implement and improve observability and monitoring practices is preferred. Experience in leveraging DORA framework to effectively improve the performance of DevOps teams - Desired. Experience in defining OKRs/KPIs, setting up process/systems to more »
Head of Engineering Practices | Global Trading Firm | Software | DevOps | London City | Hybrid | up to £140k + Bonus, Benefits ❗Note: This role does not provide Visa Sponsorship.❗ Our client is a leading financial institution with a global presence, providing a wide more »
to calls, ultimately enhancing our service to its customers. Qualifications: 5+ years of experience in sales-oriented role MUST have sold a DevOps or Observability tool/product. (Observability, CloudNative, Kubernetes, CICD, DevSecOps) Exposure to customer service. Eagerness and aptitude for quickly grasping technical concepts. Aspiration to build a successful more »
Monitoring and Observability Engineer Salary - £50,000 - £55,000 - Fully remote role! Principal Accountabilities Design, implement, and manage monitoring solutions to ensure the availability, performance, and reliability of our systems. Collaborate with cross-functional teams to understand system requirements and implement effective monitoring strategies. Utilise expertise in Logic Monitor, OpenSearch more »
Your primary focus will be on our edge-computing stack, which includes building edge applications, deploying machine learning models, optimizing platform runtime, and enhancing observability and telemetry. Responsibilities Include: Developing edge applications for processing vision data and communication layers for compute-constrained edge devices. Deploying machine learning models into production … environments. Optimising platform runtime for maximum performance, predominantly in C++ with GPU utilization. Building observability and telemetry mechanisms. Requirements: Minimum 3+ years of experience in C++ development for production software. Proficiency in building applications processing real-time data and optimizing for latency and memory. Experience with various profiling tools (e.g. more »
resilient and high performing to meet the evolving needs of the ParentPay group and will work in partnership with the IT Engineering , Monitoring and Observability, IT support, Application Support Service Ops teams ensuring systems, services & infrastructure work reliably and securely. Key Responsibilities Line management responsibilities for the team; providing support … personal development plans and undertaking 1-2-1 reviews. Working closely with the Monitoring and Observability, IT Support and Engineering teams to triage all infrastructure tickets and escalate to the Engineering team when SME resource is required. Develop the skills and competence within the infrastructure team to fulfil business requirements. more »
Able to travel via auto, train or air up to 70% of the time About Sifflet We are building the world’s best data observability platform to help companies excel at data-driven decision-making. Today half of a data team’s time is spent troubleshooting data quality issues, Sifflet … Our goal is to bring the same benefits to data teams. In a few years, every data-driven company will be using a data observability solution, and we want to be the best solution on the market (and of course, we have plans to go well beyond simple “data observability more »