for a DevOps Engineer with strong site reliability principles to join our Platform team. You’ll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You’ll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities: Maintain and optimise production … Proficient with AWS services relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong GitOps mindset for managing …
bulk of our codebase, currently in Java (11+), and ideally Spring Boot. You will be working with SQL and large SQL databases, Docker, Kubernetes, OpenAPI specifications, and distributed system observability tooling (e.g., Datadog APM). Infrastructure automation is primarily owned by the infrastructure team, but you will be a consumer of their work; familiarity with AWS, Terraform and Docker is … Ability to communicate effectively with technical and non-technical stakeholders. Modern cloud-native architectures and practices (high availability, high scalability, microservices, 12-factor apps, CI/CD, automation and observability). TDD, BDD and contract testing. Experience in a DevOps environment or willingness to work in one. Proven delivery of well-tested, scalable, fault-tolerant and performant solutions. A pragmatic, self …
in Computer Science, Electrical Engineering or related field. Ability to develop and maintain comprehensive monitoring, alerting systems and incident management using tools such as Prometheus, Grafana, OTEL and other observability stacks. Ability to optimize, scale, and secure our infrastructure and Kubernetes environments, using deep Kubernetes and cloud platform experience. Ability to implement and maintain network policies and security practices to … native technologies, such as AWS, GCP, or Azure. Experience with BGP ECMP, including its configuration and troubleshooting. Experience with developing and maintaining eBPF programs for security, network monitoring, and observability. Salary range: 160,000 - 240,000 USD annually + benefits + bonus. The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation …
for leading and executing the migration of data, dashboards, alerts, and configurations from Splunk systems to Elasticsearch. This role involves deep technical expertise in Splunk architecture, data ingestion, and observability tools, along with strong project management and stakeholder communication skills. Must-have skills: Splunk, ELK Stack, Kibana. Nice-to-have skills: stakeholder communication skills, strong project management …
technical proficiency in: Languages: Java 17+ (Java 21 preferred); Frameworks: Micronaut (preferred), Spring Boot; Testing: JUnit, Mockito; Build Tools: Gradle; Data & Messaging: Kafka, MongoDB; APIs: GraphQL Federation, REST; Infrastructure & Observability: Terraform, OpenTelemetry, Dynatrace. Please get in touch ASAP for a chance to work on this amazing project.
My client is transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency. They are looking for a Lead SRE to own and elevate their Alerting & Incident Management platform. You’ll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth …
s (ISD) Enterprise Technology & Tools Sector; Asset Services Team. This team within ISD provides enterprise IT services to the Laboratory community, including the configuration management database and service observability & alerting. Required Skills: Must understand basic IT System Administrator processes & terminology; Data Analysis; Reviewing Specification Requirements; Translates Business Requirements into Technical Specifications; Understands and Executes Project Requirements; Agile Software Development …
skills, testing experience (React Testing Library), and familiarity with Tailwind CSS are essential. Nice-to-haves include Storybook component library work, Cloudflare Workers, E2E testing (Cypress or Playwright), and observability practices. Benefits: Salary up to £65,000; 25 days holiday + bank holidays + birthday off; Friendly, forward-thinking team culture; Agile workflows with real influence over technical direction; Regular …
Reading, Berkshire, United Kingdom Hybrid / WFH Options
Spectrum IT Recruitment
Performance review of the platform and tooling. Own the architecture, performance, and cost-efficiency of the infrastructure. Align operations with core e-commerce flows and system performance metrics. Drive observability improvements across New Relic and Rollbar. Enhance website performance, eliminate slowdowns, and optimise customer experience. Review and optimise configurations for security and speed. Maintain high standards of cloud security and …
practices in software development and deployment. Implements best-practice coding in line with Development coding standards. Provides direction and technical context for more junior developers. Fosters a culture of observability across the engineering team. Helps teams across engineering use operational data to improve stability and performance of their applications. Maintains documentation and release notes. Awareness of application security considerations. Leads …
and platforms to automate and optimise data management steps and gateways into data and analytical pipelines. Expertise in implementing and managing statistical process controls for data quality measurement, continuous observability, and data quality remediation. Strong collaboration skills, with experience working across diverse data roles and organisational levels within a federated, data-driven organisation. Excellent communication skills, both written and verbal …
and know your way around Node.js backend frameworks. You have solid experience designing and maintaining APIs, background workers, or async processing systems. You have experience with performance optimization and observability. You're comfortable working with infra basics (Docker, GCP, CI/CD). You care about code quality and testing. What we offer: Monthly subsidy programme: Different people have different needs …
Glasgow, Lanarkshire, Scotland, United Kingdom Hybrid / WFH Options
Circle Recruitment
solid understanding of quality engineering principles. Ability to work autonomously and collaboratively in a fast-paced, cross-functional environment. A holistic view of quality, considering everything from testability and observability to scalability and resilience. Ideal Background: Degree in Computer Science, Engineering, or a related field. Proven experience in quality engineering roles with a focus on continuous improvement and cross-team …
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
Circle Recruitment
solid understanding of quality engineering principles. Ability to work autonomously and collaboratively in a fast-paced, cross-functional environment. A holistic view of quality, considering everything from testability and observability to scalability and resilience. Ideal Background: Degree in Computer Science, Engineering, or a related field. Proven experience in quality engineering roles with a focus on continuous improvement and cross-team …
Cardiff, South Glamorgan, Wales, United Kingdom Hybrid / WFH Options
Circle Recruitment
solid understanding of quality engineering principles. Ability to work autonomously and collaboratively in a fast-paced, cross-functional environment. A holistic view of quality, considering everything from testability and observability to scalability and resilience. Ideal Background: Degree in Computer Science, Engineering, or a related field. Proven experience in quality engineering roles with a focus on continuous improvement and cross-team …
London, South East, England, United Kingdom Hybrid / WFH Options
Circle Recruitment
solid understanding of quality engineering principles. Ability to work autonomously and collaboratively in a fast-paced, cross-functional environment. A holistic view of quality, considering everything from testability and observability to scalability and resilience. Ideal Background: Degree in Computer Science, Engineering, or a related field. Proven experience in quality engineering roles with a focus on continuous improvement and cross-team …
Newcastle Upon Tyne, Tyne and Wear, North East, United Kingdom Hybrid / WFH Options
The Bridge (IT Recruitment) Limited
Technology Platform Delivery: Oversee the delivery and lifecycle management of: Microsoft 365 and collaboration platforms; cloud platforms (design, automation, cost optimisation); network and security operations (compliance, threat management); monitoring, observability, and backup/recovery systems. Ensure alignment with architectural standards and regulatory requirements (e.g., DORA, Cyber Essentials Plus). Stakeholder Engagement: Act as the escalation point for unresolved issues across …
and the adoption of new standards and protocols within the payment ecosystem. Collaborate on Process & Tooling Automation: Work closely with internal teams, including Merchant Data Analytics and Merchant Observability, to design, develop, and implement standardized, repeatable workflows, robust monitoring systems, and automated tools. Who you are: 5+ years of experience in payments, acquiring, or card networks, demonstrating a unique …
Databases (Mongo). Test automation following Test Driven Development practices, including unit, integration and end-to-end testing. Supporting a highly available production system: diagnosing issues raised from logs and observability tooling (Dynatrace), triage and resolution. Company Benefits: A competitive salary, pension scheme and life assurance, along with 25 days annual leave plus an additional day on us for your birthday …
release strategy balancing risk and speed of delivery. Assisting the team with support process and incident management. Pairing with other team members and encouraging a focus on quality and observability in the team. Working closely with the Product Owner and Delivery Lead on prioritizing product features and uncovering edge cases. Managing the issue backlog and facilitating bug triage based on …
Advisor, and NLP-driven analytics. Background with BI tools such as Tableau, Domo, or Sisense. Web development experience using HTML, CSS, JavaScript, Python. Understanding of data governance, lineage, and observability tools. Familiarity with AI model monitoring, interpretability, and ethical AI. Knowledge of operational resilience and compliance frameworks such as DORA. Salary range: 160,000 - 220,000 USD annually + benefits + bonus …
mesh, API gateways, and commercial vs. open source software. Approaches to managing architectural debt, architecture governance and evolution in practice. Microservices topologies, including operational concerns such as resiliency, observability, discovery and routing, security, etc. Have experience with, and understand how to lead, legacy integration and remediation (facades, strangler approaches, et al.). Deep understanding of different integration patterns and …