Bristol, Avon, South West, United Kingdom Hybrid / WFH Options
Twinstream Limited
Reliability Engineer (SRE) to ensure the availability, performance, and cost-effectiveness of our cloud and on-prem services. You will collaborate with software engineers and system administrators to improve observability, reduce downtime, and proactively mitigate reliability risks across a growing portfolio of services. Key Responsibilities of the Site Reliability Engineer: Improve reliability and performance across multiple subsystems. Automate manual tasks … Linux administration, scripting, and network security protocols. Experience with cloud services (preferably AWS – EC2, RDS, S3, Lambda). Desirable: Experience coding in Java, Go, or Python; cross-domain technologies; observability patterns; and service management environments. Why Join TwinStream? Salary: £65,000–£95,000 (DOE & clearance level) Pension: 8% employer contribution Private Healthcare: Includes dental & optical cover for you & your family
Software Maintenance: Ensuring the deployed software product is configured and maintained in an automated fashion throughout its lifecycle. • Security Integration: Embedding security practices into the development and deployment processes. • Observability: Implementing monitoring and logging to ensure the software's performance and security can be observed and analyzed. • Collaboration: Working closely with development, operations, and security teams to streamline workflows and … CI/CD (GitLab/Pipelines), IaC (Terraform), Kubernetes, Istio. • Advanced troubleshooting without guidance across the stack (networking, DNS, TLS, authn/z, storage, runtime); strong root-cause analysis. • Observability first: metrics/tracing/logs (Prometheus/Thanos/Grafana, OpenTelemetry); defines SLOs, alerts, runbooks. • Security built-in: image scanning (Trivy/Anchore), policy-as-code, secrets management, supply …
provide solutions. Mentor junior team members, providing guidance on standard methodologies for DevOps. Infrastructure Automation & Management: Use Terraform/OpenTofu and automation frameworks to provision and manage infrastructure. Monitoring & Observability: Configure and utilise observability tools like Datadog for performance monitoring, alerting, and visualisation, ensuring system reliability and quick identification of issues. Performance Optimisation: Continuously monitor the performance of the tools …
databases, file storage protocols (NFS, SMB), and backup strategies are useful. Experience with CI/CD pipelines, tagging for cost optimisation, and security compliance is key. Familiarity with DevOps, observability tools, and performance tuning enhances delivery. Strong understanding of business process flows, integration patterns, and stakeholder communication ensures successful implementation and support.
performant search and indexing systems, particularly for granularly permissioned or complex data structures. Proficiency in designing and implementing tools for cluster deployment, operations, and lifecycle management. Strong knowledge of observability, telemetry, and diagnostic tools for monitoring Elasticsearch clusters and ensuring system reliability. Hands-on experience leading major version upgrades, implementing CVE (Common Vulnerabilities and Exposures) remediation strategies, and maintaining secure …
data modeling (star schema), SQL, Python, and data governance tools (e.g., Purview, Unity Catalog). Experience implementing AI/ML solutions in Databricks or similar platforms. Knowledge of data observability, monitoring, and incident management (ITIL best practices). Excellent communication and stakeholder management skills. Experience in a regulated financial environment is desirable. Relevant certifications such as Azure Data Engineer Associate …
team meetings and performance reviews. Motivate, guide and coach team members to hit agreed targets via formal objectives and supporting development plans. Incident & Problem Management: Proactively partner with the Observability Management function to establish trending and opportunities for Customer infrastructure optimisation. Be a process manager and advocate for Problem Management, ensuring root cause analysis takes place on major Incidents and …
strategic way, with continuous improvement of solutions. Experience with configuration and change management, incident/problem resolution, and evolving endpoint solutions using modern infrastructure standards and practices including automation, observability, and continuous deployment. Experience working with architects and project managers to agree enterprise-wide designs and implement across central and multi-region estates. We will provide: The opportunity to be …
client satisfaction. Collaborating with Client Solutions and other teams to understand requirements and deliver tailored solutions. Designing and implementing scalable, future-proof architectures for new connectors and integrations. Enhancing observability with better diagnostics, logging, and tracing to support technical teams. Overseeing the development and management of the public API (REST + event streaming functionality). Producing clear, accessible technical documentation …
interest in learning them. Bonus points: Prior experience integrating with UK and European healthcare systems (e.g., EMIS, TPP SystmOne, Cerner Millennium, GDT, Dedalus, Maincare, Epic, etc.). Knowledge of observability tools (logging, metrics, tracing) and performance tuning. Familiarity with GDPR, NHS DSP Toolkit, and healthcare security best practices. Attitude matters more than experience! If you are motivated, collaborative, and …
Shopify, WooCommerce, HubSpot, Salesforce, and marketing tools to unify data and operations. Enhance traditional automation with AI-powered tasks, including contextual data extraction, summarization, classification, and content generation. Ensure observability and reliability through comprehensive logging, tracing, and performance monitoring. Agentic System Development: Build and deploy LLM and agent-based systems for customer interaction, product enrichment, and decision-making. Implement RAG …
implementing complex data solutions across areas such as data integration, data modelling, data management and governance, data Lake/Lakehouse/Data Mesh, data engineering, analytics, AI/GenAI, observability, cloud, security. Experience aligning data architecture blueprints across business units and geographies; presenting designs to stakeholders (e.g., Architecture Boards). Experience in governance, regulatory compliance (e.g., GDPR) and managing large …
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
Deloitte LLP
implementing complex data solutions across areas such as data integration, data modelling, data management and governance, data Lake/Lakehouse/Data Mesh, data engineering, analytics, AI/GenAI, observability, cloud, security. Experience aligning data architecture blueprints across business units and geographies; presenting designs to stakeholders (e.g., Architecture Boards). Experience in governance, regulatory compliance (e.g., GDPR) and managing large …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Deloitte LLP
implementing complex data solutions across areas such as data integration, data modelling, data management and governance, data Lake/Lakehouse/Data Mesh, data engineering, analytics, AI/GenAI, observability, cloud, security. Experience aligning data architecture blueprints across business units and geographies; presenting designs to stakeholders (e.g., Architecture Boards). Experience in governance, regulatory compliance (e.g., GDPR) and managing large …
Cardiff, South Glamorgan, United Kingdom Hybrid / WFH Options
Deloitte LLP
implementing complex data solutions across areas such as data integration, data modelling, data management and governance, data Lake/Lakehouse/Data Mesh, data engineering, analytics, AI/GenAI, observability, cloud, security. Experience aligning data architecture blueprints across business units and geographies; presenting designs to stakeholders (e.g., Architecture Boards). Experience in governance, regulatory compliance (e.g., GDPR) and managing large …
infrastructure and system issues, as well as log ingestion and communication issues. Design and develop scalable, robust, and high-performance data pipelines and data storage solutions. Develop and maintain observability frameworks using tools like Kibana, Grafana, or similar. Work with cross-functional teams to define observability and search requirements. Scale, script and maintain our development and production platform foundation with …
for a DevOps Engineer with strong site reliability principles to join our Platform team. You’ll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You’ll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet SLIs, SLOs, and SLA targets. Key Responsibilities Maintain and optimise production … Proficient with AWS services relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong GitOps mindset for managing …
bulk of our codebase, currently in Java (11+), and ideally Spring Boot. You will be working with SQL and large SQL databases, Docker, Kubernetes, OpenAPI specifications, and distributed system observability tooling (e.g., Datadog APM). Infrastructure automation is primarily owned by the infrastructure team, but you will be a consumer of their work; familiarity with AWS, Terraform and Docker is … Ability to communicate effectively with technical and non-technical stakeholders Modern Cloud-Native architectures and practices (high availability, high scalability, microservices, 12-factor apps, CI/CD, automation and observability) TDD, BDD and Contract testing Experience in a DevOps environment or willingness to work in one Proven delivery of well-tested, scalable, fault-tolerant and performant solutions A pragmatic, self …
in Computer Science, Electrical Engineering or related field Ability to develop and maintain comprehensive monitoring, alerting systems and incident management using tools such as Prometheus, Grafana, OTEL and other observability stacks Ability to optimize, scale, and secure our infrastructure and Kubernetes environments, using deep Kubernetes and cloud platform experience Ability to implement and maintain network policies and security practices to … native technologies, such as AWS, GCP, or Azure Experience with BGP ECMP, including its configuration and troubleshooting Experience with developing and maintaining eBPF programs for security, network monitoring, and observability Salary Range: 160,000 - 240,000 USD Annually + Benefits + Bonus The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation …
for leading and executing the migration of data, dashboards, alerts, and configurations from Splunk systems to Elasticsearch. This role involves deep technical expertise in Splunk architecture, data ingestion, and observability tools, along with strong project management and stakeholder communication skills. Must-have skills: Splunk, ELK Stack, Kibana. Nice-to-have skills: stakeholder communication, strong project management.
technical proficiency in: Languages: Java 17+ (Java 21 preferred) Frameworks: Micronaut (preferred), Spring Boot Testing: JUnit, Mockito Build Tools: Gradle Data & Messaging: Kafka, MongoDB APIs: GraphQL Federation, REST Infrastructure & Observability: Terraform, OpenTelemetry, Dynatrace Please get in touch ASAP for a chance to work on this amazing project.