systems using load balancing, auto-scaling, canary releases, and blue-green deployments. Develop and maintain monitoring and logging dashboards with tools like New Relic, Prometheus, Grafana, and Datadog, ensuring observability through metrics, tracing, log aggregation, and alerting. Help teams determine settings and thresholds for alerts and automations based on application performance requirements. Monitor, optimize, and ensure system reliability and performance … like Terraform. Strong understanding of scalability, high availability patterns, and DevOps metrics such as DORA. Knowledge of SLM metrics (SLAs, SLOs, SLIs) and their application. Experience with monitoring and observability tools like New Relic, Prometheus, Grafana, and Datadog. Experience working with Kafka and improving performance in event-driven, real-time data architectures. Familiarity with cloud providers like AWS, Azure, or … GCP. Experience with CI/CD tools such as GitHub Actions, Jenkins, or GitLab CI. Strong analytical and communication skills. Nice-to-haves Familiarity with Observability-as-Code tooling and practices. Knowledge of Chaos Engineering practices. Senior Level: Mid-Senior, Employment: Full-time, Industry: Software Development #J-18808-Ljbffr More ❯
Hereford, Herefordshire, West Midlands, United Kingdom Hybrid / WFH Options
Twinstream Limited
Work Scheme Key Responsibilities of the Site Reliability Engineer: Partner with developers to improve performance and reliability across systems Automate toil and reduce unnecessary alerts with smart tooling Evolve observability so we can prevent issues before they become incidents Improve CI/CD pipelines and support development teams in delivering quality faster Explore new technologies, tools, and services that improve … plus) Experience with Terraform and modern IaC practices Hands-on with Docker and orchestration tools (Kubernetes, OpenShift, or Docker Swarm) CI/CD experience (Jenkins or equivalent) Monitoring/observability tools: Grafana , Prometheus , or InfluxDB Event-driven messaging: RabbitMQ or similar Strong Linux skills, scripting, and understanding of network security protocols Experience with AWS: EC2, S3, RDS, Lambda Desirable: Experience … coding in Python, Java, or Go Exposure to cross-domain solutions Experience in a service management environment Observability best practices and metric-driven reliability improvement Security Requirements Due to the sensitive nature of our work, candidates must be eligible for Developed Vetting (DV) clearance. All offers are subject to security screening. Ready to Engineer Systems That Matter? If youre a More ❯
Chesterfield, England, United Kingdom Hybrid / WFH Options
JR United Kingdom
automation and internal tools for deployment, monitoring, and incident response Tune performance across OS, network, and cloud layers — this role is hands-on and detail-oriented Improve system resilience, observability, and security in a high-stakes production environment Requirements: Fluent in Linux — not just using it, but understanding how it works under the hood Advanced terminal skills — manipulating systems efficiently … time environments Hands-on with Docker (Kubernetes is a plus), infrastructure-as-code, and CI/CD tooling Strong scripting and automation experience in Python and Bash Familiarity with observability stacks (Prometheus, OpenTelemetry, eBPF) Cloud infrastructure experience (AWS/GCP/Azure), with attention to IAM and software supply chain security Curious, persistent, and comfortable experimenting at the lowest levels More ❯
AWS in a production environment Expertise in Kubernetes including AKS EKS containerization and Helm Proven ability to meet and maintain SOC 2 or equivalent compliance Strong background in automation observability and GitOps workflows Comfortable using AI coding tools like GitHub Copilot Cursor or Claude to enhance delivery Bonus if you have experience supporting hybrid or disconnected deployment environments or working … Be Using Cloud : Azure including AKS API Management and DevOps Pipelines and AWS including EKS Lambda and CloudFormation Infrastructure as Code and GitOps : Terraform Bicep Pulumi ArgoCD and FluxCD Observability : Prometheus Grafana OpenTelemetry and Datadog Security and Compliance : HashiCorp Vault Azure Key Vault AWS KMS OPA Gatekeeper and Drata or similar ? Interested in exploring this further This is a high More ❯
AWS in a production environment Expertise in Kubernetes including AKS EKS containerization and Helm Proven ability to meet and maintain SOC 2 or equivalent compliance Strong background in automation observability and GitOps workflows Comfortable using AI coding tools like GitHub Copilot Cursor or Claude to enhance delivery Bonus if you have experience supporting hybrid or disconnected deployment environments or working … Be Using Cloud : Azure including AKS API Management and DevOps Pipelines and AWS including EKS Lambda and CloudFormation Infrastructure as Code and GitOps : Terraform Bicep Pulumi ArgoCD and FluxCD Observability : Prometheus Grafana OpenTelemetry and Datadog Security and Compliance : HashiCorp Vault Azure Key Vault AWS KMS OPA Gatekeeper and Drata or similar ? Interested in exploring this further This is a high More ❯
AWS in a production environment Expertise in Kubernetes including AKS EKS containerization and Helm Proven ability to meet and maintain SOC 2 or equivalent compliance Strong background in automation observability and GitOps workflows Comfortable using AI coding tools like GitHub Copilot Cursor or Claude to enhance delivery Bonus if you have experience supporting hybrid or disconnected deployment environments or working … Be Using Cloud : Azure including AKS API Management and DevOps Pipelines and AWS including EKS Lambda and CloudFormation Infrastructure as Code and GitOps : Terraform Bicep Pulumi ArgoCD and FluxCD Observability : Prometheus Grafana OpenTelemetry and Datadog Security and Compliance : HashiCorp Vault Azure Key Vault AWS KMS OPA Gatekeeper and Drata or similar ? Interested in exploring this further This is a high More ❯
AWS in a production environment Expertise in Kubernetes including AKS EKS containerization and Helm Proven ability to meet and maintain SOC 2 or equivalent compliance Strong background in automation observability and GitOps workflows Comfortable using AI coding tools like GitHub Copilot Cursor or Claude to enhance delivery Bonus if you have experience supporting hybrid or disconnected deployment environments or working … Be Using Cloud : Azure including AKS API Management and DevOps Pipelines and AWS including EKS Lambda and CloudFormation Infrastructure as Code and GitOps : Terraform Bicep Pulumi ArgoCD and FluxCD Observability : Prometheus Grafana OpenTelemetry and Datadog Security and Compliance : HashiCorp Vault Azure Key Vault AWS KMS OPA Gatekeeper and Drata or similar ? Interested in exploring this further This is a high More ❯
Birmingham, West Midlands, United Kingdom Hybrid / WFH Options
Halian Technology Limited
in the team Contribute to solution architecture and strategic technical direction Build, integrate, and maintain REST APIs and backend services Champion best practices in software quality, CI/CD, observability, and DevOps Collaborate with cross-functional teams including Product, QA, and DevOps Optionally take on people management responsibilities for engineers Stay updated with emerging backend and cloud technologies Key Skills More ❯
doing: Developing secure, high-performance APIs Working with databases like PostgreSQL, DynamoDB and MongoDB Deploying cloud-native services using tools like Terraform, Docker and Kubernetes Improving performance, security and observability in backend systems Collaborating with product managers, cloud engineers and designers What they’re looking for: Proven experience writing Python APIs (Django, Flask or FastAPI) Solid database design and cloud More ❯
doing: Developing secure, high-performance APIs Working with databases like PostgreSQL, DynamoDB and MongoDB Deploying cloud-native services using tools like Terraform, Docker and Kubernetes Improving performance, security and observability in backend systems Collaborating with product managers, cloud engineers and designers What they’re looking for: Proven experience writing Python APIs (Django, Flask or FastAPI) Solid database design and cloud More ❯
doing: Developing secure, high-performance APIs Working with databases like PostgreSQL, DynamoDB and MongoDB Deploying cloud-native services using tools like Terraform, Docker and Kubernetes Improving performance, security and observability in backend systems Collaborating with product managers, cloud engineers and designers What they’re looking for: Proven experience writing Python APIs (Django, Flask or FastAPI) Solid database design and cloud More ❯
a large-scale Azure cloud platform. Drive automation-first culture using Terraform, Azure CLI, PowerShell and more. Lead incident management, capacity planning, and performance tuning initiatives. Guide engineers in observability, cost optimisation, and security best practices. Define and track service level objectives (SLOs) to improve engineering outcomes. Champion a DevOps mindset with “you build it, you run it” accountability. We More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
of practices (e.g., Cloud, Platforms. AI, Strategy, Custom Application Development, Network & Edge, Security, Resiliency, etc.) Articulate the vision for modern engineering (e.g., agile, cloud-native, DevOps) and operations (e.g., observability, automated response, SRE, etc.) and able to articulate a path toward a target operating model (people, process, and tools) SoftServe is an Equal Opportunity Employer. All qualified applicants will receive More ❯
Deep understanding of AWS services and modules. Strong Linux administration and troubleshooting skills. Experience with CI/CD pipelines and Infrastructure as Code (IaC). Experience with monitoring and observability tools like New Relic, DataDog, or Splunk. Skills and Strengths: Amazon Web Services (AWS) Auto Scaling, Fargate, Route53 Observability tools (New Relic, DataDog, Splunk) Scripting (Ansible, Bash, Python, Go) CI More ❯
services. Strong background in Linux administration and troubleshooting. Proven experience in implementing and managing CI/CD pipelines and Infrastructure as Code (IAC) solutions. Proven experience in monitoring and observability tools to proactively manage system health. Skills and Strengths: AWS (Amazon Web Services) Auto Scaling Fargate Route53 Observability tools (New Relic, DataDog, Splunk) Scripting (Ansible, Bash, Python, GO) CI/ More ❯
services. Strong background in Linux administration and troubleshooting. Proven experience in implementing and managing CI/CD pipelines and Infrastructure as Code (IAC) solutions. Proven experience in monitoring and observability tools to proactively manage system health. Skills and Strengths: AWS (Amazon Web Services) Auto Scaling Fargate Route53 Observability tools (New Relic, DataDog, Splunk) Scripting (Ansible, Bash, Python, GO) CI/ More ❯