of UNIX, Linux, networking (TCP/IP), and databases (both relational and NoSQL). Experience in Android and iOS application debugging. Experience with observability tools such as Grafana and Prometheus, and skills in documenting procedures for knowledge management. Strong interpersonal and communication skills to thrive in fast-paced, dynamic environments. NOTE: As part of the operation staff members of the More ❯
Bradford, Yorkshire, United Kingdom Hybrid / WFH Options
Yorkshire Building Society Group
in the following: Continuous Integration/Continuous Delivery pipelines - tools such as Jenkins & GitLab Scripting and automation capabilities Modern monitoring skills and best practices using tools such as Grafana, Prometheus, Kibana, Dynatrace Testing frameworks Knowledge of networks and routing. Knowledge of integrations of services utilising different technologies such as PL/SQL, .NET, C#, Java, Spring Boot, Spring Batch Experience of integrating More ❯
native applications Working in a Continuous Delivery environment Modern observability practices Nice to have Not vital, but you'll have the edge if you also have experience with: Grafana Prometheus Kotlin or at least the willingness to learn it Batch processing data pipelines or have worked in: an eCommerce organisation a shipping/logistics/exports organisation What you bring More ❯
availability and security. Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring of applications. Observability & Monitoring: Develop comprehensive monitoring solutions using Prometheus, Grafana, ELK stack, or similar tools to improve system reliability. Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks (SOC2, ISO 27001, etc.). Incident Response … clusters). Proficiency in scripting and automation using Python, Bash, or Go. Experience with Infrastructure as Code (Terraform, CloudFormation, or Ansible). Familiarity with monitoring, logging, and observability tools (Prometheus, Grafana, Datadog, ELK, etc.). Strong understanding of networking concepts (VPC, Load Balancers, DNS, Firewalls). Experience with DevOps methodologies, CI/CD pipelines, and GitOps practices. Experience with high More ❯
in the development lifecycle. Observability & Reliability (SRE) Lead the charge on improving our observability strategy. Design and implement a robust monitoring, logging, and alerting framework using tools like Grafana, Prometheus, and native AWS services. Enhance our incident response processes, contribute to on-call rotations, and foster a culture of blameless post-mortems. Security & Governance Drive infrastructure security best practices across … ability to mentor and collaborate with other engineers. Technical Skills: Cloud: AWS (EKS, RDS, Lambda, etc.) IaC: Terraform (Expert) Containerisation: Kubernetes, Docker CI/CD: GitHub Actions Observability: Grafana, Prometheus, AWS CloudWatch, OpenTelemetry/distributed tracing. Scripting: Strong proficiency in at least one scripting language (e.g., Python, Go, Bash). Familiarity with JavaScript/TypeScript is a plus, as it More ❯
React on the Frontend. Tech & Data Science stack: Kubernetes & Docker on Google Cloud Python 3: Pandas, RabbitMQ, Celery, Flask, SciPy, NumPy, Dash, Plotly, Matplotlib JavaScript, React, Redux PostgreSQL, Redis Prometheus, Alertmanager, Datadog If you joined the company in a Data Science role you would be working on sophisticated pricing algorithms which would enable companies in the entertainment industry to More ❯
Site Reliability Engineering function they're building from scratch. They talked about production infrastructure, optimisation, automation and focusing on the deployment process rather than the build. We discussed Kubernetes, Prometheus and API Gateways. Most importantly, they spoke like they knew what the hell they were on about. Not just about SRE, but on the whole Engineering process. This is a More ❯
Expertise required for this engagement: Guide operational practices across services built using Java (Spring Boot), Kafka, MongoDB and related technologies. Oversee monitoring, observability, and performance tuning using Datadog, ELK, Prometheus, or similar tooling. Problem Management & Root Cause Elimination required: Lead proactive and reactive problem management efforts. Identify recurring production issues and collaborate with engineering to design permanent solutions. Track and More ❯
Manchester, Lancashire, England, United Kingdom Hybrid / WFH Options
DCS Recruitment
are ever the same. Essential Skills Solid Unix/Linux skills Experience with Bash, SQL, PHP Comfortable with Apache/Nginx, load balancers (HAProxy), and monitoring tools (Nagios, Grafana, Prometheus) Knowledge of log management (Graylog, Elasticsearch) Familiar with Ansible and GitLab CI/CD Experience using Git/SVN What Sets You Apart Passionate self-starter who loves problem-solving More ❯
Newcastle Upon Tyne, Tyne and Wear, England, United Kingdom
Nigel Wright Group
stakeholders, end-users and technologists ITIL (or similar) certification (or experience working within an ITIL framework) Strong understanding of application design, relational databases (SQL Server), monitoring and alerting tools (Grafana, Prometheus, VictoriaMetrics), scheduling tools (Control-M), operating systems (Windows/Linux), Kubernetes, cloud platforms (Azure), issue tracking and source control (JIRA, Git, Bitbucket). Interview Process: Coding Challenge – We would More ❯
Computer Science, Engineering, or related field. Strong programming skills in Go (ideally), Rust or C++. Solid experience in building and supporting complex backend systems at scale. Experience with Elasticsearch, Prometheus, Grafana and/or Datadog. Exposure to either AWS or GCP, plus IaC (Terraform or similar), would be beneficial. Knowledge of open-source storage tools (Ceph, MinIO, JuiceFS or FUSE) and More ❯
reporting. Develop and implement TOC strategy, staffing models, and documentation standards. Participate in systems architecture, new tech evaluation, and vendor selection. Manage operational workflows, reporting systems (e.g., Zabbix, Grafana, Prometheus), and support international broadcast teams. Collaborate with leadership on technical direction and TOC transformation. Ideal Candidate: Previous technical leadership role within a TOC, NOC, or MCR environment. Strong understanding of More ❯
skills Proven experience as a DevOps Engineer, Site Reliability Engineer, Platform Engineer or similar role. Ideally in an enterprise-grade Experience with APM stacks such as Datadog, New Relic, Prometheus or similar. Experience with handling telemetry, tracing and logging data, at scale, in multiple different environments. Familiarity with low-level telemetry daemons and aggregators such as StatsD. Proficient with Python More ❯
reporting. Develop and implement TOC strategy, staffing models, and documentation standards. Participate in systems architecture, new tech evaluation, and vendor selection. Manage operational workflows, reporting systems (e.g., Zabbix, Grafana, Prometheus), and support international broadcast teams. Collaborate with leadership on technical direction and TOC transformation. Skills/Must Have: 5-7+ years in a technical leadership role within a TOC More ❯
Real Time data, designing systems that can elastically scale to handle surges in throughput and demand. Hands-on experience with modern technologies such as Kubernetes, Kafka, RocksDB, MongoDB, MemSQL, Prometheus, Tempo, and Snowflake is highly desirable. Exposure to cloud-native tooling and practices, with an emphasis on DevOps, cloud computing, Kubernetes, and stream processing is a strong advantage. Comfortable working More ❯
evaluate and implement new technologies, and oversee their integration. Collaborate with external vendors and partners to ensure high-quality service delivery. Utilise and develop monitoring systems (e.g., Zabbix, Grafana, Prometheus) and oversee client reporting systems. Skills and Qualifications 5-7+ years' experience in a technical leadership role within a 24/7 broadcast, network operations centre (NOC), or Master More ❯
concurrent users (e.g., multi-tenant PostgreSQL, sharded MySQL). Strong backend fundamentals around concurrency, caching, indexing and distributed systems trade-offs. Proven track record of setting SLOs, building dashboards (Prometheus/Grafana, OpenTelemetry, etc.) and tuning alerts. Comfort with Kubernetes, IaC and cloud-native patterns; can debug from network to application layer. Start-up bias for action: you prioritise high More ❯
such as IBM Netcool, Moogsoft, BigPanda, PagerDuty, ServiceNow AIOps. Proficiency in Python, and hands-on knowledge of Ansible Automation Platform. Other highly valued skills include: Knowledge of Observability Platforms: Prometheus, Grafana, ELK, Splunk. Experience with integration into ITSM platforms such as ServiceNow. Experience with Kafka. You may be assessed on the key critical skills relevant for success in role, such More ❯
through previous experience within the financial services sector. Desirable Skills Experience with .NET ecosystem Scripting skills - Unix, RegEx, PowerShell Prior experience of working with: Nagios, Splunk, ELK stack, Grafana, Prometheus Bitbucket, Git, Octopus MSMQ, Kafka, IBM MQ Automate Enterprise (Help Systems) Salerio (COR Financials) SWIFT Message Types Personal Attributes: Strong analytical and problem-solving skills with ability to assess risk More ❯
and instruments, with broad asset class understanding, through previous experience within the financial services sector. Experience with .NET ecosystem Prior experience of working with: Nagios, Splunk, ELK stack, Grafana, Prometheus Bitbucket, Git, Octopus MSMQ, Kafka, IBM MQ Automate Enterprise (Help Systems) Salerio (COR Financials) SWIFT Message Types Personal Attributes: Strong analytical and problem-solving skills with ability to assess risk More ❯
GitHub Actions) Define and enforce platform standards across environments (dev, staging, prod) Collaborate with developers and DevOps on deployment tooling and security Enable platform observability using tools like Datadog, Prometheus, and CloudWatch Maintain Helm charts and Terraform modules for shared infrastructure Contribute to onboarding documentation and platform adoption practices Participate in incident response and postmortem analysis, where applicable Essential Skills … and secure image management Scripting or programming experience in Bash, Python, or TypeScript Strong understanding of GitOps practices and infrastructure lifecycle management Desirable Skills Experience with observability tooling (Datadog, Prometheus, Fluent Bit) Knowledge of admission controllers, OPA/Gatekeeper (optional for governance) Familiarity with cloud cost optimisation and Kubernetes scaling strategies Exposure to security scanning tools (tfsec, Trivy, Snyk) Interest More ❯