in Agile teams using tools like Git, Jira, and Confluence. Eligible for SC and NPPV3 clearance. Container orchestration with Kubernetes. HashiCorp tools: Vault, Consul, Packer. Monitoring and observability with Grafana, Prometheus, or similar. Familiarity with cloud networking, VPCs, NAT Gateways, security groups, etc. Personal Attributes: Proactive and self-driven with a passion for technology. Strong problem-solving mindset. Collaborative team More ❯
London, England, United Kingdom Hybrid / WFH Options
Magentus Group
similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work effectively with cross-functional teams. Ability More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
auto-scaling, performance tuning, troubleshooting and disaster recovery best practices Working knowledge of network security protocols Working knowledge of AWS Experience with monitoring tools such as InfluxDB, Prometheus or Grafana Experience of working in a managed service environment Experience using, developing with and maintaining cloud hosting services (ideally AWS EC2, RDS, S3, Lambda) Experience of event-driven integration with MQ More ❯
Manchester, England, United Kingdom Hybrid / WFH Options
Magentus Group
similar). Experience with scripting or programming languages (Python, Go, Bash, etc.). Understanding of networking, security principles, and best practices. Knowledge of observability tools such as Datadog, Prometheus, Grafana, etc. Desired Attributes Strong problem-solving skills with a proactive approach to improving systems and processes. Excellent communication and collaboration skills, able to work effectively with cross-functional teams. Ability More ❯
systems that require the highest data throughput in Java and C++. We use Airflow for workflow management, Kafka for data pipelines, Bitbucket for source control, Jenkins for continuous integration, Grafana + Prometheus for metrics collection, ELK for log shipping and monitoring, Docker and Kubernetes for containerisation, OpenStack for our private cloud, Ansible and Terraform for architecture automation, and Slack for More ❯
effective troubleshooting activities Awareness of any cloud infrastructure principles (like AWS, GCP or OCI), understanding basic principles of secure software delivery is a plus Familiar with Observability tools like Grafana or Prometheus, understanding the importance of giving the correct visibility to our platforms and environments We highly value ownership and initiative with capabilities to drive projects independently with an organized More ❯
Routing and Switching experience including routing protocols (BGP, OSPF and MPLS) Experience with Network monitoring tools/protocols and logging pipelines like SNMP, Syslog, Netflow, Elasticsearch (ELK Stack) and Grafana Strong understanding of network security principles including ACLs, Firewalls, VPN, 802.1x authentication, profiling and RBAC Proficiency in a modern programming language (Python, Golang, Ruby, etc.) Strong knowledge of and experience More ❯
team. As with most service teams, there will eventually be a periodic on call rotation as part of this role. Our developer kitchen includes: Java, REST, Docker, Kubernetes, μservice, Grafana and much more. The Position Principal Software Engineer (IC4) As a Principal Software Engineer, you will already be a world-class engineer with top-notch coding skills and confidence working More ❯
engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes. Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus. Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix. Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase. Ability to write code in Go, Python, Bash, or Perl for automation. Work Experience 6-8 years of proven More ❯
London, England, United Kingdom Hybrid / WFH Options
Registers Of Scotland
Docker, Infrastructure as Code (CDK, CloudFormation, Ansible) Cloud: AWS (Lambda, API Gateway, S3, Aurora, IAM) Container Platforms: OpenShift, Kubernetes QA: JUnit, Mockito, Cypress, Jest, React Testing Library, SonarQube Monitoring: Grafana, Kibana, CloudWatch, X-Ray Architecture: Microservices, serverless, event-driven, DDD The Role As a Technical Lead, you will lead and support one or more development teams, guiding the delivery of More ❯
engineering and maintaining a private-cloud infrastructure: Bare-metal, vSphere, KVM, Kubernetes. Experience with tools like Ansible, Terraform, Docker, Kafka, Nexus Experience with observability platforms: InfluxDB, Prometheus, ELK, Jaeger, Grafana, Nagios, Zabbix Familiarity with Big Data tools: Hadoop, HDFS, Spark, HBase Ability to write code in Go, Python, Bash, or Perl for automation. Work Experience 5-7+ years of More ❯
Bristol, England, United Kingdom Hybrid / WFH Options
TwinStream
In 2019, our founders were working as engineers solving complex cross-domain problems within government organisations. TwinStream was formed to consolidate their collective expertise and experience into one business, providing technical excellence and exceptional service to their clients. We have More ❯
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
TwinStream
In 2019, our founders were working as engineers solving complex cross-domain problems within government organisations. TwinStream was formed to consolidate their collective expertise and experience into one business, providing technical excellence and exceptional service to their clients. We have More ❯
production incident response. Key Responsibilities - Manage and monitor AWS infrastructure for performance and security - Respond to production incidents, perform root cause analysis, and implement fixes - Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries - Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes - Automate infrastructure tasks with Python, Bash, Go or SQL - Work with Git-based … on-call rotation to ensure system reliability Your Profile Essential: - Solid hands-on AWS experience in a DevOps setting - Background in incident, change, and problem management - Strong with Prometheus, Grafana, Splunk, and PromQL - Proficient in scripting (Python, Go, Bash, SQL) - Skilled in GitHub, CI/CD, and Kubernetes operations Desirable: - Experience with Terraform or CloudFormation - Advanced log analysis with Splunk More ❯
availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, WAN optimization, and system observability using tools like Prometheus and Grafana. Key Responsibilities: Administer and maintain Solace PubSub+ appliances and software brokers across environments (on-prem and cloud). Provide production support for messaging-related incidents, including root cause analysis … and resolution. Monitor system performance and health using Prometheus and Grafana; proactively identify and address anomalies. Configure and optimize Solace across WAN environments, ensuring low-latency, secure, and reliable messaging. Collaborate with development and application support teams to troubleshoot message flow issues and integration problems. Perform capacity planning, scaling, and tuning of Solace infrastructure to meet current and future demand. … production support, preferably in a 24x7 enterprise environment. Experience working with distributed systems over WAN, with an understanding of networking, latency, and failover strategies. Solid experience with Prometheus and Grafana for system monitoring and alerting. Proficiency in troubleshooting message delivery, persistence, and topic routing. Experience with capacity management, performance tuning, and system scaling. Familiarity with Linux/Unix systems and More ❯
with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and business … monitoring storage and system performance. Experience in tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability More ❯
London, England, United Kingdom Hybrid / WFH Options
Stott and May
production incident response. Key Responsibilities Manage and monitor AWS infrastructure for performance and security Respond to production incidents, perform root cause analysis, and implement fixes Maintain observability tools (Prometheus, Grafana, Splunk) and write PromQL queries Improve and operate CI/CD pipelines using GitHub Actions and Kubernetes Automate infrastructure tasks with Python, Bash, Go or SQL Work with Git-based … on-call rotation to ensure system reliability Your Profile Essential Solid hands-on AWS experience in a DevOps setting Background in incident, change, and problem management Strong with Prometheus, Grafana, Splunk, and PromQL Proficient in scripting (Python, Go, Bash, SQL) Skilled in GitHub, CI/CD, and Kubernetes operations Desirable Experience with Terraform or CloudFormation Advanced log analysis with Splunk More ❯
Farnborough, England, United Kingdom Hybrid / WFH Options
Searchability NS&D
configuration and orchestration best practices. Develop scalable, secure infrastructure using Terraform and Ansible. Evangelise GitOps and support deployment automation. Monitor and improve platform performance using tools like Prometheus and Grafana. Provide technical oversight and guidance to cross-functional teams. Stay ahead of emerging tech trends to enhance platform capabilities. WHAT I'M LOOKING FOR: 5+ years' experience in Platform, DevOps … Terraform, Ansible, and CI/CD tooling (e.g., Jenkins, GitLab CI/CD). Solid understanding of Git and version control best practices. Experience with monitoring tools like Prometheus and Grafana. Comfortable in fast-paced, agile environments. Excellent communication and problem-solving skills. Active SC or DV clearance required. NICE TO HAVE: Experience with cloud platforms (AWS, Azure, GCP). GitOps mindset More ❯
with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and business … monitoring storage and system performance. Experience in tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability. Seniority level: Not Applicable. Employment type: Full-time. Job function: Information Technology. Industries: Computer and Network Security. More ❯
of users. Design and implement Infrastructure as Code solutions that set industry standards. Build resilient CI/CD pipelines using Bitbucket and Spacelift orchestration. Develop sophisticated observability strategies with Grafana, CloudWatch, and advanced monitoring tools. Leadership & Growth Opportunities: Mentor emerging DevOps talent and shape team culture. Influence architectural decisions across cross-functional teams. Drive strategic initiatives that align technical excellence … heavy DevOps). Cloud Platforms: Recent AWS experience with enterprise-scale deployments. CI/CD Mastery: Advanced experience with Jenkins, Bitbucket Pipelines, and orchestration tools. Observability: Hands-on expertise with Grafana, Splunk, CloudWatch for proactive monitoring. Leadership & Delivery: Proven track record architecting scalable, secure infrastructure solutions. Experience implementing advanced security measures across DevOps workflows. Large-scale project management and delivery experience More ❯
performance based on set targets will be expected. VARIED DAY TO DAY RESPONSIBILITIES Ensuring system reliability, performance, and scalability through monitoring and automation Building and maintaining observability solutions using Grafana, Prometheus, Loki, OpenTelemetry Proactively identifying and resolving performance bottlenecks and infrastructure issues Automating infrastructure provisioning, configuration management, and deployments Implementing effective logging, monitoring, and alerting strategies Managing incident response and … as incident management, error budgets, and service-level objectives (SLOs) Experience designing and implementing robust observability, monitoring and logging solutions Strong proficiency with observability and monitoring tools such as Grafana, Prometheus, and Loki Strong experience with distributed tracing and telemetry tools such as OpenTelemetry An understanding of cloud networking architecture and load balancing techniques Experience with container orchestration platforms like More ❯
with load balancers (F5, HAProxy, Nginx) and network monitoring tools. Experience in DNS management and troubleshooting. Experience in network security best practices. Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk). Proficiency in at least one scripting language (Python, Bash) for automation. Experience with CI/CD pipeline management and DevOps practices. Strong understanding of disaster recovery and business … monitoring storage and system performance. Experience in tools like df, du, lsblk, and fdisk for managing and troubleshooting file systems and disk partitions. Familiarity with tools like Prometheus and Grafana for monitoring and observability. We back our colleagues and their loved ones with benefits and programs that support their holistic well-being. That means we prioritize their physical, financial, and More ❯
Annapolis Junction, Maryland, United States Hybrid / WFH Options
Codescratch LLC
and SonarQube Knowledgeable in Artificial Intelligence, specifically Large Language Models A strong understanding of cybersecurity best practices, encryption methods, and secure coding techniques Familiar with observability tools, including Prometheus, Grafana, and the ELK stack Ability to effectively communicate intricate technical information to individuals with non-technical backgrounds and to senior leadership Experience with Machine Learning Analytics Experience with Amazon Web … Services (AWS) Experience with asynchronous messaging systems (RabbitMQ, Apache Kafka, etc.) Experience monitoring application performance with metrics (Prometheus, InfluxDB, Grafana) and logs with ELK Stack (Elasticsearch, Logstash, Kibana) Excellent communication and collaboration abilities Experience working independently to solve complex problems Salary Range Pay range $165,000 - $225,000. (Plus Benefits) The pay range for this job level is a general More ❯