platforms, including Google Cloud Platform (GCP), AWS, and Azure. Strong understanding of networking technologies, such as LAN, WAN, firewalls, and related infrastructure. Proficient with observability and monitoring tools, e.g. Grafana, SolarWinds, Prometheus, AWS CloudWatch, Splunk. Familiarity with DevOps practices, including CI/CD pipelines, is beneficial. If you would be interested in having a further chat then please send your
including Salt, Git, Ansible, Terraform, Puppet and network element managers. • Network Performance Management (SNMP, NETMON, LiveAction) • Demonstrated experience in any two or more of the following: Netflow, Elastic, Kafka, Grafana, Prometheus, or Nexus Repo. • Demonstrate expertise in design and improvement of complex and geographically diverse enterprise networks. • Expert knowledge of US Army security requirements for network infrastructure.
any of the following: Ensuring uptime of critical systems through incident response and triage Automating systems administration using Bash, Python, or Ansible Monitoring and troubleshooting enterprise services with Prometheus, Grafana, or Splunk Configuring enterprise services using Ansible, YAML, or JSON Developing recovery procedures for large-scale systems, including backup and restore or blue/green deployments Proven track record of
support highly available telephony solutions using AudioCodes and Oracle SBCs Develop scripts, tools, and APIs to improve SIP routing, call flows, and automation Integrate telephony with monitoring platforms like Grafana and ThousandEyes Collaborate with carriers to support SIP infrastructure and hybrid voice networks Contribute to hybrid cloud telephony solutions across UCaaS and CCaaS platforms Participate in Agile sprints and support
multi-account AWS setups. Extensive experience with AWS Organisations Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Working with Control Tower and Landing Zones Why Work For Us? Competitive base salary up to
Python, Go, or similar languages for automation and scripting. Expert-level knowledge of AWS Networking, TLS, and security best practices. Experience with container orchestration (Kubernetes, EKS) and observability tools (Grafana, ELK). A passion for innovation, problem-solving, and delivering high-impact solutions. Why Work For Us? 25 days holiday + bank holidays Up to 5% employer pension contribution Educational
Guildford, Surrey, United Kingdom Hybrid / WFH Options
Electronic Arts
e.g. Perforce, Git) Configuration management tools (e.g. Chef, Ansible, Terraform, Packer) Secrets management tools (e.g. Vault) Virtualization environments and tools (e.g. VMs, vSphere) Data and Observability tools (e.g. Splunk, Grafana, New Relic, OpenTelemetry) Growth-oriented mindset About Electronic Arts We're proud to have an extensive portfolio of games and experiences, locations around the world, and opportunities across EA.
Web Services or other cloud technologies Experience deploying SAN storage, preferably from IBM (GPFS) Experience bootstrapping HPE servers, configuring storage, iLO Experience deploying enterprise monitoring tools such as Grafana Experience with VMware vSAN, vCenter, replication, Veeam backup integration Experience with relational database technologies such as Oracle and MySQL Advanced writing skills: able to clearly articulate ideas for executive level
Fine Tuning: Drive the deployment and fine tuning of large language models (LLMs) while ensuring efficient training pipelines and model hosting. Monitoring & Performance Optimization: Implement monitoring (using Prometheus/Grafana and similar tools) and logging solutions to ensure system reliability and to optimise model throughput. Collaborate Across Teams: Work closely with Machine Learning engineers to enable their delivery What We
with performance and load testing frameworks (e.g., k6, JMeter) Familiarity with cloud-based test environments and infrastructure (AWS preferred) Working knowledge of observability and test reporting tools (e.g., Datadog, Grafana) Experience improving test data strategies and test isolation techniques Contributions to internal tooling or open-source testing frameworks Background in building out quality initiatives at the org level EverQuote Can
Leeds, Yorkshire, United Kingdom Hybrid / WFH Options
William Hill PLC
our ethos. To apply to this post, you will have: A base in Leeds with working experience of an incident response model and fluency with observability and monitoring (Prometheus, Grafana) Experience defining alerts and implementing dashboards from existing monitoring and logging data Relentless focus on customer experience with good understanding of security best practice Fluency in cloud infrastructure (AWS) - using
Airflow, or on common problems such as model and API monitoring, data drift and validation, autoscaling, access permissions Have previously worked with monitoring tools such as New Relic or Grafana Understand the use of feature stores and related data technologies for operational machine learning products Are proficient with Python and have Spark knowledge. Have leadership experience either through previous management
collaborate effectively with cross-functional teams, including DevOps, Engineering, Service Reliability, and Service Delivery teams. Technical Expertise: In-depth knowledge of open-source and commercial observability tools (e.g., Prometheus, Grafana, New Relic). Expertise in cloud environments (e.g., AWS, Azure) and infrastructure as code (IaC) tools like Terraform. Monitoring and Observability: Experience in creating and maintaining dashboards for proactive monitoring of
to managing our infrastructure, using Terraform. - We follow a GitOps approach to managing our Kubernetes configuration, using ArgoCD and Helm. - We manage a high-availability metrics collection system using Grafana, Thanos & Prometheus. We're in the process of transitioning to OpenTelemetry and Honeycomb for our application telemetry (traces and metrics). - We manage a data pipeline using Pub/Sub
provided by GCP/AWS, such as S3, FSX, EKS, SQS, SNS, Kinesis, AmazonMQ, DynamoDB, GKE, CloudStorage, PubSub, Filestore. Knowledge of modern observability technologies such as ELK, Splunk, Prometheus, Grafana, Micrometer. "What-if" thinking while designing or reviewing solutions, to foresee or catch potential problems as early in the development process as possible. Nice to have: Good knowledge of
primary language for our backend codebase AWS & GCP - we're cloud-native Kubernetes (EKS) Microservice based architecture RESTful APIs PostgreSQL, JDBI, Flyway TeamCity for CI/CD Terraform and Grafana The Team: The Core Banking group is seeking passionate engineers ready to tackle complex challenges and contribute to foundational systems, powering modern banking, that process millions of transactions daily, ensuring
and distributed storage. Proficiency in Python, Bash, and experience with automation scripting for system monitoring and troubleshooting. Knowledge of POSIX, NFS, S3 protocols, log management, and monitoring tools (Prometheus, Grafana). It's nice if you have: Experience with JIRA, Confluence, Slack, and other collaboration tools. Experience collaborating between customer support and product development teams. Familiarity with Kubernetes, Containers, LXC
The Opportunity Just Eat Takeaway is seeking an aspiring Engineer to join the Platform Observability team. The team sits within the Platform & Reliability department, which exists to provide global engineering a magnifying glass into their services while driving commercial availability
Department: Tech Services Location: SEGA West London Reporting To: Head of Corporate Infrastructure Position Overview: We are seeking an experienced Senior Build + Release Engineer with games industry experience to design, deploy, and maintain our CI/CD and build
to offer 4 day working weeks and part time options can also be considered. Requirements: - Active DV Clearance - Kubernetes - Terraform - Strong knowledge of monitoring tools such as Prometheus or Grafana - Python or other scripting language If you're a DevOps engineer looking for a contract offering £500 - £550 a day outside IR35, then send an updated CV to The client is
/Polygraph Clearance Required Qualifications Experience building distributed systems. Experience performing application, network, and infrastructure monitoring and analysis. Familiarity with open source tools such as Istio, Keycloak, Nginx, Prometheus, Grafana, Accumulo, and Elasticsearch. Experience with administering Kubernetes clusters including deploying and configuring operators and helm charts. Experience with one or more of the following programming languages: Go, Java, JavaScript, Kotlin
IaC principles and automation tools such as Ansible and SaltStack Experience with Elastic Stack (Elasticsearch/Kibana/Logstash/Beats) Experience with time-series visualization tools such as Grafana