Unix-based systems. Experience with cloud-based platforms (e.g. AWS). Experience with real-time data messaging (e.g. Redis, WebSockets). Experience with deployment and monitoring tools, e.g. Supervisor, Docker, Grafana, Nagios, etc. Excellent problem-solving skills and attention to detail. Preferred/Desirable Experience: Experience in the finance or cryptocurrency sectors. Experience implementing exchange connectors. Familiarity with low latency …
from application to network to host PREFERRED QUALIFICATIONS Exposure to cloud computing concepts and design considerations Experience in a production environment Experience of monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic or similar) Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience …
security best practices across cloud and network environments. Troubleshoot deployment and performance issues across multiple environments. Set up and maintain observability tools for logging, monitoring, and alerting (e.g., Prometheus, Grafana, Loki). Contribute to internal tooling to streamline development, testing, and operations workflows. Stay current with DevOps trends and recommend improvements to tools and processes. Required Qualifications: Bachelor's degree … Exposure to multi-cloud or hybrid cloud architectures. Tech Stack: Cloud: AWS, OCI ZTN: Cloudflare Application: Kong (API Gateway), Java Spring Boot, Python, Go, TypeScript Monitoring: Prometheus Stack (Prometheus, Grafana, Loki) Compute: ECS, EC2, Lambda Frontend: S3, CloudFront Data: Glue, S3, PostgreSQL CI/CD: GitHub Actions IaC: Terraform, AWS SAM Why Join Us? At Intelmatix, you'll work on …
and maintain Infrastructure as Code (IaC) using Terraform and Ansible. Design highly reliable, scalable, and secure infrastructure supporting performance-critical workloads. Build proactive monitoring, observability, and alerting with Prometheus, Grafana, Azure Monitor, DataDog, and Dynatrace. Troubleshoot complex system issues spanning applications, networks, and infrastructure. Define platform SLAs, SLOs, and governance standards for self-service use. Collaborate closely with Salesforce DevOps … Ansible, along with scripting in PowerShell, Python, or Bash Experience implementing GitOps workflows and managing platform SLAs, SLOs, and governance standards Familiarity with observability and monitoring tools including Prometheus, Grafana, Azure Monitor, DataDog, or Dynatrace Preferred experience supporting Salesforce DevOps pipelines and working with Java, .NET, or Node.js application environments Exposure to AI/ML platforms, real-time data pipelines …
London, South East, England, United Kingdom Hybrid / WFH Options
Eligo Recruitment
indexing, and capacity planning for mission-critical systems Develop secure backup, recovery, and disaster recovery procedures Explore multi-tenant and sharded architectures to support growth Implement monitoring strategies using Grafana, Datadog, and CI/CD integrations Champion database best practices, mentor teams, and standardize tooling and automation What You’ll Bring Extensive experience managing cloud-hosted PostgreSQL at scale Proficiency …
and continuously suggest how the backend can provide the best Customer Experience A passion for crypto and the transformations it enables We use Kotlin, PostgreSQL, Kafka, Redis, Datadog, Amplitude, Grafana, BigQuery, Apache Spark and more COMPENSATION & PERKS Unlimited vacation policy; work hard and take time when you need it Unlimited learning policy; order the technical resources you need or simply pick …
consistent high levels of test coverage, strong technical documentation and effective monitoring Preferably exposure to technologies such as Kafka, PostgreSQL, Redis We use Kotlin, PostgreSQL, Kafka, Redis, Datadog, Amplitude, Grafana, BigQuery, Apache Spark and more A passion for crypto and the transformations it enables COMPENSATION & PERKS Full-time salary based on experience and meaningful equity in an industry-leading company Hybrid …
/reporting. Develop and implement TOC strategy, staffing models, and documentation standards. Participate in systems architecture, new tech evaluation, and vendor selection. Manage operational workflows, reporting systems (e.g., Zabbix, Grafana, Prometheus), and support international broadcast teams. Collaborate with leadership on technical direction and TOC transformation. Ideal Candidate: Previous technical leadership role within a TOC, NOC, or MCR environment. Strong understanding …
London, Lime Street, United Kingdom Hybrid / WFH Options
Hays Technology
working with offshore engineering partners Proficient with tools such as JIRA, Confluence, Figma and an API documentation platform Nice to have Exposure to observability and support workflows, for example Grafana or similar Experience in UX or service design research and translating insights into requirements Familiarity with cloud platforms and CI or CD concepts Background in healthtech, B2B SaaS or compliance …
in AWS (EKS, EC2, RDS/Aurora, S3). Develop and maintain Infrastructure as Code using Terraform and configuration management with Ansible. Enhance monitoring, logging, and alerting using the Grafana stack (Prometheus, Loki, Tempo). Participate in incident management, root cause analysis, and post-incident reviews. Implement automation to reduce manual operational tasks and improve recovery time. Contribute to the definition … AWS services relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM). Infrastructure as Code with Terraform and configuration management with Ansible. Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo). Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning). Comfortable working in incident and problem management processes. Strong GitOps mindset for managing platform and …
DevOps and Infrastructure teams to integrate automated environment spin-up/down to support rapid project delivery. Evaluate, select, and implement environment monitoring and reporting tools (e.g., AppDynamics, Splunk, Grafana, or custom dashboards). Capacity & Demand Management Implement predictive demand planning and capacity management to anticipate environment conflicts and avoid project delays. Own the environment utilisation dashboard for senior stakeholders … test environments in a mid-to-large size organisation. Strong knowledge of CI/CD, DevOps principles, and automated environment provisioning. Familiarity with monitoring tools such as AppDynamics, Splunk, Grafana, or similar. Good understanding of data compliance, security requirements, and environment governance. Ability to lead cross-functional teams, manage competing priorities, and influence stakeholders at all levels. Hands-on experience …
Maintaining and evolving our cloud infrastructure (GCP, Kubernetes) to ensure high availability, security, and performance Managing service observability and reliability, including logging, metrics and alerting (we use Prometheus and Grafana) Handling database and service upgrades (e.g. MySQL, Kubernetes), secrets management and security best practices Taking ownership of platform-level concerns such as deployment pipelines, configuration management, and cost awareness Helping … across infrastructure and applications, including secrets management and credential rotation. Familiarity with infrastructure-as-code or automation tools is a plus Experience with observability tools (such as Prometheus and Grafana), service monitoring, and debugging in production environments A demonstrated interest in staying up-to-date with new technology, new frameworks, new languages and other developments like AI. A passion for …
Are you a software engineer passionate about building the latest tools and shaping the future of software engineering? This YC-backed startup is revolutionising how developers interact with their tools by creating an AI-powered integrated development environment designed to …