and continuous improvement of Solace environments, working across development, infrastructure, and cloud teams to deliver a stable and well-governed messaging service. You will troubleshoot problems, refine configurations, improve observability, and help drive upgrades, automation, and improved resilience. Experience Needed • At least 1 year of hands-on experience configuring, administering, and troubleshooting Solace PubSub+ • Strong understanding of event-driven and…
maintain scalable, automated, reliable data pipelines across a modern cloud stack. Extend and improve a cutting-edge Data & Analytics Platform supporting mission-critical insurance products. Implement data quality checks, observability metrics and troubleshooting processes. Manage cloud resources via Infrastructure-as-Code. Ensure strong data security, access control, and governance. Work closely with commercial, analytics and engineering teams to deliver high…
Enfield, Middlesex, England, United Kingdom Hybrid/Remote Options
Crimson
security and compliance by implementing CIAM flows, and adhering to ISO 27001 standards. Develop resilient architectures for retail and e-commerce systems, considering networking and SD-WAN performance. Configure observability tools for monitoring, logging, and performance metrics. Mentor and guide a small technical team, enforce coding standards, and apply Agile principles. Translate business objectives into technical solutions for e-commerce…
agent systems, integrating them with core enterprise systems like SAP, Salesforce, and the ECOLAB3D™ platform. Define and enforce architectural standards and governance frameworks for the agent lifecycle, data lineage, observability, and interoperability. Technology Evaluation and Selection: Evaluate and select AI platforms, tools, and protocols, such as LangChain, AutoGen, or similar frameworks, ensuring they meet scalability, security, and performance requirements within…
C++, Rust, or Go is highly beneficial. Expert knowledge of either Rust or C++; experience in designing, implementing, and maintaining reliable and horizontally scalable distributed systems; knowledge of service observability and reliability best practices; experience in operating commonly used databases such as PostgreSQL, ClickHouse, and MongoDB. Additionally, any of the below points will help a candidate stand out: Expert knowledge…
Central Limit Order Book (CLOB), entirely on an EVM-compatible chain. You will develop and maintain bare-metal and cloud environments, service orchestration, network connectivity, databases, blockchain nodes and observability to the highest standards of reliability, performance and security. Although this project centres around a blockchain system, previous experience with blockchain is not a hard requirement but keen interest in…
optimization, anomaly detection, and predictive analytics. Understanding of AI frameworks and libraries (e.g., TensorFlow, PyTorch, Scikit-learn) and their application in network automation and monitoring. Experience with telemetry and observability frameworks (e.g., Prometheus, Grafana) for real-time network monitoring and troubleshooting. Experience: Minimum of 7 years of experience in network engineering, operations, and support. Proven ability to work hands-on…
jobs, retries, monitoring, automation). Work with S3-style object storage: efficient layouts, lifecycle, throughput, and cost awareness. Add tooling around pipelines (progress/health visualization, metrics, alerts) for observability and faster iteration. Collaborate closely with ML engineers to align datasets with training needs and accelerate experimentation. Requirements Must-have Strong Python fundamentals; you write clean, maintainable, production-ready code.
overseeing the full AI solution lifecycle, from concept to production. Mentor and guide Lead AI Engineers and technical contributors to raise the engineering bar. Champion best practices for governance, observability, and lifecycle management of AI systems. Evaluate and introduce emerging frameworks, including AutoGen, LangGraph, and other agentic ecosystems. Partner with senior stakeholders to align AI strategy with business goals. Drive…
will: Design and evolve the architecture of highly scalable, reliable, and secure distributed systems. Drive technical excellence across the engineering organization by setting standards for code quality, system design, observability, and operational best practices. Collaborate closely with Product, UX, and Application Engineering teams to deliver impactful features while ensuring architectural soundness and scalability. Mentor and guide senior and mid-level…
other internal teams to fully understand client requirements and deliver tailored technical solutions. Design and implement scalable, future-proof architectures for new third-party connectors and integrations. Enhance system observability by improving diagnostics, logging, and tracing to aid technical support teams in resolving issues swiftly. Oversee the ongoing development and management of the public API, covering REST and event streaming…
you thrive in a fast-paced environment where you can make a real difference, we want to hear from you! Required skills/expertise: Develop and implement a comprehensive observability strategy for self-hosted deployments, including infrastructure and tooling for monitoring, alerting, and troubleshooting. This will involve designing and implementing robust metrics and logging systems. Engineer the ACRA platform for…
applications of AI for the construction domain, pushing the boundaries of what's possible. Build core infrastructure that allows us to build and ship LLM apps quickly - this includes observability and how we work with several LLM providers plus our own fine-tuned models. Work with other engineers in the product and research teams to bring new models and applications to…
international markets Previous experience in the parking or mobility sector Experience with GraphQL and modern API integration patterns Knowledge of micro-frontend architectures Experience with advanced performance monitoring and observability tools Growth Opportunities Opportunity to shape the frontend strategy for a rapidly growing international company Increasing involvement in strategic technical decision-making Development of broader technology leadership skills Experience in…
DevOps, infrastructure, and platform engineering. Tech Stack: Cloud: AWS (EC2, RDS, S3, IAM, CloudWatch, Lambda); Infrastructure as Code: Terraform; Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm; Configuration Management: Ansible; Monitoring & Observability: Grafana, Prometheus; CI/CD: GitHub Actions; Automation & Scripting: Python, Bash, Go or Java. What We’re Looking For: Proven experience running AWS cloud infrastructure in a production or regulated … financial) environment. Hands-on experience managing Kubernetes clusters (preferably EKS). Strong understanding of Infrastructure as Code using Terraform. Familiarity with monitoring and observability stacks such as Prometheus and Grafana. Experience building and maintaining CI/CD pipelines (GitHub Actions or similar). Strong scripting or automation skills using Python, Bash, Go or Java. A collaborative mindset, comfortable working…
AWS (Core Services – EC2, RDS, S3, IAM, Lambda, CloudWatch); Infrastructure as Code: Terraform; Containerisation & Orchestration: Docker, Kubernetes (EKS), Helm; Configuration Management: Ansible; CI/CD Pipelines: GitHub Actions; Monitoring & Observability: Grafana, Prometheus; Scripting/Automation: Python or Java. What We’re Looking For: Proven experience managing and scaling AWS cloud environments, ideally supporting live software products or high-traffic platforms. … Strong background in Terraform and Infrastructure as Code best practices. Practical experience with Kubernetes (EKS) in production. Familiarity with monitoring and observability tools such as Grafana and Prometheus. Hands-on experience building CI/CD pipelines (GitHub Actions, Jenkins, CircleCI, etc.). Solid scripting and automation experience using Python or Java. A collaborative engineer who enjoys working closely with…
London, South East England, United Kingdom Hybrid/Remote Options
Black Pen Recruitment
tooling, systems design, and operational resilience. Their environment offers opportunities to work on everything from CI/CD pipelines and container orchestration to configuration management, infrastructure as code, and observability tooling. While you may bring experience in specific tools or platforms, you will be expected to contribute broadly across our infrastructure landscape. Our client's core product is a comprehensive … Solid Linux administration and general networking knowledge Understanding of infrastructure security best practices, including secure configuration, identity and access management, and compliance controls Experience with monitoring, alerting, and system observability Background in financial services infrastructure is advantageous but not required
pipelines, reducing deployment time and improving release reliability. Strengthen system resilience through infrastructure improvements and scalability planning. Work with Product Engineers to enhance developer experience. Drive automation and observability. Requirements: Strong GCP experience; Deep understanding of Terraform; CI/CD pipelines; Containerisation (Kubernetes, GKE). If you're interested, get in touch ASAP.
to build cost-effective solutions on Microsoft Azure while maintaining agility and fostering innovation. This position is perfect for engineers who are passionate about optimising cloud usage, enhancing cost observability, and championing a FinOps culture. Experience in some of the following would be ideal: Partner with engineering, finance and product teams to drive cost-efficiency across Azure. Clear understanding…
Monitor and optimise network performance across cloud and on-premises environments. Troubleshoot and resolve connectivity issues quickly and effectively. Automate network configuration using Terraform, PowerShell and Azure CLI. Maintain observability using Azure Monitor, Log Analytics and Network Watcher. Ensure deployments align with security and compliance standards. Produce technical documentation and support knowledge sharing. Required Experience: Strong hands-on experience with…
City of London, London, United Kingdom Hybrid/Remote Options
ARC IT Recruitment Ltd
/MTTR via automation, clear SLAs, and robust RCAs/post-mortems. Safer, faster releases (blue/green, canary, feature flags) in partnership with Trading, Quant, and Engineering. Mature observability (logs/metrics/traces), capacity planning, and performance tuning for low-latency flows. Strong production hygiene and controls aligned to MiFID II/MAR/best-ex. Leadership of…