Out in Science, Technology, Engineering, and Mathematics
about performance, security, and process interactions in complex distributed systems. Experience with version control, continuous integration, deployment, and configuration management tools in a DevOps environment. Experience meeting demands for high availability and scale. Ability to communicate technical concepts effectively, both in writing and orally, as well as the interpersonal skills required to collaborate effectively with colleagues across diverse technology … teams. Ability to rapidly and effectively understand and translate requirements into technical solutions. Preferred Qualifications: Experience working in a Linux environment, including system engineering, high-availability design, performance analysis, network troubleshooting. Knowledge of container technologies: Docker and Kubernetes. Hands-on experience in Amazon Web Services. Experience using infrastructure as code tools (e.g. Terraform). Experience with at least one
be crucial in ensuring the seamless operation of our applications, DevOps, middleware, security, and infrastructure components. Key Responsibilities: Provide 24/7 technical support for cloud-based solutions, ensuring high availability and performance across various applications and infrastructure components. Design, build, and maintain infrastructure and configuration as code using tools like Ansible and Terraform. Administer Dev, Test, and
tests to identify and remediate bottlenecks. Develop and maintain platform solutions, automate infrastructure provisioning, configuration, and management tasks using Infrastructure as Code. Monitor, review and tune databases to ensure high availability and performance. Collaborate with product engineering teams to design/build fit-for-purpose and observable software. Required Skills and Experience: Proven experience in an SRE/… e.g., Certified Kubernetes Administrator) are a plus. Experience in database management/performance tuning, particularly MSSQL. Employee benefits: Opportunity to be a part of a 30+ year well-established, high-performance SaaS company. Excellent Company Pension scheme and Life Insurance, Excellent holiday allowance. A supportive team environment with emphasis on learning and development opportunities. Working with a team of … caring, high-performing, and passionate people who have fun supporting our vision, innovation, and continuous improvement. This Senior Site Reliability Engineer role is working for a market leading global software company and this job is part of a large program of change and improvement in their Cloud SaaS products over the coming years. If you are looking for an
Morgan Hunt are seeking an experienced Site Reliability Engineer (SRE)/Unix Infrastructure Engineer to support the deployment, migration, and optimisation of critical infrastructure services. The role involves ensuring high availability, disaster recovery readiness, and automation-driven improvements across RHEL, Oracle DB, Kubernetes, and AWS environments. Key Responsibilities Infrastructure & Deployment Support migration and deployment of services to
platform. Key Responsibilities Architectural Design in cloud-based environments: Develop and implement robust IT architecture strategies for cloud and hybrid environments, leveraging AWS best practices. Design scalable, secure, and high-availability solutions tailored to business needs. Architect and optimize data platforms to enable efficient data collection, storage, and processing. Implement and manage cloud-native services, including compute, storage
closely with development, security, and operations teams, applying DevOps methodologies to streamline processes and enhance system reliability. Performance Optimization: Expertise in tuning cloud applications for cost efficiency, scalability, and high availability, leveraging Azure Autoscaling, Load Balancers, and Traffic Manager. At least 5 years of hands-on experience in Azure Hyperscale/DevOps. Over 10 years of experience in
City of London, London, United Kingdom Hybrid / WFH Options
Talent Hero Ltd
AWS, Azure, or GCP. If you're skilled at turning business needs into technical cloud solutions, we encourage you to apply. Remote cloud architecture roles are both in demand and high-impact, and our service is 100% free for all UK applicants. Applying through Talent Hero gives you direct access to US companies needing cloud leaders ready to make an … Apply once; we do the legwork. Your profile is matched to multiple US clients hiring for your skill set. Fast hiring process. Responsibilities: Design and implement scalable, secure, and high-availability cloud infrastructure. Collaborate with engineering, DevOps, and security teams to define architecture best practices. Automate infrastructure deployment using IaC tools and CI/CD pipelines. Monitor performance
Kafka and Kubernetes Platform Management: Design, deploy, and maintain scalable Kafka and Kubernetes clusters to support development and production environments. Implement best practices for Kafka and Kubernetes operations, ensuring high availability, performance, and security. Monitor, troubleshoot, and optimize Kafka and Kubernetes infrastructure to meet development team needs. Implementation: Implement cloud infrastructure components, including compute, storage, networking, and security … for performance, scalability, and cost-efficiency. Implement DevOps practices for streamlined deployment and operations. Troubleshooting and Support: Provide technical support for cloud infrastructure and services. Troubleshoot and resolve performance, availability, and security issues. Support production environments and participate in a 24x7 on-call rotation when required. Requirements: Experience: 7+ years of experience in designing, implementing, and managing cloud-based
implement robust, scalable, and secure data pipelines and platforms tailored to the needs of Natural Gas and Power trading, including real-time market data ingestion, time-series storage, and high-frequency analytics. Lead the design and governance of data models that support complex trading strategies, asset optimization, and regulatory reporting. Ensure data quality, lineage, and observability across all layers … excellence, learning, and continuous improvement. Review code, enforce best practices, and ensure adherence to architectural standards and security protocols. Operational Excellence: Monitor and optimize performance of data systems, ensuring high availability and resilience in a fast-paced trading environment. Lead incident response and root cause analysis for data-related issues, implementing preventive measures. Maintain documentation and knowledge repositories
solutions and implementations. Experience implementing developer self-service/developer experience portals. Strong experience of application modernisation and cloud migration programs. Strong Linux and Windows server experience in a high-availability 24/7 operation. Experience with the development and deployment of large-scale, complex technology platforms. Deep understanding of GCP products across database, serverless, containerization and API … Advanced level expertise in Terraform. Extensive experience in designing and implementing DevOps practices. Experience with two or more CI/CD solutions. Experience coaching and mentoring high-performing teams. Pragmatic experience using agile to deliver incremental value. Experience working in a global or multinational team setting. Strong documentation, communication and collaboration skills. Proven ability to drive innovation and continuous
wide range of AWS native databases including RDS, Aurora, Neptune, as well as CockroachDB. Your daily responsibilities will involve designing robust software solutions that enhance system performance while ensuring high availability for critical applications. You will work hand-in-hand with product engineering teams to improve observability tools and telemetry systems, driving forward automation initiatives that reduce manual
and implementation of our Site Reliability Engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of Writer's critical systems, proactively guaranteeing that our high-ROI products reach customers seamlessly. Your responsibilities: Lead the design, implementation, and maintenance of Writer, Inc.'s cloud infrastructure to ensure high availability and performance. Design and … reliability practices. Is this you? Proven expertise in Site Reliability Engineering with at least 7 years of hands-on experience. Deep understanding of system architecture and infrastructure design for high availability and performance. Bachelor's degree in Computer Science, Engineering, or a related field. Strong proficiency in programming languages such as Python, Java, or Go for automation and
party services. Oversee API development with product owner and ensure best practices in service-oriented architecture. Team Leadership & Collaboration: Work closely with engineering, DevOps, and support teams to deliver high-quality solutions. Facilitate agile ceremonies, including backlog grooming, sprint planning, and retrospectives. Act as the primary liaison between technical teams and business stakeholders. Operational Excellence & Continuous Improvement: Ensure high availability and reliability of the platform and applications, implementing monitoring and automation as needed. Identify areas for improvement and drive initiatives for performance optimization. Maintain compliance with security, data protection, and industry standards. Vendor relationship management: Manage the relationship with vendor(s) and hold them contractually accountable for all services provided. Qualifications Required Qualifications: Education & Experience: Bachelor's
containerisation (Docker, Kubernetes) and cloud platforms (AWS, GCP or Azure). Skilled in cross-functional collaboration and stakeholder communication. Strong analytical skills with a proactive, problem-solving mindset. Experience in high-availability systems, cybersecurity frameworks (ISO, SOC), or Elixir development. Background in fast-paced, start-up or scale-up environments. Interest in stepping into or growing towards an Engineering
and workloads. Lead infrastructure strategy for cloud migrations of insurance core systems, including on-prem to cloud transitions. Optimize cloud infrastructure using native services for performance, cost-efficiency, and high availability. Define best practices for cloud operations, monitoring, disaster recovery, and compliance. Insurance Application Cloud Enablement: Provide cloud infrastructure implementation support for core insurance platforms, including Guidewire applications (PolicyCenter
Company Profile - Managed Cloud Service Provider in the M+E space. As the Head of Engineering, you will be responsible for overseeing the technical operations of managed cloud services, ensuring high-quality delivery and exceptional customer experiences. You will lead a team of skilled DevOps and Cloud Systems Engineers, manage escalations and complex situations, and collaborate with cross-functional teams … on, player-coach leadership role requiring both strategic vision and deep technical expertise in cloud computing and AWS services. Key Responsibilities Leadership & Team Management: Lead, mentor, and grow a high-performing team of DevOps and Cloud Systems Engineers. Foster a culture of continuous improvement, collaboration, and accountability within the engineering team. Develop and execute strategies for team performance, professional … development, and succession planning. Technical Strategy & Operations: Oversee the design, implementation, and maintenance of AWS environments and our multi-cloud infrastructure services. Ensure robust architecture, high availability, scalability, and security of managed AWS accounts. Implement and refine DevOps best practices, automation, and CI/CD pipelines to enhance service delivery. Own resource management and planning to ensure suitable
Solace Messaging Administrator London 3x a week Full-Time Permanent Salary on application You will be responsible for managing and supporting our enterprise messaging infrastructure, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, network optimization, and system observability using industry-standard monitoring tools. Required Skills … Azure, GCP) and cloud-native deployments. Why Join Us? Be part of a mission-critical team enabling real-time data flows. Work with cutting-edge technologies and contribute to high-impact projects. Eligo Recruitment is acting as an Employment Business in relation to this vacancy. Eligo is proud to be an equal opportunity employer dedicated to fostering diversity and
responsible for the maintenance and optimization of our PostgreSQL databases hosted on the AWS Cloud platform. The successful candidate will collaborate with cross-functional teams to ensure database reliability, availability, and performance, while also contributing to the overall architecture and strategy of our cloud-based database solution. 1. Database Design and Architecture: • Collaborate with software engineers, DevOps, and infrastructure … and configure PostgreSQL database instances on AWS RDS, considering factors such as instance sizing, storage, and security. • Implement and manage replication, clustering, and backup/recovery strategies to ensure high availability and disaster recovery. 3. Performance Monitoring and Optimization: • Monitor database performance, proactively identifying and resolving performance bottlenecks, slow queries, and other issues affecting system responsiveness. • Tune database
the ability to work independently, take initiative, and make their own decisions. Main Responsibilities Design and develop live casino games using Java, Kotlin, Scala and the Spring framework, ensuring high performance and quality standards. Create scalable and maintainable microservices architecture for game components. Deploy and manage game services on Kubernetes clusters, optimizing resource allocation and ensuring high availability.
in automation and operations. As part of the AWS Managed Operations team, you will play a pivotal role in building and leading operations and development teams dedicated to delivering high-availability AWS services, including EC2, S3, Dynamo, Lambda, and Bedrock, exclusively for EU customers. For more information on ESC please check out our blog: Your responsibilities will encompass … of AWS services and technology. A typical day in this role involves collaborating with technology leaders, contributing to the enhancement of day-to-day operations, and ensuring improvements in availability, reliability, latency, performance, and efficiency of the ESC. You will be required to occasionally participate in "on-call" rotations to resolve incidents occurring out-of-hours. The overarching goal … is to deliver scalable services and ensure a high-availability experience for EU customers. If you are an experienced professional ready for a challenging and impactful opportunity, we invite you to join our efforts in building a best-in-class development engineering and operations team that aligns with AWS' commitment to customer satisfaction and continual innovation. Utility Computing
with a primary focus on MongoDB. Your mission is to lead database administration efforts, define the MongoDB roadmap, and collaborate with IT Operations and other stakeholders to ensure the availability, performance, and security of our database systems. We are looking for a candidate who is passionate about database technologies and values collaboration, innovation, and continuous learning. You should have … you to: Lead the design, implementation, and maintenance of MongoDB database systems. Develop and enforce database security measures, policies, and best practices. Monitor and optimize database performance to ensure high availability, scalability, and efficient resource utilization. Collaborate with development teams on database-related activities, including schema changes, data migrations, and performance tuning. Troubleshoot and resolve complex database issues … maintain robust backup and recovery strategies to ensure data integrity and recoverability. Plan and execute database upgrades, patches, and migrations. Implement and maintain database replication and clustering technologies for high availability and disaster recovery. Document database configurations, procedures, and troubleshooting steps. Stay current with the latest database technologies, industry trends, and best practices. Mentor and provide guidance to
responsible for blending Site Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 operational excellence, and high availability across all of BCG, including BCG Core, BCG X, and Consulting Team (CT) worldwide. The leader will drive strategic planning, execution, and optimization of global IT infrastructure, cloud operations, and service management while … BCG business units. Manage network reliability, compute platforms, and cloud-native services across AWS, Azure, and GCP. Scale Infrastructure as Code (IaC), automated provisioning, and cloud workload optimization. Drive edge computing, containerized workloads, and high-performance computing strategies. Implement AI-driven monitoring, self-healing automation, and full-stack observability. IT Service Management & Operational Excellence: Mandate and assure the adoption of IT Service Management (ITSM) processes … ensuring standardized, efficient, and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across all of BCG. Optimize incident, change, and capacity management, ensuring alignment