AWS. • Collaborate with development teams to integrate their applications into the infrastructure. • Monitor and troubleshoot production systems and resolve issues as necessary. • Continuously improve processes and tools to ensure high availability and performance. • Stay current with new technologies and industry trends, continuously exploring new ways to improve our infrastructure. • Other duties as assigned. Minimum Qualifications • Security Clearance - A … a U.S. citizen. • 9+ years of experience in DevOps Engineering or Software Development and Bachelor's in related field; or 7 years relevant experience with Master's in related field; or High School Diploma or equivalent and 13 years relevant experience. • Strong knowledge of AWS services (EKS, EC2, EBS, S3, Lambda) and their application to deployment and management of infrastructure. • Proficient
Bristol, Gloucestershire, United Kingdom Hybrid / WFH Options
CACI Limited
protection, regulatory compliance, and alignment with industry best practices (e.g., AWS IAM, encryption, VPC, security monitoring, and auditing). • Containerisation & Orchestration: Architect and oversee containerised environments using Kubernetes, ensuring high availability, scalability, and fault tolerance for critical applications. • Event-Driven Systems: Lead a team to architect event-driven systems using Kafka, designing and managing messaging frameworks to handle … Certified Security - Specialty) preferred. • Understanding of architectural standards and frameworks, e.g. TOGAF. Due to the industries we work in, we require the successful candidate to be able to obtain high-level security clearance. To qualify for this, you must be a British citizen and have lived permanently in the UK for the last 5 years. Why work for us
the infrastructure engineers: VMware Infrastructure Design and deployment of large-scale VMware environments Configuration and management of ESXi, vCenter, vSAN, and NSX Implementation of RBAC for VMware access control High availability, disaster recovery, and backup strategies Operating Systems Deployment, configuration, and management of Linux (various distributions) Windows Server setup, including Active Directory, DNS, and Group Policy Linux repositories
recovery projects. People: Management and growth of engineers - through 1:1s, performance reviews and objectives, it is important for all that we are able to deliver work of a high standard, in a sustainable manner, and that engineers are able to learn, develop and grow their skills and career. Collaborate closely with other engineering teams, product managers, and business leaders … to align infrastructure capabilities with business needs and growth. What we're looking for Technical Experience An expert in modern infrastructure technology with experience in high-availability cloud platforms for SaaS companies. Experience with our specific tech stack is preferred. Understanding of regulatory frameworks like GDPR, ISO27k etc. An advocate for AI technologies and constantly stays up to
containerisation (Docker, Kubernetes) and cloud platforms (AWS, GCP or Azure) Skilled in cross-functional collaboration and stakeholder communication Strong analytical skills with a proactive, problem-solving mindset Experience in high-availability systems, cybersecurity frameworks (ISO, SOC), or Elixir development Background in fast-paced, start-up or scale-up environments Interest in stepping into or growing towards an Engineering
Oversee the full lifecycle of enterprise applications from ideation to deployment and ongoing support. Design and ensure seamless integration of applications across various platforms and systems, ensuring interoperability and high availability. Work with technical teams to design integration points, leveraging APIs, microservices, and cloud platforms for efficient communication between disparate systems. Lead the design and implementation of cloud-native … applications or hybrid solutions leveraging platforms like AWS, Azure, or Google Cloud. Ensure application solutions are optimized for cloud environments, implementing strategies for application scalability, security, and high availability. Guide teams in the adoption of cloud-based services and platforms, ensuring seamless migration of applications to cloud infrastructure. Document the architecture of enterprise applications, including technical specifications, process flows
and workloads Lead infrastructure strategy for cloud migrations of insurance core systems, including on-prem to cloud transitions. Optimize cloud infrastructure using native services for performance, cost-efficiency, and high availability. Define best practices for cloud operations, monitoring, disaster recovery, and compliance. Insurance Application Cloud Enablement: Provide cloud infrastructure implementation support for core insurance platforms, including Guidewire applications (PolicyCenter
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Frontier Resourcing
play a pivotal role in designing and delivering mission-critical systems that power digital transformation for major organizations. As a Cloud DevOps Engineer, you will be part of a high-performing team working on high-availability platforms in regulated and fast-paced environments. You'll use your expertise in automation, cloud technologies, and DevOps practices to create … Develop and maintain Infrastructure as Code (IaC) using tools like Terraform or CloudFormation Implement and manage CI/CD pipelines, enabling rapid and reliable deployments Monitor systems for performance, availability, and security, using observability best practices Collaborate in Agile development teams, contributing to sprints, stand-ups, and continuous delivery cycles Troubleshoot infrastructure and deployment issues, delivering fast and sustainable
Reliability Engineering (SRE), DevOps, and traditional operations models to build a next-generation Reliability Engineering function. This role ensures end-to-end automation at scale, 24x7 operational excellence, and high availability across all of BCG, including BCG Core, BCG X, and Consulting Team (CT) worldwide. The leader will drive strategic planning, execution, and optimization of global IT infrastructure … reliability, compute platforms, and cloud-native services across AWS, Azure, and GCP. Scale Infrastructure as Code (IaC), automated provisioning, and cloud workload optimization. Drive edge computing, containerized workloads, and high-performance computing strategies. Implement AI-driven monitoring, self-healing automation, and full-stack observability. IT Service Management & Operational Excellence: Mandate and assure the adoption of IT Service Management (ITSM … and effective service delivery. Establish SRE-based operational metrics, including SLOs, SLIs, and error budgets. Oversee incident response, problem resolution, and root cause analysis with AI-driven remediation. Ensure high availability, performance, and security compliance for all enterprise services. Develop a follow-the-sun operational support model, ensuring 24x7 resilience and uptime across all of BCG. Optimize incident
manage continuous integration and continuous delivery (CI/CD) pipelines Automate infrastructure provisioning and configuration management using tools like Terraform or AWS CDK Monitor and optimize system performance, ensuring high availability and scalability Collaborate with development and operations teams to streamline software delivery and deployment processes Implement and maintain monitoring and alerting systems to identify and resolve issues
San Francisco, California, United States Hybrid / WFH Options
FROG DESIGN
Responsibilities Design and implement cloud architecture for our organization Ensure scalability, security, and high availability of our cloud systems Collaborate with development team to integrate applications with cloud infrastructure Develop and implement cloud migration strategies Ensure compliance with industry standards and best practices Troubleshoot and resolve cloud-related issues Skills Needed 5+ years of experience in cloud architecture
mentoring junior DBAs and providing technical leadership on database design, optimisation, and migration strategies. Essential Knowledge, Skills and Experience MySQL: Proficiency in MySQL replication (master-slave, master-master) and high availability configurations. Experience in query performance optimisation, including slow query analysis, indexing strategies, and troubleshooting. Strong understanding of schema optimisation (e.g., normalisation, denormalisation, partitioning) to enhance database performance. … in managing MySQL upgrades and schema migrations in production environments, ensuring minimal downtime and data integrity. In-depth knowledge of replication techniques across the various database technologies to ensure high availability, data consistency, and fault tolerance. Experience in setting up and maintaining multi-master replication, geo-replication, GTID and disaster recovery strategies. Proficient in resolving replication lag, failover … database schema changes and migrations to ensure controlled and tested deployments. Apache Druid/Column-based databases: Familiarity with setting up and managing replication across Druid clusters, including data availability and data sharding strategies. Experience with query optimisation in Druid, especially for long-running queries in OLAP workloads. Understanding of schema design and optimisation for Druid's columnar data
Solace Messaging Administrator London 3x a week Full-Time Permanent Salary on application You will be responsible for managing and supporting our enterprise messaging infrastructure, ensuring high availability, optimal performance, and reliability across production and non-production environments. This includes working on incident response, capacity planning, network optimization, and system observability using industry-standard monitoring tools. Required Skills … Azure, GCP) and cloud-native deployments. Why Join Us? Be part of a mission-critical team enabling real-time data flows. Work with cutting-edge technologies and contribute to high-impact projects. Eligo Recruitment is acting as an Employment Business in relation to this vacancy. Eligo is proud to be an equal opportunity employer dedicated to fostering diversity and
responsible for the maintenance and optimization of our PostgreSQL databases hosted on the AWS Cloud platform. The successful candidate will collaborate with cross-functional teams to ensure database reliability, availability, and performance, while also contributing to the overall architecture and strategy of our cloud-based database solution. 1. Database Design and Architecture: • Collaborate with software engineers, DevOps, and infrastructure … and configure PostgreSQL database instances on AWS RDS, considering factors such as instance sizing, storage, and security. • Implement and manage replication, clustering, and backup/recovery strategies to ensure high availability and disaster recovery. 3. Performance Monitoring and Optimization: • Monitor database performance, proactively identifying and resolving performance bottlenecks, slow queries, and other issues affecting system responsiveness. • Tune database
Washington, Washington DC, United States Hybrid / WFH Options
ClearanceJobs
Engineer (SRE). The selected candidate will support and maintain our customers' FedRAMP-compliant deployment in AWS GovCloud for public sector customers. The SRE will be responsible for ensuring high availability, security, and compliance of cloud-based environments while driving automation, monitoring, and incident response best practices. U.S. Citizenship (required for working in GovCloud environments) Terms: Fulltime/… Strong troubleshooting, problem-solving, and automation mindset. Responsibilities/Impact as an SRE: • AWS GovCloud Operations: Manage and optimize cloud-based infrastructure in AWS GovCloud, ensuring FedRAMP compliance and high availability. • Reliability & Performance: Monitor and enhance system performance, scalability, and reliability through observability tools, automation, and best practices. • Security & Compliance: Implement and maintain security controls aligned with FedRAMP, NIST … closely with DevOps, security teams, developers, and federal stakeholders to maintain a compliant and secure cloud environment. Cross-Functional Leadership and Execution Beyond technical expertise, this role requires a high level of autonomy and ownership. The ideal candidate: • Leads end-to-end tasks with minimal oversight, from planning through execution and validation. • Is an all-around player who understands
core services. You will be responsible for driving technical excellence and innovation within the team while mentoring and coaching junior engineers. Responsibilities Design new features or enhancements based on high-level architectures. Work on a global, highly distributed infrastructure High-availability, mission-critical Java services Opportunity to modernize the technology stack Lead backlog grooming, planning, design reviews … REST/microservices. Familiarity with Jenkins CI/CD pipelines and Terraform. Experience with containerization technologies such as Docker and Kubernetes. Proficient in AWS services. Proven ability to build high-volume, scalable, distributed back-end services. Proficient in multithreaded software engineering, with expertise in designing, implementing, and troubleshooting concurrent systems to optimize performance and resource utilization Proficient in leveraging … AI tools, such as Cursor, to enhance software engineering practices, including automated design, code generation, and testing, ensuring high-quality and efficient development workflows. Excellent collaboration, influencing, negotiation, coaching, mentoring, and coalition-building skills. Strong verbal and written communication skills. A great team player with demonstrable experience delivering superior software products via Agile methodologies Continuous learning mindset, keeping abreast
are seeking a proactive and technically proficient individual to help architect, maintain, and optimise the core infrastructure that supports our mission-critical technology platforms. This individual will join a high-performing team tasked with designing and sustaining resilient, secure, and high-availability systems that meet rigorous uptime standards in a round-the-clock operational environment. The ideal
understanding of system architecture, and experience in troubleshooting complex technical issues. Key Responsibilities: Design, develop, and implement IT infrastructure, including hardware, software, and networks. Monitor system performance and ensure high availability and reliability. Identify, diagnose, and resolve technical issues related to system operations. Implement security measures and best practices to protect systems and data. Collaborate with cross-functional
environments. Collaboration & Communication: Work closely with development and operations teams to streamline processes, enhance productivity, and solve complex deployment challenges. Monitoring & Optimization: Proactively monitor and optimize pipeline performance, ensuring high availability, scalability, and security throughout the entire delivery pipeline. Automation & Efficiency: Continually seek out opportunities to automate manual processes, reduce friction in deployment, and improve operational efficiency. Security
as Terraform, Ansible, and CloudFormation Create, manage, and secure cloud-hosted OS images (e.g., AWS AMIs) across Linux and Windows Server environments Configure and troubleshoot production-grade systems, ensuring high availability and performance in AWS and hybrid environments Implement and monitor cloud-native security controls, ensuring compliance with DoD cybersecurity standards Collaborate with cross-functional teams to support
San Antonio, Texas, United States Hybrid / WFH Options
EXPANSIA
monitoring using Infrastructure-as-Code (e.g., Terraform, CloudFormation). Manage identity and access control, patching, logging, and backups across multi-tenant environments. Troubleshoot platform and infrastructure issues and ensure high availability of critical workloads. Collaborate with mission teams and platform users to understand needs and optimize cloud-native services. Support compliance efforts (e.g., RMF, Zero Trust Architecture, IL5+
with a wide array of asset issuers. As a well-established market maker, our distinctive expertise led us to expand rapidly. Today, our services span market making, options trading, high-frequency trading, OTC, and DeFi trading desks. But we’re more than a service provider. We’re an initiator. We're pioneers in adopting the Rust development language for … and trading data platform systems at the core of our organisation. We are looking for a hands-on leader who is not only experienced in building scalable, resilient, and high-performance systems but also willing to roll up their sleeves and actively contribute to engineering efforts. The ideal candidate thrives in fast-paced environments, has a strong track record … managing and mentoring engineers, fosters a collaborative work culture, and drives product-centric initiatives while staying deeply engaged in technical challenges. Key Responsibilities Architect, develop, and maintain large-scale, high-performance trading data platforms with a focus on low latency and high availability. Apply data engineering principles to design efficient, scalable, and fault-tolerant data pipelines for trading
Brussels - Hybrid Your role: Design, build, and manage scalable, secure cloud infrastructure using infrastructure-as-code tools (e.g., Terraform, Helm). Design and maintain OpenShift clusters to ensure scalability, high availability, and security Develop and maintain CI/CD pipelines to automate testing, deployment, and infrastructure provisioning. Implement containerization and orchestration solutions (e.g., Docker, Kubernetes) to support microservices … and cloud-native applications. Monitor system performance, ensure high availability, and troubleshoot production issues across cloud environments. Your background: Bachelor's or higher degree in IT, Computer Science, or other related fields 5+ years' experience in Cloud Computing (AWS, GCP, Azure, IBM) with relevant certifications Experience in developing CI/CD pipelines, and knowledge of DevOps tools including