cloud - object, file, backup, archive. Can demonstrate having delivered cloud storage-based solutions in a project context, with accountability for technical delivery and quality. Knowledge of DevOps and SRE delivery models: able to work in a team or squad, adopting agile methods and ways of working. Knowledge of modern application hosting platforms beneficial (Docker, Kubernetes and/or …
Lead Site Reliability Engineer (Lead SRE) Ready to keep things running smoothly? Join our tombola team! At tombola, we pride ourselves on building our own exceptional games and platforms in-house. That means keeping everything running flawlessly is paramount! We're seeking a Lead Site Reliability Engineer (SRE) to join us and help ensure our critical … systems and services are always reliable, available, and performing at their best. What will you be doing? As an SRE, you'll be instrumental in implementing automation, monitoring, and incident response strategies to minimize downtime and optimize our operations. You'll collaborate closely with our development, infrastructure, and security teams, balancing exciting new feature delivery with rock-solid system … with our broader business objectives. Collaborating with other teams and departments to achieve shared success. Partnering with our People Partner for tech to build robust team management practices. System Reliability and Availability Ensure system uptime: Monitor and maintain the availability and reliability of critical systems and services, meeting all uptime SLAs (Service Level Agreements). Incident management: Quickly …
Job Requirements: Designing, building, and operating large-scale production systems. Deep knowledge of Python is preferred, though other languages like Java, Go, Rust, or similar will also be heavily considered. Experience using source control (Git, GitHub) and feature branching strategies. …
Cambourne, Cambridgeshire, United Kingdom Hybrid / WFH Options
Remotestar
AI-driven innovation. With a strong global presence, the company partners with enterprises across various industries to deliver cutting-edge technology solutions. Job Title: Site Reliability Engineer (SRE) Experience: 5 to 9 years Location: Pan India (Remote) Work Mode: Initially remote for this project. Later, the client will transition to a hybrid model (3 days from office per … fix potential issues before they cause problems for users. Monitor systems and create plans for responding to incidents. Involved in capacity planning and performance tuning to ensure that the site can handle increased traffic without issue. Deep understanding of how distributed systems work in order to be able to troubleshoot and optimize them. Familiar with various monitoring tools such … performance issues. Experience with tools including monitoring tools, configuration management tools, and automation tools. Good experience in Azure, GCP. Points to note: a candidate who can mature their SRE practice across the division. Someone who is comfortable being a champion and leader in the SRE space. …
This is a Vice President position within Platform Reliability Engineering and Management leveraging SRE Principles and Practices. This role is looking for a multi-skilled professional with strong technical leadership and people management skills to deliver critical services ensuring Jefferies operates a highly stable, reliable, and resilient front-to-back plant. Responsibilities: Provide technical leadership to high performing global … will identify and create automation to eliminate manual day-to-day support activities; scope and create automation for deployment, management and visibility of our services. Extensive experience with implementing SRE principles in the organization such as SLOs/SLIs and TOIL measurement. Implement best practices for building successful monitoring and alerting systems. Experience with Observability platforms like Datadog and open … telemetry is desired. You will work closely with engineering/development teams to design, build, and maintain systems and help them decide on products to use, schema design and query tuning. Extensive troubleshooting abilities across the stack. QUALIFICATIONS/Technical Skills: Bachelor's degree or equivalent, ideally in an area related to technology infrastructure (e.g., engineering, computer science …
Head of Site Reliability Engineering Investment Tech Annual Bonus, Share Options, L&D Fund, Private Medical, Hybrid/Flexi Working Do you want to … work for one of the country's top Investment Tech firms, who are currently undergoing a huge digital transformation? Do you want to spearhead the building of a new SRE function within the company's App Support and Infrastructure teams? I am partnered with one of the world's largest independent hedge fund and investment technology firms, who are currently on … the hunt for a Head of SRE to build out a new 24/7 reliability service for them. This is a strategic leadership position that will require merging 3 existing teams (App Support, Infra, DevOps) and line managing a team of circa 20 headcount. Specifically, they want someone who has experience building out SRE capabilities within App …
the future? Let's talk. About the role The Production Support Analyst is responsible for triaging and resolving incidents and defects in production environments. This role ensures the stability, reliability, and performance of business-critical systems by providing timely support, collaborating with engineering and product teams, and following established defect management processes. This position has a strong focus … the defect backlog, ensuring all required information is documented (title, summary, classification, impacted journey, replication steps, screenshots, error messages, etc.). Collaborate with Site Reliability Engineering (SRE), scrum teams, and product teams to resolve production defects. Prioritize and manage defects according to severity (Critical, High, Medium, Low). Allocate and track sprint capacity for production defect resolution … technical and non-technical audiences. Excellent communication and collaboration skills. Ability to work under pressure and manage multiple priorities. Desirable Experience working in Agile/Scrum environments. Familiarity with SRE practices and production maintenance planning. Essential Experience with financial services, investment platforms, or enterprise software environments. Prior experience in a TA support, middle-office, or back-office technology role. Private …
A leading financial institution seeks a Principal Network Engineer Position Summary As a highly skilled principal network engineer, you will be responsible for the ongoing support and reliability of all components within the company's ecosystem, encompassing platforms, networks, applications, and services. In this role, you will provide escalation support for important network services, ensuring the operational stability and … performance of the company's global network. You will also find opportunities and improvements, then define requirements for software automation to enhance the features, functions and reliability of the network, underpinning the company's digital platforms. Role Responsibilities We will play a pivotal role in designing, building, and maintaining systems, software, and applications across various domains, ensuring the highest quality … RSVP. Comprehensive knowledge of multicast concepts, including PIM-SM, PIM-SSM, IGMP, MVPN, and MLDP. Proficiency in product-aligned, service-focused work, with a solid grasp of network automation, SRE, and DevOps or equivalent experience. Working with Agile methodologies (Scrum, Kanban) and project management tools like JIRA. Excellent skills in network packet analysis and packet capture tools for troubleshooting. Familiarity …
BAND: E DEPARTMENT: Product Engineering & Data, Enablement Dev Operations SALARY: £95,000-£102,000 depending on relevant skills, knowledge and experience. The expected salary range for this role reflects internal benchmarking and external market insights. CONTRACT TYPE: Permanent, Full-Time LOCATION: Newcastle or Salford - Hybrid We're happy to discuss … of training materials, documentation, and workshops to upskill engineering teams and foster a culture of operational excellence. Cross-Functional Collaboration - work closely with engineering, product, DevOps/SRE, and service management teams to ensure operational standards are understood, adopted, and measured effectively across the organisation. YOUR SKILLS AND EXPERIENCE ESSENTIAL Significant experience in leading operational excellence initiatives within … large-scale software engineering and/or SRE functions, ideally in digital media, broadcasting, or similar high-availability environments. Deep expertise in software engineering best practices, reliability engineering (SRE), and the implementation of operational standards for distributed, cloud-native, or broadcast-critical systems. Proven ability to deliver leadership consultancy, drive strategy definition, and sponsor business cases …
team of both data-led and product-focused Software and Production Engineers, pushing the boundaries of technology and working at an extraordinary scale. As a collective we strive for engineering greatness by ensuring best practices across the community. What will I be doing? You will be joining Flex's platform team, responsible for the core … tools the rest of our team leverages. In a nutshell, you will be: Ensuring our platform is reliable, secure, and ready to scale for the future. Advocating for site reliability practices in our product teams. Developing best practices, tools, and libraries to reduce toil and increase alignment in product teams. Enabling product teams to meet their product … targets. Is this the job for me? Ideally you'll have/be: A background in software engineering; we don't mind what language but we use a lot of Go. Experience or strong interest in Site Reliability Engineering. Experience with cloud computing (AWS/GCP). Cloud networking & Security knowledge. Interest in distributed systems and event …
Hampshire, England, United Kingdom Hybrid / WFH Options
Addition+
of the development teams. Configure Kubernetes environments to enable scalable, resilient, and high-performance application deployments. Main Skills/Requirements: Over 5 years of hands-on experience in Platform Engineering, Site Reliability Engineering, Platform Integration, or related technical domains. Proven track record in architecting and implementing CI/CD pipelines using tools like Jenkins, GitLab CI …
Portsmouth, England, United Kingdom Hybrid / WFH Options
Addition+
of the development teams. Configure Kubernetes environments to enable scalable, resilient, and high-performance application deployments. Main Skills/Requirements: Over 5 years of hands-on experience in Platform Engineering, Site Reliability Engineering, Platform Integration, or related technical domains. Proven track record in architecting and implementing CI/CD pipelines using tools like Jenkins, GitLab CI …
Development Lifecycle. Proven experience in Product Discovery, data analysis, and delivery methods. Other highly valued skills may include: Strong understanding of modern infrastructure architecture (containerization, virtualization, public cloud) and Site Reliability Engineering practices, including metrics and observability tools. Experience working in a finance, banking, or fintech company with an internal customer base. Certified Product Owner You may …
and delivery methods. Other highly valued skills may include: Strong understanding of modern infrastructure architecture (containerization, virtualization, public cloud) and Site Reliability Engineering practices, including metrics and observability tools. Experience working in a finance, banking, or fintech company with an internal customer base. …
among your peers in Identity and Access Management, and have a passion for hacking and building. You will have interest and/or experience in related areas like software engineering, security, platform architecture, UI/UX design etc. Roles that you might currently be in: IAM Engineer, Tech Lead, Lead/Senior Support Engineer, Site Reliability Engineer … with Chocolatey and PowerShell for Windows software deployment. A foundational understanding of programming and software engineering. Experience integrating different identity providers. Nice to have: Familiarity with frontend and backend engineering, including languages such as TypeScript and Python, and frameworks such as React, Remix and Django. Familiarity with our security tooling - CrowdStrike EDR, Kolide, osquery, Zscaler. Don't think you …
resilience, ensuring secure, scalable, and uninterrupted platform services, while managing SLAs and engaging with cybersecurity teams. Direct incident management processes, from root cause analysis to resolution, coordinating closely with Site Reliability Engineering within BA Digital. Oversee and optimise business continuity planning, incident readiness, and operational efficiency across Commercial Platforms. Lead high-performing teams through change, providing strategic …
an organisation, to identify and formulate the need for new capabilities and operating model changes. Advising on, designing and implementing modern engineering capabilities including Platform Engineering, DevOps, SRE, Automation, Data & AI and Cloud. Conducting Agile and modern engineering maturity assessments to identify opportunities to improve quality, consistency and speed to market. Developing recommendations and translating them into … working. Providing insights to clients on common pitfalls during operating model transformation and devising appropriate risk mitigation strategies. Leading and coaching client teams in adopting new operating models and engineering practices. Leading and coaching junior team members. Multiple Locations Senior Level Full time Discover where this job fits at Accenture Industry understanding. Deep insights. Big ideas. You'll help …
an organization, to identify and formulate the need for new capabilities and operating model changes. Advising on, designing, and implementing modern engineering capabilities including Platform Engineering, DevOps, SRE, Automation, Data & AI, and Cloud. Conducting Agile and modern engineering maturity assessments to identify opportunities to improve quality, consistency, and speed to market. Developing recommendations and translating them into … working. Providing insights to clients on common pitfalls during operating model transformation and devising appropriate risk mitigation strategies. Leading and coaching client teams in adopting new operating models and engineering practices. Job Qualifications We are looking for individuals with the following skills and experience: Technology Experience: Experience designing modern technology operating models or an engineering professional with a … background in Data & AI, DevOps, or Platform Engineering seeking to pivot towards advisory work. Understanding modern IT, Data, AI, and Cloud operating model components and methodologies. Understanding functional architecture and how this informs modern IT operating model structure and design. Curiosity and passion for exploring emerging technologies (Cloud Native, Data, AI, Robotics, etc.). Strategic Thinking: Ability to synthesize …