Site Reliability Engineer (SRE) Lead

Excellent opportunity for an SRE Lead to join our Cloud Infrastructure & Security Services practice. Cognizant Infrastructure Services provides IT infrastructure and cloud services for clients across industry verticals, including both Consulting/Professional and Managed Services, spanning Enterprise Computing, Cloud Services, Security Services, DevOps, Data Centres, End User Computing, Service Desk, Network Services and Environment Management Services.

Key Responsibilities:

Act as the Site Reliability Engineering Lead, championing reliability as a feature and applying SRE concepts.

Own and manage AWS-based IT environments supporting enterprise applications, STAP platform rollouts, and high-performance computing (HPC) workloads.

Lead end-to-end application rollouts on STAP platforms, ensuring scalability, security, compliance, and performance readiness.

Design, implement, and manage HPC platforms on AWS to support high-end genomics, bioinformatics, and data-intensive workloads.

Drive AWS migration initiatives, including assessment, planning, execution, and post-migration optimization of on-premises and legacy systems.

Collaborate with application, security, network, and compliance teams to ensure secure, compliant, and resilient cloud architectures.

Oversee operational stability, capacity planning, performance optimization, cost management, and incident/problem management across AWS platforms.

Define governance models, operational standards, and best practices for AWS cloud usage across multiple workloads and platforms.

Act as a key technical and managerial escalation point for platform issues and critical incidents.

Key Skills and Experience:

SRE with experience spanning development and operations.

Practical understanding of SRE concepts, including reliability as a feature, error budgets and risk-based decision making, and toil reduction through automation.

Good experience with AWS cloud platforms.

Proven experience in AWS General IT Management, including governance, operations, security, cost optimization, and stakeholder management.

Good experience in application rollout and platform management, preferably on STAP or similar enterprise platforms.

Strong experience in designing and managing HPC platforms on AWS supporting high-end genomics, bioinformatics, or other data-intensive workloads.

Demonstrated experience in executing AWS-based migration programs, including assessment, planning, execution, and optimization of on-premises-to-cloud migrations.

Experience working in regulated and enterprise environments, with exposure to security, compliance, and data protection requirements.

Prior experience leading or mentoring technical teams and coordinating with cross-functional stakeholders is highly desirable.
