assets from legacy systems to our cloud-native environments using AWS and our bespoke Conversion Framework. Build new and maintain existing bespoke systems. Implement .NET-based microservices with strong observability and integration with data platforms. Develop custom ETL pipelines using AWS, Python, and MySQL. Implement governance, lineage, and monitoring to ensure high availability and traceability. AI & Advanced Analytics Integration: Collaborate …
love solving big problems and are motivated to get things done. You're not afraid to show off your work or to learn new things. Any experience with observability platforms such as Grafana would be advantageous. Knowledge of, or willingness to pick up, Rust and, ideally, previous experience writing developer-focused tooling and software. Technologies you will work …
practices (Agile, Scrum, Kanban). Proficiency in CI/CD pipelines, infrastructure as code, and cloud data tooling. Familiarity with data governance, privacy, and security principles. Experience using metrics and observability tools to monitor data platform health and team performance. Experience in performance management and setting measurable goals for team members. This role isn't for you if: You rely on …
and driving down costs. Application development. If you're currently an application engineer working in Python or Node.js with a strong operational slant, that can work well for us. Observability (Datadog), with a strong focus on enabling and empowering Engineering teams to understand their product in Production. SaaS networking. Geolocation-based performance, the path to multi-region, frontend performance optimisation.
domain. Experience in a strongly/statically typed language. Have a strong understanding of designing, building, and running high-quality, standards-compliant workflow APIs, with a focus on testing, observability, and performance. Have worked with a cloud provider (AWS/Azure/GCP). Have worked with distributed systems and are comfortable debugging through tracing and observability. Willing to be …
Royal Leamington Spa, England, United Kingdom Hybrid / WFH Options
Tata Consultancy Services
If you need support in completing the application or if you require a different format of this document, please get in touch at UKI.recruitment@tcs.com or call TCS London Office number 02031552100/+44 204 520 2575 with the …
City of London, London, United Kingdom Hybrid / WFH Options
Cpl
Site Reliability Engineer (SRE) Lead – Observability Rate: £450-£475 per day (Inside IR35) Location: London (Hybrid, 2 days on site per week) Contract Role Overview: Join a high-impact team where you'll lead and shape the SRE and Observability function for a major transformation programme. This role goes beyond traditional SRE – you’ll champion best practices across product teams … drive observability strategy, and work hands-on with cutting-edge tools like Datadog and AWS. Key Responsibilities: Lead the SRE function and promote observability-first thinking across development and operations teams. Define and implement the observability roadmap across product domains in collaboration with the client. Be hands-on with Datadog for infrastructure and application-level monitoring. Guide and review daily … operations and improvements across observability platforms. Partner with engineering squads to deliver on observability requirements in an agile, demand-led way. Core Skills & Experience: Proven experience as a hands-on SRE Engineer. Deep understanding of observability and monitoring practices. Practical experience with Datadog (or similar observability platforms). Strong DevOps toolchain knowledge: GitHub, GitHub Actions, Jenkins, CodeQL, Nexus, CloudFormation, Terraform.
robust container orchestration platform to support large-scale, compute-intensive workloads in a high-performance trading environment. You'll lead a team of engineers, define best practices, and ensure observability, scalability, and performance across the platform. Key Responsibilities Lead the design and operation of Kubernetes platforms (on-prem & cloud-native) Manage HPC infrastructure to support trading workloads and scientific compute … Guide a team of engineers across distributed environments Define and enforce best practices for infrastructure scalability, performance, and monitoring Implement observability tooling and ensure high platform availability Collaborate with other engineering teams to drive automation and operational efficiency Requirements 8+ years in infrastructure/platform engineering roles Deep expertise in Kubernetes (both on-premises and cloud-native) Strong Linux (preferably … RHEL) systems administration skills Proven experience with HPC workloads or scientific computing clusters Hands-on experience with observability tools: Prometheus, Grafana, Loki Infrastructure as Code (IaC) using Terraform, Ansible CI/CD experience with GitOps tools (e.g., ArgoCD, Flux) Prior experience leading engineering teams in distributed environments
teams to senior executives. Design and manage Proof of Concepts (POCs) and Proof of Value (POVs). Act as a technical advisor, helping customers select and implement the right observability and monitoring solutions. Communicate customer needs and product feedback to internal product and engineering teams. Create custom solutions to bridge gaps and maximize value for each client. Key skills for … Strong coding ability in at least one high-level programming language (e.g. Java, Go, Python). Deep technical knowledge of Kubernetes, AWS, Azure, GCP or Docker Solid understanding of observability tools, log management, APM, and SIEM. Experience in DevOps or engineering roles is a strong advantage. Background in technical sales or customer engagement within observability or security platforms is a …
Administer GitLab infrastructure for CI/CD processes. Operate and maintain Kafka clusters for real-time data pipelines. Diagnose and resolve issues across systems, networks, containers, and applications. Use observability tools (Grafana, Prometheus, Kibana, Elasticsearch) to monitor system health. Automate system management tasks using Ansible. Participate in an on-call rotation to support global operations. Required Skills & Experience: Strong hands … system optimization. Production-level experience managing Kubernetes clusters. Proficiency with GitLab for version control and CI/CD workflows. Solid understanding of Kafka in high-throughput environments. Experience with observability tools such as Grafana, Prometheus, Kibana, and Elasticsearch. Expertise in Ansible for automation and configuration management. Strong problem-solving skills across infrastructure layers (compute, network, OS, containers).
effective AI solutions. Build scalable AI systems including agentic chatbots and self-service platforms Develop internal AI copilot tools to enhance operational efficiency Ensure reliability through automated testing and observability Proficiency in Python or similar programming languages ML fundamentals and LLM expertise (prompt engineering, cost optimization) Cloud platforms (AWS/GCP/Azure) and infrastructure-as-code DevOps tooling (CI …/CD, Docker, Kubernetes) and observability Distributed systems and datastores (SQL/NoSQL) Opportunity to work with cutting-edge AI at global scale in a profitable, fast-growing company with a clear AI investment strategy. Generous PTO, paid sabbatical after 5 years, remote working holidays, exclusive lifestyle perks, inclusive environment.
success across all regions. Partner closely with R&D, Customer Success, Product, Sales, and Support to drive holistic customer outcomes. Hands-On Technical Expertise Maintain hands-on fluency in observability tooling, logging infrastructure, and cloud environments. Act as a senior technical escalation point for complex deployments or architectural challenges. Provide in-depth technical guidance on customer environments, use cases, and … performance analytics. Collaborate on the development of tools and dashboards to ensure visibility and impact tracking. Requirements Technical Experience 10+ years of technical experience in Cloud DevOps, SaaS, or observability, with 5+ years in leadership roles. Strong hands-on experience with AWS, GCP, Azure, K8S, Terraform and observability tools: Prometheus, Grafana, OpenTelemetry, ELK, Splunk, Datadog, and similar. Proficiency with metrics … team members are encouraged to challenge the status quo and contribute to our shared mission. If you thrive in dynamic environments and are eager to shape the future of observability solutions, we'd love to hear from you. Coralogix is an equal opportunity employer and encourages applicants from all backgrounds to apply.
Exciting You’ll work in a Node.js-first environment where product and platform teams collaborate closely. You’ll own core infrastructure and DevOps processes, from CI/CD to observability. You’ll be part of a team that encourages experimentation, autonomy, and continuous improvement. You'll help shape the SRE function at a high-impact stage of growth. … Doing Build and improve CI/CD pipelines (GitHub Actions) that keep development smooth and fast Maintain and scale infrastructure on AWS, including ECS, S3, RDS, and CloudFront Improve observability using tools like Datadog and CloudWatch, and act on what you find Automate key workflows around deployment, testing, scaling, and failure recovery Collaborate with engineers to build scalable, secure, and … For Strong experience working in production Node.js environments Hands-on with AWS services and container orchestration (ECS, Docker) Skilled at building and maintaining CI/CD pipelines Experience with observability, monitoring, and incident management Working knowledge of infrastructure-as-code (Terraform, CloudFormation) A collaborative, proactive mindset with strong communication skills 🎁 What You’ll Get A collaborative, mission-driven culture that …