SRE – Data Platforms

  • Own production on-call responsibilities, including incident response, mitigation, and post-mortem analysis.
  • Troubleshoot complex system failures across distributed Linux/Unix environments.
  • Design, deploy, and operate containerized applications in production infrastructure.
  • Build and maintain highly available, scalable distributed services.
  • Write, test, and release production-quality code in Python, Go, or similar languages.
  • Improve observability using monitoring, logging, and alerting practices.
  • Automate operational workflows to reduce manual intervention and mean time to recovery (MTTR).
  • Collaborate with engineering teams to improve reliability, performance, and release readiness.
  • Perform capacity planning, performance tuning, and resilience testing.
  • Drive continuous improvements in reliability, operational excellence, and system stability.

Job Details

Company: VeeAR Projects Inc
Location: London Area, United Kingdom
Posted