Site Reliability Engineer

Support & DevOps Engineer

A global fintech is seeking a proactive and technically adept Support & DevOps Engineer to maintain and enhance its production trading environment. This hybrid role blends business-as-usual (BAU) support responsibilities with advanced DevOps engineering tasks, ensuring robust operational efficiency and technical excellence.

The successful candidate will provide real-time operational support, build next-generation reconciliation tools for trade consistency across digital and traditional financial assets, and implement comprehensive monitoring and tooling solutions leveraging modern technologies.

Primary Responsibilities:

  • Provide support for production environment issues, promptly diagnosing and resolving problems both offline and in real time.
  • Communicate effectively with internal customers and senior software engineers to resolve and escalate issues clearly and efficiently.
  • Design, develop, and maintain advanced reconciliation applications to ensure consistency across digital and traditional finance trade-capture processes.
  • Develop and enhance monitoring dashboards and alerts using Datadog, Grafana, or similar technologies to proactively identify and address production issues, including end-to-end system latency.
  • Build tooling and monitoring solutions to facilitate comprehensive post-release validation, ensuring software functions correctly following deployments.
  • Participate in release management processes and uphold Agile best practices.
  • Act as incident coordinator for operational incidents on the core engineering production platform. This includes handling all internal technical communications, ensuring processes are followed, and completing all post-incident follow-up and analysis.

Qualifications and Skills:

  • 5+ years of combined experience in Support, DevOps, SRE, or related technical roles, ideally within financial technology environments.
  • Strong technical expertise with Python, Unix, and PostgreSQL, plus familiarity with Kafka, CockroachDB, FastAPI, GraphQL, Snowflake, Redis, and QuestDB or equivalent technologies.
  • Proven experience designing and implementing monitoring and alerting tools (Datadog, Grafana).
  • Solid experience with AWS Cloud Infrastructure and related operational processes.
  • Deep understanding of and experience troubleshooting REST APIs and WebSockets.
  • Exposure to crypto, blockchain (DLT), trading, and risk management systems is highly beneficial.
  • Familiarity with financial services infrastructure standards and processes (e.g., ITIL) within a DevOps or Site Reliability Engineering (SRE) context.
  • Demonstrated experience managing multiple priorities effectively and operating efficiently in ambiguous environments.
  • Excellent documentation and knowledge-sharing skills, coupled with a passion for continuous improvement in documentation strategies and tooling.
  • Experience with incident response protocols and comfort navigating high-pressure situations.
  • Proficiency with development workflows and tools (Jira, Confluence, GitHub) and with Scrum methodologies.
  • Strong written and spoken English communication skills; ability to clearly articulate technical concepts to varied audiences.

Job Details

Company
Global Fintech
Location
London, UK
Employment Type
Full-time
Posted