build tooling to automate detection and self-healing. Key Responsibilities Incident Response & Troubleshooting Act as primary on-call for React application incidents: crashes, memory leaks, performance regressions, or deployment failures. Analyze browser logs, application metrics (e.g., Real User Monitoring), and backend traces to isolate root causes across React, Node.js services, AWS, and Kubernetes layers. Orchestrate post-incident reviews …
issues, rollbacks are slow. One way to address this would be to invest in CI/CD performance improvements, but we'd also like to explore alternative deployment strategies like canaries, blue/green, and traffic mirroring, and get more comfortable testing changes in production with real customer traffic. What you can expect …
London, England, United Kingdom Hybrid / WFH Options
BBC
Job Description This job is with BBC, an inclusive employer and a member of myGwork – the largest global platform for the LGBTQ+ business community. Please do not contact the recruiter directly. Job Details • Job Title: Senior MLOps/GenAI Infrastructure …
• Job Title: Senior MLOps/GenAI Infrastructure Engineer • Location: London/Salford/Glasgow/Newcastle/Cardiff (This is a hybrid role and the successful candidate will balance office working …
projects with a substantial token market cap a plus Experience utilising all the latest CI/CD, APIs, Security, Collaborative IDEs on public cloud platforms Experience in containers, deployment pipelines, A/B testing, Blue/Green principles to prove what you've built A passion for automation and defining everything via CaC to …
Manchester, Lancashire, United Kingdom Hybrid / WFH Options
Embarcaderomediagroup
Senior Site Reliability & Platform Engineer Manchester Hybrid/Flexible Working Full-Time Drive better infrastructure and developer experience at scale At Sorted, we're building robust, scalable systems to support modern digital services - and we're looking for a Site …
About our Company: LemFi (YC S21, Series B) is revolutionizing cross-border financial services for immigrants through its multi-currency platform, processing over $1 billion in monthly transactions. We provide instant remittances, foreign exchange services, and multi-currency accounts, all …
translation, content reviews, and localization efforts to improve product accessibility for the Chinese-speaking diaspora. Operations & Reliability Roll out new features gradually with techniques such as feature flags, blue-green deployments, and traffic ramp-ups. Build and monitor SLAs and SLOs with automated product metrics and alerting. Participate in the on-call rotation as part of our … you build it, you run it" ownership model. Ownership & Collaboration Take end-to-end ownership of features from concept to deployment and post-release support. Collaborate closely with cross-functional teams including Engineering, QA, and Product. Ensure your work aligns with international financial regulations and customer needs. Mentor junior developers and contribute to a team culture of continuous …