Principal Machine Learning Engineer - Production Systems

Overview

SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.

Responsibilities

  • Architect the ML Solver Platform:
      • Define modular architecture for data preprocessing, model execution, and post-processing.
      • Establish clear API contracts between Python/TensorFlow and C# services.
  • Productionize ML Workflows:
      • Convert research code into robust, testable, and observable services.
      • Implement CI/CD pipelines, automated testing, and reproducibility standards.
  • Integration & Interoperability:
      • Design REST/gRPC endpoints for cross-language communication.
      • Ensure compatibility with C#/.NET services.
  • Performance & Scalability:
      • Optimize GPU/CPU utilization, batching strategies, and memory management.
      • Plan for multi-model and multi-tenant scenarios.
  • MLOps & Lifecycle Management:
      • Implement model versioning, artifact registries, and deployment workflows.
      • Set up monitoring, logging, and alerting for solver performance.
  • Security & Compliance:
      • Apply best practices for secrets management, dependency scanning, and secure artifact storage.

Required Skills & Experience

  • ML Frameworks: Expert in TensorFlow (TF2/Keras), experience with ONNX Runtime for inference.
  • Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
  • Architecture: Proven experience designing scalable ML systems for production.
  • APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
  • MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
  • Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
  • Observability: Metrics, tracing, structured logging, dashboards.
  • Security: SBOM, image signing, role-based access, vulnerability scanning.

Preferred Qualifications

  • Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
  • Familiarity with distributed training strategies and multi-GPU setups.
  • Knowledge of feature stores and data validation frameworks.
  • Exposure to regulated environments and compliance frameworks.

Tools & Technologies

  • ML: TensorFlow, ONNX Runtime, tf2onnx.
  • APIs: FastAPI, gRPC.
  • DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
  • Monitoring: Prometheus, Grafana, OpenTelemetry.
  • Security: HashiCorp Vault, Sigstore.

Why Join Us?

  • Work on cutting-edge ML solutions integrated into commercial engineering software.
  • Define architecture that scales across global deployments.
  • Collaborate with a team of experts in ML, software engineering, and UI development.
  • Competitive salary and benefits.

To apply: Send your resume and a brief cover letter to HR@softinway.com.

Job Details

Company: SoftInWay UK Ltd
Location: England, UK
Posted