DevOps Specialist

Job Title: DevOps Specialist & Data Engineer

Location: Remote

Type: Full-time

Experience Level: Senior

Industry: Generative AI / Artificial Intelligence / Machine Learning

Reports To: Head of Engineering / CTO

About Us

Ready to join a cutting-edge AI company? We’re on a mission to become the OpenAI of the spicy content industry, building a full-spectrum ecosystem of revolutionary AI infrastructure and products. Our platform, OhChat, features digital twins of real-world personalities and original AI characters, enabling users to interact with lifelike AI-generated characters through text, voice, and images, with a roadmap that includes agentic superModels, API integrations, and video capabilities.

Role Overview

We are looking for a Senior DevOps Specialist with a strong Python and data engineering background to support our R&D and tech teams by designing, building, and maintaining robust infrastructure and data pipelines across AWS and GCP. You will be instrumental in ensuring our systems are scalable, observable, cost-effective, and secure. This role is hands-on, cross-functional, and central to our product and research success.

Key Responsibilities
DevOps & Infrastructure
  • Design, implement, and maintain infrastructure on AWS and Google Cloud Platform (GCP) to support high-performance computing workloads and scalable services.
  • Collaborate with R&D teams to provision and manage compute environments for model training and experimentation.
  • Maintain and monitor systems, implement observability solutions (e.g., logging, metrics, tracing), and proactively resolve infrastructure issues.
  • Manage CI/CD pipelines for rapid, reliable deployment of services and models.
  • Ensure high availability, disaster recovery, and robust security practices across environments.
Data Engineering
  • Build and maintain data processing pipelines for model training, experimentation, and analytics.
  • Work closely with machine learning engineers and researchers to understand data requirements and workflows.
  • Design and implement solutions for data ingestion, transformation, and storage using tools such as Scrapy, Playwright, agentic workflows (e.g., crawl4ai), or equivalent.
  • Optimize and benchmark AI training, inference, and data workflows to ensure high performance, scalability, cost efficiency, and an exceptional customer experience.
  • Maintain data quality, lineage, and compliance across multiple environments.
Key Requirements
  • 5+ years of experience in DevOps, Site Reliability Engineering, or Data Engineering roles.
  • Deep expertise with AWS and GCP, including services like EC2, S3, Lambda, IAM, GKE, BigQuery, and more.
  • Strong proficiency in infrastructure-as-code tools (e.g., Terraform, Pulumi, CloudFormation).
  • Extensive hands-on experience with Docker, Kubernetes, and CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins, with a strong ability to optimize CI/CD workflows as well as AI training and inference pipelines for performance and reliability.
  • Exceptional programming skills in Python. You are expected to write clean, efficient, and production-ready code. You should be highly proficient with modern Python programming paradigms and tooling.
  • Proficiency in data-centric programming and scripting languages (e.g., Python, SQL, Bash).
  • Proven experience designing and maintaining scalable ETL/ELT pipelines.
  • Focused, sharp, and results-oriented: You are decisive, work with a high degree of autonomy, and consistently deliver high-quality results. You are quick to understand and solve the core of a problem and know how to summarize it efficiently for stakeholders.
  • Effective communicator and concise in reporting: You should be able to communicate technical insights in a clear and actionable manner, both verbally and in written form. Your reports should be precise, insightful, and aligned with business objectives.
Nice to Have
  • Experience supporting AI/ML model training infrastructure (e.g., GPU orchestration, model serving) for both diffusion and LLM pipelines.
  • Familiarity with data lake architectures and tools like Delta Lake, LakeFS, or Databricks.
  • Knowledge of security and compliance best practices (e.g., SOC2, ISO 27001).
  • Exposure to MLOps platforms or frameworks (e.g., MLflow, Kubeflow, Vertex AI).
What We Offer
  • Competitive salary + equity
  • Flexible work environment and remote-friendly culture
  • Opportunities to work on cutting-edge AI/ML technology
  • Fast-paced environment with high impact and visibility
  • Professional growth support and resources
Company
OhChat
Location
Luton, Bedfordshire, UK
Employment Type
Full-time
Posted