DevOps Specialist
Job Title: DevOps Specialist & Data Engineer
Location: Remote
Type: Full-time
Experience Level: Senior
Industry: Generative AI / Artificial Intelligence / Machine Learning
Reports To: Head of Engineering / CTO
About Us
Ready to join a cutting-edge AI company? We’re on a mission to become the OpenAI of the spicy content industry, building a full-spectrum ecosystem of revolutionary AI infrastructure and products. Our platform, OhChat, features digital twins of real-world personalities and original AI characters, enabling users to interact with lifelike AI-generated characters through text, voice, and images, with a roadmap that includes agentic superModels, API integrations, and video capabilities.
Role Overview
We are looking for a Senior DevOps Specialist with a strong Python and data engineering background to support our R&D and tech teams by designing, building, and maintaining robust infrastructure and data pipelines across AWS and GCP. You will be instrumental in ensuring our systems are scalable, observable, cost-effective, and secure. This role is hands-on, cross-functional, and central to our product and research success.
Key Responsibilities

DevOps & Infrastructure
- Design, implement, and maintain infrastructure on AWS and Google Cloud Platform (GCP) to support high-performance computing workloads and scalable services.
- Collaborate with R&D teams to provision and manage compute environments for model training and experimentation.
- Maintain and monitor systems, implement observability solutions (e.g., logging, metrics, tracing), and proactively resolve infrastructure issues.
- Manage CI/CD pipelines for rapid, reliable deployment of services and models.
- Ensure high availability, disaster recovery, and robust security practices across environments.

Data Engineering
- Build and maintain data processing pipelines for model training, experimentation, and analytics.
- Work closely with machine learning engineers and researchers to understand data requirements and workflows.
- Design and implement solutions for data ingestion, transformation, and storage using tools such as Scrapy, Playwright, agentic workflows (e.g., crawl4ai), or equivalent.
- Optimize and benchmark AI training, inference, and data workflows to ensure high performance, scalability, cost-efficiency, and an exceptional customer experience.
- Maintain data quality, lineage, and compliance across multiple environments.

Requirements
- 5+ years of experience in DevOps, Site Reliability Engineering, or Data Engineering roles.
- Deep expertise with AWS and GCP, including services like EC2, S3, Lambda, IAM, GKE, BigQuery, and more.
- Strong proficiency in infrastructure-as-code tools (e.g., Terraform, Pulumi, CloudFormation).
- Extensive hands-on experience with Docker, Kubernetes, and CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Jenkins, with a strong ability to optimize CI/CD workflows as well as AI training and inference pipelines for performance and reliability.
- Exceptional programming skills in Python. You are expected to write clean, efficient, and production-ready code. You should be highly proficient with modern Python programming paradigms and tooling.
- Proficiency in data-centric programming and scripting languages (e.g., Python, SQL, Bash).
- Proven experience designing and maintaining scalable ETL/ELT pipelines.
- Focused, sharp, and results-oriented: You are decisive, work with a high degree of autonomy, and consistently deliver high-quality results. You are quick to understand and solve the core of a problem and know how to summarize it efficiently for stakeholders.
- Effective communicator and concise in reporting: You should be able to communicate technical insights in a clear and actionable manner, both verbally and in written form. Your reports should be precise, insightful, and aligned with business objectives.

Nice to Have
- Experience supporting AI/ML model training infrastructure (e.g., GPU orchestration, model serving) for both diffusion and LLM pipelines.
- Familiarity with data lake architectures and tools like Delta Lake, LakeFS, or Databricks.
- Knowledge of security and compliance best practices (e.g., SOC2, ISO 27001).
- Exposure to MLOps platforms or frameworks (e.g., MLflow, Kubeflow, Vertex AI).

What We Offer
- Competitive salary + equity
- Flexible work environment and remote-friendly culture
- Opportunities to work on cutting-edge AI/ML technology
- Fast-paced environment with high impact and visibility
- Professional growth support and resources

Company: OhChat
Location: Bradford, UK
Employment Type: Full-time