Founding AI Engineer
Want to put your job search on autopilot? Join our platform, complete a 6-minute AI screening interview, and get auto-applied to 100s of high-paying roles.
Sign up now at https://app.calyptus.co/auth/candidate/sign-up and let the opportunities come to you.
____________________________________________________________
What You Will Do
- Build production-grade LLM and agentic workflows for business decision-making.
- Turn customer, market, and operational data into reasoning contexts that models can use reliably.
- Design evaluation, tracing, replay, and quality-control systems for nondeterministic AI outputs.
- Work on counterfactual analysis, forecasting, customer behaviour modelling, and decision support.
- Improve reliability through structured outputs, retrieval, constraints, calibration, tests, and model comparisons.
- Build internal tools that make AI workflows repeatable across customers and use cases.
- Help define how we measure whether a prediction was useful, accurate, and trustworthy.
- Work directly with the founders on product direction, customer problems, and technical architecture.
What Makes This Technically Hard
The challenge is not simply getting an LLM to produce a plausible answer. The challenge is building systems that can work with incomplete business data, expose their assumptions, separate evidence from speculation, reason under uncertainty, and improve when reality proves them wrong.
We care about systems that can:
- Say what they know.
- Say what they do not know.
- Explain why they reached a conclusion.
- Explain how that conclusion should be tested.
What We Are Looking For
You may be a strong fit if you have:
- 5+ years of engineering experience.
- Strong Python skills and comfort building production systems.
- A solid ML, data science, or applied AI background.
- Recent hands-on experience with LLMs, agents, RAG, tool use, or AI workflow systems.
- Experience designing evaluations, tests, metrics, or quality gates for AI/ML systems.
- Good judgment around hallucination, uncertainty, model failure modes, and measurement.
- Comfort working with messy data, SQL, APIs, and data pipelines.
- The ability to debug both code and reasoning.
- Strong product instincts: you care whether the output actually helps a user make a decision.
- High agency in ambiguous environments.
- Clear communication and intellectual honesty.
We would be particularly interested if you have worked on any of the following:
- LLM applications or agentic systems in production.
- Evaluation systems for LLMs or ML models.
- Applied ML systems for forecasting, experimentation, analytics, recommendations, or decision intelligence.
- RAG, embeddings, vector search, structured extraction, or model orchestration.
- Statistical analysis, A/B testing, causal inference, or calibration.
- Data products where traceability, auditability, or reliability mattered.
- Early-stage startups where you had to own both technical direction and implementation.
____________________________________________________________
Disclaimer: Calyptus uses an automated assessment tool that scores applicants.