AI Infrastructure Architect
Responsibilities:
Design a unified AI infrastructure and serving platform for composite AI workloads such as LLM training and inference, RLHF, agents, and multimodal processing.
This platform will integrate inference, orchestration, and state management, defining the technical evolution path for Serverless AI Agentic Serving.
Design a heterogeneous execution framework across CPU/GPU/NPU for agent memory, tool invocation, and long-running multi-turn conversations and tasks.
Build an efficient memory/KV-cache/vector store/logging and state-management subsystem to support agent retrieval, planning, and persistent memory.
Build a high-performance runtime/framework that defines the next-generation Serverless AI foundation, with elastic scaling, cold-start optimization, batch processing, function-based inference, request orchestration, and dynamic decoupled deployment, to support multi-model, multi-tenant, and high-concurrency scenarios.