AI Engineer [Computer Vision]

About the Role

This role is based at Depixen’s London Office.

Depixen is a London-based technology company building the digital decision infrastructure of the construction industry. As a corporate member of the World Wide Web Consortium (W3C), Depixen develops W3C-compliant Linked Data architectures, domain-specific ontologies, taxonomy models, RDF-based data structures, and knowledge graph infrastructures for the construction sector. Through its AI projects with respected universities and institutions in the United Kingdom, Depixen continues to build a scalable global structure across the United States, Europe, and the Far East.

Computer vision is one of the critical perception layers of this infrastructure. In construction, architecture, and building products, visual data is not merely unstructured; it is deeply contextual. Product images, technical documents, drawings, site photos, spatial data, material surfaces, and building elements must be interpreted together with verified technical knowledge, semantic classifications, and machine-interpretable data models.

This is not a conventional image recognition role. You will help connect visual AI outputs with verified data, taxonomy, ontology, RDF, and knowledge graph layers, turning perception into reliable decision intelligence for the construction industry.

We are seeking a talented Computer Vision Engineer to design, develop, and productionize advanced perception systems for real-world construction industry use cases. In this role, you will work on object detection, segmentation, OCR, visual matching, product recognition, building element analysis, site image interpretation, video analytics, and multimodal vision-language applications.

The ideal candidate is analytical, research-driven, collaborative, and capable of turning advanced computer vision ideas into reliable production systems that address real industry problems.

Responsibilities

Design, develop, and evaluate robust, scalable, and verifiable computer vision models and pipelines using modern deep learning frameworks.
Build and optimize end-to-end vision systems covering data preprocessing, model development, deployment, monitoring, and continuous improvement.
Develop systems for object detection, segmentation, tracking, OCR, image classification, visual matching, product recognition, and video analytics as required.
Collaborate with data and modelling teams to connect visual AI outputs with taxonomy, ontology, RDF, and knowledge graph layers.
Work with cross-functional teams to understand product requirements and translate them into scalable technical solutions.
Develop automated testing, benchmarking, and evaluation workflows to ensure the performance, reliability, and safety of computer vision applications.
Optimize models and inference pipelines for scalability, latency, and cost-efficiency across GPU and CPU environments.
Contribute to dataset design, annotation processes, data quality validation, and tools that improve model performance and reliability.
Translate research-level computer vision and multimodal AI approaches into production-ready technical solutions.

Required Qualifications

Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, Artificial Intelligence Engineering, or a related field.
3-6 years of experience in computer vision, deep learning, or a related AI field.
Strong proficiency in Python.
Hands-on experience with deep learning frameworks such as PyTorch, TensorFlow, or similar.
Practical experience developing, testing, and deploying computer vision models in production environments.
Solid technical understanding of convolutional and transformer-based architectures, such as CNNs and ViT, and of detection models and frameworks such as YOLO and Detectron2.
Hands-on experience in several of the following areas: object detection, segmentation, OCR, tracking, image classification, or video analytics.
Experience with MLOps practices and tools such as Docker, Kubernetes, MLflow, and Weights & Biases.
Systematic approach to model evaluation, benchmarking, data quality control, and error analysis.
Familiarity with GPU/CPU inference optimization, latency management, and model deployment workflows.
Ability to analyse technical problems clearly, document solutions effectively, and communicate across teams.

Preferred Qualifications

Master’s or PhD degree in a relevant field.
Experience implementing or fine-tuning vision-language and vision foundation models such as CLIP, BLIP, or SAM.
Experience with multimodal AI, visual grounding, open-vocabulary detection, or image-text retrieval.
Familiarity with edge deployment frameworks such as TensorRT, OpenVINO, or ONNX Runtime.
Experience with 3D vision, point clouds, depth estimation, SLAM, spatial intelligence, or digital twin-based visual analysis.
Experience working with construction, architecture, construction technologies, BIM, technical document analysis, or product data systems.
Experience with OCR, technical document processing, drawing analysis, catalogue data extraction, or visual product matching.
Contributions to open-source computer vision projects.
Experience deploying models on cloud platforms such as AWS, GCP, or Azure.
Experience with data annotation, synthetic data, active learning, or dataset quality management.

Problem Areas You May Work On

In this role, you may work on problem areas including:

Visual recognition and classification of building products.
Matching product images with technical data, catalogue information, and semantic classifications.
OCR-based data extraction from technical documents, catalogues, and PDFs.
Joint analysis of architectural drawings, site photos, and product images.
Detection of building elements and material surfaces.
Linking visual data with taxonomy, ontology, and knowledge graph structures.
Applying vision-language models in the construction industry context.
Quality, compliance, or contextual analysis from site imagery.
Connecting visual AI outputs to verified, structured, and machine-interpretable data infrastructure.

Why This Role Is Different

The construction industry is one of the most complex application areas for computer vision because it brings together architecture, engineering, construction, building materials, and site operations. In this domain, the meaning of an image cannot be derived from pixels alone. The product class, technical standard, usage context, relationship to building elements, material characteristics, performance values, and verifiable data counterpart must be considered together.

At Depixen, computer vision outputs are not treated as isolated predictions. They are treated as decision components connected to verified data, semantic classification, ontology, RDF, and knowledge graph layers. This makes the role not only about model development, but about building reliable, contextual, and verifiable AI systems for the construction industry.
