Camera Aware Chatbot

An AI proof-of-concept that feeds real-time camera context into a local LLM, surfaced through an expressive Unity avatar.

Computer Vision
AI
Chatbot
Unity
Avatar
Project Overview

Camera Aware Chatbot was built as a proof-of-concept for AffectiLink in April 2025, showcasing how live camera data can enrich on-device LLM conversations. The system constantly feeds what it “sees” through your webcam into a vision model, then injects that descriptive context into every chat response, giving your local AI genuine visual awareness.
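As a rough sketch of that capture-and-describe loop, the vision server can run Florence 2 (the project's VLM, per the stack below) in a background thread and serve the latest caption over HTTP. The model variant, task prompt, interval, port, and endpoint name here are illustrative assumptions, not the project's exact values.

```python
# Sketch of the vision server: captures webcam frames, captions them with
# Florence 2 at a fixed interval, and serves the latest description over HTTP.
import threading
import time

import cv2
from PIL import Image
from flask import Flask, jsonify
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Florence-2-base"  # assumed variant
TASK = "<MORE_DETAILED_CAPTION>"        # Florence 2 task prompt for rich captions

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

app = Flask(__name__)
latest_description = "Nothing seen yet."

def caption_loop(interval_s: float = 2.0) -> None:
    """Continuously grab a frame and refresh the scene description."""
    global latest_description
    cam = cv2.VideoCapture(0)
    while True:
        ok, frame = cam.read()
        if ok:
            # OpenCV delivers BGR; Florence 2 expects an RGB PIL image.
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            inputs = processor(text=TASK, images=image, return_tensors="pt")
            ids = model.generate(
                input_ids=inputs["input_ids"],
                pixel_values=inputs["pixel_values"],
                max_new_tokens=256,
            )
            raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
            parsed = processor.post_process_generation(
                raw, task=TASK, image_size=(image.width, image.height)
            )
            latest_description = parsed[TASK]
        time.sleep(interval_s)

@app.route("/description")
def description():
    return jsonify({"description": latest_description})

if __name__ == "__main__":
    threading.Thread(target=caption_loop, daemon=True).start()
    app.run(port=5001)
```

Captioning in a daemon thread keeps the HTTP endpoint instantly responsive: chat requests never wait on a fresh inference, they simply read the most recent description.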

Under the hood, two Flask microservices power the pipeline. The first server reads your camera stream and runs it through the Florence 2 vision-language model, outputting detailed text descriptions of the scene at regular intervals. The second server acts as an orchestrator: whenever the Unity client sends a message, it pulls in the latest camera description, wraps it into the prompt for a local LLM (Gemma 3), and returns the combined reply, essentially performing real-time retrieval-augmented generation (RAG) with visual context.
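A minimal sketch of that orchestrator, assuming the vision server above and Ollama serving Gemma 3 locally; the port, endpoint names, model tag, and prompt wording are assumptions for illustration.

```python
# Sketch of the orchestrator: on each chat request, pull the newest camera
# description, inject it into the system prompt, and query Gemma 3 via Ollama.
import ollama
import requests
from flask import Flask, jsonify, request

VISION_URL = "http://localhost:5001/description"  # assumed vision-server endpoint
LLM_MODEL = "gemma3:4b"                           # assumed Ollama model tag

app = Flask(__name__)

@app.route("/chat", methods=["POST"])
def chat():
    user_message = request.json["message"]

    # Real-time RAG step: retrieve the freshest visual context.
    description = requests.get(VISION_URL, timeout=2).json()["description"]

    system_prompt = (
        "You are a helpful assistant with live camera vision. "
        f"Right now you can see: {description}\n"
        "Use this context naturally when it is relevant."
    )
    reply = ollama.chat(
        model=LLM_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return jsonify({"reply": reply["message"]["content"]})

if __name__ == "__main__":
    app.run(port=5002)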

On the front end, a Unity application provides a simple text input and an expressive VRM avatar that reacts to the conversation with multi-step dialogue and emotion cues. Built with C# and UniVRM, the avatar can talk, smile, and express emotions as it answers your questions, all while running entirely offline. This POC laid the groundwork for Mira Desktop AI, proving that a truly private, camera-aware assistant could live on your machine without any cloud dependencies.
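For quick testing without the Unity client (from a script, or equivalently from Insomnia), a stand-in request can exercise the assumed /chat endpoint directly; the payload shape matches the orchestrator sketch above.

```python
# Minimal stand-in for the Unity client: send a message, print the reply.
import requests

resp = requests.post(
    "http://localhost:5002/chat",
    json={"message": "What can you see on my desk right now?"},
    timeout=60,
)
print(resp.json()["reply"])
```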

Key Features
Camera-aware chatbot that can interact with the user based on their camera feed
100% local Computer Vision and AI processing
Multi-step dialogue system with an avatar that can express emotions
Florence 2 model for advanced visual understanding
Gemma 3 model for natural language processing
Flask server acting as an orchestrator, ensuring consistent output from the LLM and VLM
Technical Challenges

Keeping response times low while running real-time camera processing and LLM inference simultaneously

Running multiple large models locally with limited hardware resources

Evaluating multiple VLMs and LLMs to choose the best-performing combination

Ensuring the output from the LLM is compatible with the dialogue system (see the sketch below)
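One plausible way to tackle that last challenge: ask the LLM for JSON dialogue steps with one emotion cue per step, then validate before anything reaches the avatar. The schema and emotion set below are illustrative assumptions, not the project's exact format.

```python
# Sketch of output validation for the dialogue system: parse the LLM reply
# into dialogue steps and fall back to a neutral single step if it is malformed.
import json

ALLOWED_EMOTIONS = {"neutral", "happy", "sad", "angry", "surprised"}  # assumed set

def parse_dialogue(raw: str) -> list[dict]:
    """Validate the LLM reply into [{'text': str, 'emotion': str}, ...]."""
    try:
        steps = json.loads(raw)
        assert isinstance(steps, list) and steps
        for step in steps:
            assert isinstance(step["text"], str)
            assert step["emotion"] in ALLOWED_EMOTIONS
        return steps
    except (AssertionError, KeyError, TypeError, json.JSONDecodeError):
        # Malformed output: degrade gracefully instead of breaking the avatar.
        return [{"text": raw, "emotion": "neutral"}]
```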

Technologies Used

Front-end

Unity
VRM
C#

Back-end

Flask
Python

AI

Ollama
PyTorch
Transformers
LLM - Gemma 3
VLM - Florence 2

Tools

Insomnia
Adobe Mixamo
Hugging Face
Project Details

Date

2025-04

Status

Completed