ME.gic.LO.gic
“About”

Human Computer/AI Interaction: “Bridge as an Astryn”
- Anemoia ( AI Fragrance System )
- Togetherverse ( Multiplayer Co-Presence XR Platform )
- Lucid Prism ( MR LLM-Based Multi-Modal Agent System )
- Astronia ( AI 3D-Scan & Art Generation & Projection System )
- RhinoAI ( LLM-Based 3D Modeling Assistant )
- Vetroverse ( VR Game & Film )
- EMO機 “jī” ( Motion & Emotion H:M:M:H Interaction System )
- Echoes ( AI Music Interaction System )
- Shake It Off ( Functional Absurdity )

Architecture / Installation: “Rationale”
- Shadow-Less ( An ML System for Shadow as a Catalyst for Design )
- Garden of Sacredness ( Urban Renewal )
- Forest ( Modular Infrastructure for Community and Sustainability )
- Invasive Species ( Vertical Garden: A Living Archive )
- Parametricism: Mortise and Tenon

Photography: “Innerpeace”
- Prelude - 6am
- Fugue - 9am
- Improvisation - 4pm
- Waltz - 6pm
- Fantasia - 8pm
- 2024 Outro
3. Lucid Prism: An MR LLM-Based Multi-Modal Agent System
Winter 2024; ongoing prototype, self-developed.
*Designed before Meta released the Passthrough API and camera access.
0:00 - 0:56 Conversational Assistant; 0:57 - 1:56 Spatial Computing Assistant; 1:57 - 2:49 Object Transformation Assistant.
Pattie Maes: “The future of human-computer interaction lies in creating systems that can seamlessly understand and respond to the full spectrum of human expression.”
Over the past week, I dedicated myself to building a Virtual Assistant Prototype for Quest. The project stemmed from real-life challenges I frequently face, such as not being able to reach my phone while cooking or when my hands are messy. Developing a virtual assistant within Quest offered a practical, hands-free solution.
This prototype is designed to address significant pain points in Quest and LLM (Large Language Model) APIs, including the lack of camera access, limited contextual memory, and challenges with text-based communication. The project features three core modes:
- Conversational Assistant
- Spatial Computing Assistant
- Object Transformation Assistant
Together, these modes reimagine the way users interact with virtual environments, bridging the gap between intention and execution in mixed reality.
Core Features
1️⃣ Conversational Assistant
This mode focuses on enabling fluid and intuitive communication with the virtual assistant, freeing users from the constraints of text-based input. Key features include:
- Voice-to-Text Conversion: Leveraged Meta’s Voice SDK and Claude’s API to provide seamless voice input, reducing the friction of manual typing.
- Camera Access Hack: Developed an innovative workaround to bypass Meta’s camera restrictions, enabling real-time transmission of the user’s view to the Claude API. This enhances contextual understanding and improves communication with the assistant.
- Memory Storage System: Integrated a cloud-based system to address the lack of memory in LLM APIs. This system stores three types of documents (a schema sketch follows this list):
  - User inputs
  - API outputs
  - Conversation logs

This enables the assistant to maintain continuity across interactions, mimicking a human-like understanding of context.
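For illustration, here is a minimal sketch of how one of those log documents might look on the Unity side. The `MemoryEntry` and `ConversationLog` names and fields are my shorthand for this post, not the prototype’s actual schema:

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

// Illustrative schema for the documents described above: user inputs,
// API outputs, and conversation logs. Names are hypothetical.
[Serializable]
public class MemoryEntry
{
    public string role;      // "user" or "assistant"
    public string text;      // transcript or model reply
    public string timestamp; // ISO 8601, for ordering across sessions
}

[Serializable]
public class ConversationLog
{
    public List<MemoryEntry> entries = new List<MemoryEntry>();
}

public static class MemoryStore
{
    static readonly ConversationLog log = new ConversationLog();

    // Append one turn and serialize; the resulting JSON blob is what
    // would be pushed to the cloud store and replayed into later prompts.
    public static string Append(string role, string text)
    {
        log.entries.Add(new MemoryEntry
        {
            role = role,
            text = text,
            timestamp = DateTime.UtcNow.ToString("o")
        });
        return JsonUtility.ToJson(log);
    }
}
```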
Goal: To free up users' hands and deliver answers quickly and effectively, enhancing productivity and convenience.
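To make the flow concrete, here is a simplified sketch of the request path. It assumes the Voice SDK’s full-transcription callback and substitutes a plain screen capture for the camera workaround (whose details I’m glossing over); the endpoint and headers follow Anthropic’s public Messages API, and the model name is a placeholder:

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class AssistantClient : MonoBehaviour
{
    const string ApiUrl = "https://api.anthropic.com/v1/messages";
    [SerializeField] string apiKey; // injected via the Inspector, not hard-coded

    // Wire this to the Voice SDK's final-transcript event,
    // e.g. AppVoiceExperience.VoiceEvents.OnFullTranscription.
    public void OnFullTranscription(string transcript)
    {
        StartCoroutine(CaptureAndAsk(transcript));
    }

    IEnumerator CaptureAndAsk(string transcript)
    {
        // Stand-in for the camera workaround: grab the composited view.
        yield return new WaitForEndOfFrame();
        Texture2D frame = ScreenCapture.CaptureScreenshotAsTexture();
        string imageB64 = System.Convert.ToBase64String(frame.EncodeToJPG(75));

        // Minimal JSON escaping; a real build should use a JSON library.
        string safeText = transcript.Replace("\\", "\\\\").Replace("\"", "\\\"");

        // Claude's Messages API accepts mixed image + text content blocks.
        string body =
            "{\"model\":\"claude-3-5-sonnet-20241022\",\"max_tokens\":1024," +
            "\"messages\":[{\"role\":\"user\",\"content\":[" +
            "{\"type\":\"image\",\"source\":{\"type\":\"base64\"," +
            "\"media_type\":\"image/jpeg\",\"data\":\"" + imageB64 + "\"}}," +
            "{\"type\":\"text\",\"text\":\"" + safeText + "\"}]}]}";

        using (var req = new UnityWebRequest(ApiUrl, "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            req.SetRequestHeader("x-api-key", apiKey);
            req.SetRequestHeader("anthropic-version", "2023-06-01");
            yield return req.SendWebRequest();

            if (req.result == UnityWebRequest.Result.Success)
                Debug.Log(req.downloadHandler.text); // reply JSON; feed into MemoryStore
            else
                Debug.LogError(req.error);
        }
    }
}
```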
2️⃣ Spatial Computing Assistant
This mode introduces the capability for the assistant to understand and interact with virtual environments spatially. Key features include:
- Scene Understanding: Used Meta’s Depth API and metadata extraction to convert Unity scene prefabs into JSON files. Combined with real-time image capture, this gives the assistant spatial awareness of the user’s environment (a serialization sketch follows this list).
- Texture Generation and Application: By describing objects and materials through voice, users can instruct the assistant to generate textures using the Stable Diffusion API. These textures are then automatically applied to the specified GameObjects in Unity.
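Here is a minimal sketch of the metadata-extraction step: walking the scene’s renderers and serializing name, position, and bounds to JSON. The class and field names are illustrative, and the Depth API measurements the prototype also uses are omitted:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative scene description the LLM can reason about spatially.
[System.Serializable]
public class SceneObjectInfo
{
    public string name;
    public Vector3 position;
    public Vector3 size;
}

[System.Serializable]
public class SceneDescription
{
    public List<SceneObjectInfo> objects = new List<SceneObjectInfo>();
}

public static class SceneExporter
{
    public static string ToJson()
    {
        var desc = new SceneDescription();
        foreach (var r in Object.FindObjectsOfType<Renderer>())
        {
            desc.objects.Add(new SceneObjectInfo
            {
                name = r.gameObject.name,
                position = r.transform.position,
                size = r.bounds.size // world-space bounding box
            });
        }
        return JsonUtility.ToJson(desc, true);
    }
}
```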
Goal: To empower users with the ability to manipulate virtual environments intuitively, without the need for complex manual operations.
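Once the Stable Diffusion call returns an image (provider endpoints vary, so that request is omitted here), applying it to the named GameObject is the easy part. A sketch:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class TextureApplier : MonoBehaviour
{
    // Fetch a generated texture by URL and swap it onto the target's material.
    public IEnumerator ApplyGeneratedTexture(GameObject target, string imageUrl)
    {
        using (var req = UnityWebRequestTexture.GetTexture(imageUrl))
        {
            yield return req.SendWebRequest();
            if (req.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError(req.error);
                yield break;
            }
            Texture2D tex = DownloadHandlerTexture.GetContent(req);
            var rend = target.GetComponent<Renderer>();
            if (rend != null)
                rend.material.mainTexture = tex; // instantiates a per-object material
        }
    }
}
```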
3️⃣ Object Transformation Assistant
This mode bridges the physical and digital worlds, enabling quick and efficient transformation of real-world objects into virtual assets. Key features include:
- Background Removal: Leveraged image capture to implement an automatic background-removal mechanism in the cloud, ensuring clean inputs for further processing.
- 3D Reconstruction: Processed images are sent to Meshy’s API, which rapidly generates 3D reconstructions of the objects (a hand-off sketch follows below).
Goal: To streamline the process of converting physical objects into virtual assets, facilitating immersive environment creation.
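And a sketch of the hand-off to Meshy. The endpoint path and payload here are placeholders rather than Meshy’s documented contract, so treat this as the shape of the call, not the exact route:

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class MeshyClient : MonoBehaviour
{
    // Placeholder route; check Meshy's current docs for the real
    // image-to-3D endpoint and request schema.
    const string MeshyUrl = "https://api.meshy.ai/v1/image-to-3d";
    [SerializeField] string apiKey;

    public IEnumerator Reconstruct(string cleanedImageUrl)
    {
        // The background-removed image from the cloud step is passed by URL.
        string body = "{\"image_url\":\"" + cleanedImageUrl + "\"}";
        using (var req = new UnityWebRequest(MeshyUrl, "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(body));
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            req.SetRequestHeader("Authorization", "Bearer " + apiKey);
            yield return req.SendWebRequest();

            // The response carries a task id; polling for the finished
            // mesh and downloading it happens in a later step (omitted).
            Debug.Log(req.downloadHandler.text);
        }
    }
}
```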
Project Goals
This prototype is a step towards enhancing the usability of Quest and LLM integrations, pushing the boundaries of what’s possible in spatial computing and virtual assistance. It aims to create a seamless interface between human intention and machine execution, reducing friction and enabling more natural interactions.