1. Noema: Rethinking Perception with AI-Generated Spatial Soundscapes
Spring 2025, MIT. With Nomy Yu.
- Merleau-Ponty & Phenomenology: "What is perceived is not the world itself, but the sense of being-in-the-world."
We present NOEMA, a wearable Large Language Object (LLO) that reimagines human–AI interaction through spatialized audio experiences, including ambient cues, storytelling, and music. Unlike traditional LLM interfaces that rely on textual or visual modalities, NOEMA embodies the language model as an "inner voice": a sensory extension that perceives, interprets, and communicates through sound.
Our main contributions include:
- A novel interaction framework where LLM perception is spatialized, embodied, and auditory.
- The design and fabrication of a functional prototype that integrates vision sensing and real-time audio generation.
Activated when the user closes their eyes, NOEMA captures environmental data via cameras and generates real-time, spatially oriented audio narratives. This approach challenges the conventional expectation of AI precision, embracing ambiguity, hallucination, and imagination to foster a co-constructed, multisensory reality. Our work explores the potential of embodied AI to augment human perception and cognition, offering insights into the design of future AI-integrated wearable technologies.
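The eyes-closed activation described above can be sketched as a small gating function. This is a hedged illustration, not the authors' implementation: the eye-state label is assumed to come from a multimodal-LLM classifier, and the "uncertain" handling is a design assumption to avoid the soundscape flickering on and off.

```python
def should_activate(eye_state: str, currently_active: bool) -> bool:
    """Gate NOEMA's soundscape on the user's eye state.

    `eye_state` is assumed to be a label ("open", "closed", or
    "uncertain") returned by a multimodal-LLM frame classifier
    (hypothetical interface, not from the paper).
    """
    if eye_state == "closed":
        return True   # eyes closed: start / keep generating audio
    if eye_state == "open":
        return False  # eyes open: stop the soundscape
    # On an uncertain classification, keep the previous state so a
    # single noisy frame does not toggle the experience.
    return currently_active
```

Keeping the previous state on an ambiguous frame is one simple debouncing choice; a real system might instead require several consecutive "closed" frames before activating.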
The study design follows a system-building approach with iterative testing. The methodology comprises five key stages:
1. Eye-status detection via a multimodal LLM (MLLM)
2. Subject recognition and summary generation
3. Narrative segmentation via the MLLM
4. Dynamic spatialization
5. Interactive audio playback and music generation
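The five stages above can be sketched as a single pipeline. This is a hypothetical outline, not the project's code: `mllm` stands in for whatever multimodal-LLM client performs detection, description, and segmentation, and the spatialization step is illustrated with simple equal-power stereo panning (one common choice; the actual system may use richer spatial audio).

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    text: str           # one narrative fragment from the MLLM
    azimuth_deg: float   # estimated direction of its subject, -90 (left) to 90 (right)

def pan_gains(azimuth_deg: float) -> tuple[float, float]:
    """Equal-power stereo panning: map an azimuth in [-90, 90] degrees
    to (left, right) channel gains with constant total power."""
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2.0)  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)

def run_pipeline(frame, mllm):
    """Stages 1-4 for one camera frame; stage 5 (playback and music
    generation) would consume the returned list."""
    # 1. Eye-status detection: proceed only while eyes are closed.
    if mllm.eye_state(frame) != "closed":
        return []
    # 2. Subject recognition and summary generation.
    summary = mllm.describe(frame)
    # 3. Narrative segmentation into spatially tagged fragments.
    segments: list[Segment] = mllm.segment(summary)
    # 4. Dynamic spatialization: attach per-segment panning gains.
    return [(seg.text, pan_gains(seg.azimuth_deg)) for seg in segments]
```

For example, a subject described at azimuth -90 yields gains (1.0, 0.0), i.e. fully in the left ear, while azimuth 0 splits power equally between channels.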
NOEMA does not seek to reproduce reality with accuracy, but rather to amplify, bend, and recontextualize it through sound, inviting users to sense the world not just as it is, but as it might be perceived.
Beyond technical implementation, NOEMA prompts broader questions about the future of human–AI symbiosis. What does it mean when AI not only responds to our input, but perceives and interprets the world with us and for us? By framing hallucination and sensory ambiguity as design opportunities rather than limitations, we argue for a more poetic, affective, and multisensory approach to embodied AI. This work is only a starting point: we envision future iterations of NOEMA enabling dynamic and mobile user experiences that further blur the boundary between body, machine, and mind.