My goal is to refine this project into a modular instrument for the modern performer (musician, DJ, dancer) that gives full control over audio and image at the same time. The project goes beyond standard audio-reactive visualizations: it lets the artist conduct a synesthetic experience, and it shows that in the age of AI the human element (our movement and musicality) remains the central driver of creativity.
The core idea is a bidirectional feedback loop: the performer's body and musical choices drive the generative output in real time, and that output in turn informs the performance.
Current Workflow & Mechanics
I am building a system of interdependent parameters shared between TouchDesigner and Ableton Live. Here is the current setup I am iterating on:
Curation as the human core: The human element is not just movement; it is also the intentional curation of the sonic palette and of the prompt sequences. I am designing the sample banks (Ableton) and the prompt architecture (StreamDiffusion) so that the AI operates within a cohesive aesthetic instead of producing random chaos.
Dual-purpose MIDI control: I am setting up the MIDI controller so that a single control does two things at once: it triggers a specific audio clip in Ableton and switches the prompt context in StreamDiffusion. This way the visual mood changes at the same moment as the musical arrangement.
MediaPipe integration: I am using MediaPipe hand tracking as a virtual MIDI controller. The goal is for specific hand gestures to modulate audio effects (such as filters or reverb) while also manipulating the visual input parameters fed into the diffusion engine.
Camera pre-processing: I am optimizing the camera feed so that StreamDiffusion interprets the figure in the frame accurately, keeping a recognizable link between the performer and the generated image.
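The dual-purpose MIDI mapping can be thought of as one lookup table in which each pad resolves to both an Ableton clip and a StreamDiffusion prompt. The Python sketch below only illustrates that idea under stated assumptions: the note numbers, clip slots, prompt texts, and the `SceneCue` and `route_note` names are hypothetical, and actually firing the clip and pushing the prompt to StreamDiffusion is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SceneCue:
    """One pad on the controller: an Ableton clip plus a StreamDiffusion prompt."""
    track: int   # Ableton track index (hypothetical layout)
    clip: int    # clip slot on that track
    prompt: str  # prompt context sent to StreamDiffusion


# Hypothetical pad layout: MIDI note number -> scene cue.
CUES: Dict[int, SceneCue] = {
    36: SceneCue(track=0, clip=0, prompt="ink bleeding into wet paper, monochrome"),
    37: SceneCue(track=0, clip=1, prompt="molten glass, amber light, slow motion"),
    38: SceneCue(track=1, clip=0, prompt="overgrown concrete, fog, muted greens"),
}


def route_note(note: int, velocity: int) -> Optional[Tuple[Tuple[int, int], str]]:
    """Return ((track, clip), prompt) for a mapped note-on, else None.

    A note-on with velocity 0 counts as a note-off, following MIDI convention,
    so releasing a pad does not retrigger the scene.
    """
    if velocity == 0:
        return None
    cue = CUES.get(note)
    if cue is None:
        return None
    return (cue.track, cue.clip), cue.prompt
```

Keeping both targets in a single table means a pad can never change the music without also changing the image, which is the whole point of the dual-purpose control.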
Current Challenges & Next Steps
My main focus right now is on optimization and mapping.
System load: I am configuring the dependency grid so that data flows efficiently without overloading the CPU or GPU, keeping the output fluid in real time.
Gestural mapping: I am experimenting to find the most natural gestures and map them to the right parameters. I want the connection between a hand movement, the resulting sound effect, and the visual distortion to feel intuitive and seamless.
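For gestures that modulate effects, one common approach is to smooth the tracked coordinate before converting it into a 7-bit MIDI CC value (0 to 127), and to send a message only when that value changes; the second step also keeps redundant messages out of the data flow. A minimal sketch, assuming a hand coordinate already normalized to 0.0 to 1.0 (as MediaPipe reports landmark x and y); the `GestureCC` class, the smoothing factor `alpha`, and the `dead_zone` margin are hypothetical tuning choices, not part of MediaPipe, TouchDesigner, or Ableton.

```python
from typing import Optional


class GestureCC:
    """Map a normalized hand coordinate (0.0 to 1.0) to a MIDI CC value (0 to 127)."""

    def __init__(self, alpha: float = 0.3, dead_zone: float = 0.05) -> None:
        if not 0.0 <= dead_zone < 0.5:
            raise ValueError("dead_zone must be in [0, 0.5)")
        self.alpha = alpha          # smoothing factor: higher reacts faster, lower is calmer
        self.dead_zone = dead_zone  # margin at each edge, so 0 and 127 are reachable without
                                    # pushing the hand to the very edge of the frame
        self._smoothed: Optional[float] = None
        self._last_sent: Optional[int] = None

    def update(self, x: float) -> Optional[int]:
        """Feed one tracked sample; return a CC value only when it changed."""
        # Exponential moving average to suppress frame-to-frame landmark jitter.
        if self._smoothed is None:
            self._smoothed = x
        else:
            self._smoothed += self.alpha * (x - self._smoothed)
        # Rescale so the inner range [dead_zone, 1 - dead_zone] covers the full 0..1.
        span = 1.0 - 2.0 * self.dead_zone
        t = min(max((self._smoothed - self.dead_zone) / span, 0.0), 1.0)
        value = round(t * 127)
        if value == self._last_sent:
            return None  # nothing new to send
        self._last_sent = value
        return value
```

Lower `alpha` values make the effect respond more slowly but more smoothly, which is one knob to try when a mapping feels twitchy rather than intuitive.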
Credits / Resources
Torin Blankensmith - essential tutorials on MediaPipe integration:
Update:
added feedback elements to create the effect of painting with the hand
added a pinch gesture to reset the feedback
changed the horizontal gesture mapping to feel more intuitive: moving the hand sideways now feels like washing over the texture
added a second camera processing unit based on POPs, plus a switcher to choose between the two (each gives the output video a different character)
tweaked the noise parameters in the first camera processing unit
cleaned up the network: every processing unit is now packed into a container with a preview of what is happening inside, which makes the network quicker to understand
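The pinch reset can be derived from two MediaPipe hand landmarks: the thumb tip (index 4) and the index fingertip (index 8). A minimal sketch, assuming 21 normalized (x, y) landmarks per hand in MediaPipe's order; dividing the pinch distance by the wrist (0) to middle-finger base (9) distance is an assumption meant to make the threshold less sensitive to how far the hand is from the camera. The `PinchTrigger` class and both threshold values are hypothetical and would need tuning. Using two thresholds (hysteresis) means a hand hovering near the boundary fires one reset instead of a burst of them.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

# MediaPipe hand landmark indices.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9


class PinchTrigger:
    """Fire once per pinch, with hysteresis so jitter cannot retrigger it."""

    def __init__(self, close: float = 0.25, open_: float = 0.4) -> None:
        self.close = close  # pinch ratio below which a pinch starts
        self.open_ = open_  # pinch ratio above which the pinch is released
        self.pinched = False

    def update(self, landmarks: Sequence[Point]) -> bool:
        """Return True only on the frame where a new pinch begins."""
        hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
        if hand_size == 0:
            return False  # degenerate detection, ignore this frame
        ratio = math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size
        if not self.pinched and ratio < self.close:
            self.pinched = True
            return True  # rising edge: this is where the feedback would be reset
        if self.pinched and ratio > self.open_:
            self.pinched = False  # fingers clearly apart again, re-arm the trigger
        return False
```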
Plans for the next week:
tighter integration with Ableton Live (choosing samples and picking gestures that make sound design intuitive) and tuning the audio reactivity
smooth transitions between prompts
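A smooth prompt transition can be reduced to easing a blend weight from the outgoing prompt to the incoming one over a few seconds. This sketch assumes the StreamDiffusion setup can accept a weighted blend of two prompts (or of their text embeddings); whether and how it does depends on the StreamDiffusion integration in use. `PromptCrossfade`, the smoothstep easing curve, and the 4-second default are illustrative choices, not taken from StreamDiffusion.

```python
from dataclasses import dataclass
from typing import Dict


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve: 0 at t=0, 1 at t=1, zero slope at both ends."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


@dataclass
class PromptCrossfade:
    """Blend weights for moving from one prompt to the next over `duration` seconds."""
    old_prompt: str
    new_prompt: str
    start: float           # time the transition was triggered, in seconds
    duration: float = 4.0  # hypothetical default fade length

    def weights(self, now: float) -> Dict[str, float]:
        """Return {prompt: weight}, with the weights summing to 1."""
        if self.duration <= 0 or self.old_prompt == self.new_prompt:
            return {self.new_prompt: 1.0}
        w = smoothstep((now - self.start) / self.duration)
        return {self.old_prompt: 1.0 - w, self.new_prompt: w}
```

Each frame, `weights(time.monotonic())` would give the current blend; starting a new `PromptCrossfade` whenever a pad switches the scene keeps the image from jumping abruptly between prompts.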
Final update:
Demo 1, Demo 2
added more MIDI control gestures (with separate logic for the left and right sides of the screen)
integrated the system with Ableton Live
tweaked the audio reactivity
made general adjustments to the network and added annotations inside the project
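The left/right screen logic can be sketched as routing each tracked hand to a different MIDI CC depending on which half of the frame its wrist is in, with hand height setting the value. A minimal sketch, assuming normalized wrist positions in MediaPipe's image coordinates (y grows downward) and that x is measured in the same orientation as the image the performer watches; the CC numbers and the `side_ccs` function are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical CC assignments: each half of the frame drives its own parameter.
LEFT_CC = 20   # e.g. a filter cutoff in Ableton
RIGHT_CC = 21  # e.g. a reverb send


def side_ccs(wrists: List[Point]) -> Dict[int, int]:
    """Map tracked wrists to {cc_number: value} by the half of the frame they are in.

    `wrists` holds normalized (x, y) positions with y = 0 at the top of the image.
    If both hands are in the same half, the later one in the list wins.
    """
    out: Dict[int, int] = {}
    for x, y in wrists:
        cc = LEFT_CC if x < 0.5 else RIGHT_CC
        # Raising the hand raises the value: invert y, clamp, scale to 0..127.
        out[cc] = round(min(max(1.0 - y, 0.0), 1.0) * 127)
    return out
```

Splitting the frame this way lets each hand act as an independent fader, so one hand can shape the sound while the other shapes a different parameter.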