ETHEREA Imagination Engine



I have been working on **ETHEREA** (realtime voice-to-visuals) for a little over a year and a half, starting with StreamDiffusionTD, migrating to the Daydream API earlier this year with Daydream's generous support, and now running **Longlive** at **~22fps at 832x480 on a B200**, with pretty mind-blowing quality.

I am preparing for a 1,000-person gala in New York City at the end of January, where this will be a featured installation.

ETHEREA works by taking unstructured voice prompts and using an LLM to convert them into rich, iterative visual prompts that immediately impact visuals. After demoing to thousands of people over the past 18 months, Longlive/Scope represents a new and exciting level of fidelity.

Here are a few of the things I've accomplished since beginning the realtime video program:

1. **Patched Scope to support RTMP/MediaMTX and multiple streaming destinations:** I have been streaming to YouTube most days at [YouTube Link] while iterating, which takes two clicks from within the ETHEREA interface. I intend to add Instagram and TikTok support soon. We have big livestreaming ambitions, so making this as easy as possible from within our interface is a priority. This also powers a second-screen "projection mode," which we use to broadcast the stream full screen on projectors.
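For illustration, here's a minimal sketch of how a multi-destination RTMP fan-out could look, assuming `ffmpeg` is on the PATH and a local MediaMTX instance is serving the Scope output. The function names, URLs, and stream paths are hypothetical stand-ins, not ETHEREA's actual code:

```python
import subprocess

def build_ffmpeg_push_cmd(source_url: str, rtmp_url: str) -> list[str]:
    """Build an ffmpeg command that restreams `source_url` to one RTMP destination."""
    return [
        "ffmpeg",
        "-re",                # read input at its native frame rate
        "-i", source_url,     # e.g. a local MediaMTX path carrying the Scope output
        "-c", "copy",         # no re-encode, to keep added latency minimal
        "-f", "flv",          # RTMP endpoints expect an FLV container
        rtmp_url,
    ]

def push_to_destinations(source_url: str, destinations: list[str]) -> list[subprocess.Popen]:
    """Start one ffmpeg push per destination (YouTube, Instagram, TikTok, ...)."""
    return [subprocess.Popen(build_ffmpeg_push_cmd(source_url, d)) for d in destinations]
```

Running one `ffmpeg` process per destination keeps each push independent, so a dropped YouTube connection doesn't take down the projection-mode feed.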

2. **Integrated the Daydream API as a video source for Longlive:** This has been really great so far—my pipeline is ISF Shaders (isf.video) --> Daydream API --> Longlive. I now have four modes (Longlive + ISF Shaders [video mode], Longlive [text generation mode], Longlive + Daydream, Daydream). One really cool property of Longlive's cache is that switching between these modes becomes a way to navigate the latent space and change scene composition. We've made it easy to toggle through the modes, and we expect to add new models in parallel as they come online to experiment in this way.
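As a rough sketch of the mode toggling, the four modes above could be cycled like this. The mode names come from the list above, but the class and switching logic are hypothetical, not ETHEREA's actual implementation:

```python
from enum import Enum
from itertools import cycle

class Mode(Enum):
    LONGLIVE_ISF = "Longlive + ISF Shaders (video mode)"
    LONGLIVE_TEXT = "Longlive (text generation mode)"
    LONGLIVE_DAYDREAM = "Longlive + Daydream"
    DAYDREAM = "Daydream"

class ModeToggler:
    """Cycle through the four modes. Because Longlive's cache persists across
    switches, each mode change nudges the scene rather than starting cold,
    which is what makes toggling feel like latent-space navigation."""

    def __init__(self):
        self._cycle = cycle(Mode)   # iterating an Enum class yields its members
        self.current = next(self._cycle)

    def toggle(self) -> Mode:
        self.current = next(self._cycle)
        return self.current
```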

3. **Refactored ETHEREA UX:** I've cleaned up the interface and am starting to feel a lot better about how people interact with it, especially with the mobile companion interface, which party attendees will be able to access by scanning a QR code. I stripped out a few features, like easy parameter controls, which I still need to add back in.

4. **Integrated Realtime Voice with Seamless Tool Calling:** This is the latest major update. I have implemented a voice layer that allows users to converse naturally with the visualizer. The system utilizes seamless tool calling to intelligently decide how to react to input: it knows when to **reset** the context entirely (e.g., "Take me to a tour of Tokyo") versus when to simply **update** the existing scene (e.g., "Add a duck to the pond"). As seen in the attached demo, the transition from a busy Shibuya street to a calm garden, followed by the specific addition of a duck to the water, happens fluidly in real time.
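A minimal sketch of the reset-vs-update dispatch might look like the following, with a trivial keyword classifier standing in for the LLM's tool choice. All names here are hypothetical illustrations, not ETHEREA's actual tool schema:

```python
# Two tools: a full context reset, or an in-place update of the current scene.
TOOLS = {
    "reset_scene": lambda state, prompt: {"scene": prompt, "history": []},
    "update_scene": lambda state, prompt: {
        "scene": state["scene"] + ", " + prompt,
        "history": state["history"] + [prompt],
    },
}

def choose_tool(utterance: str) -> str:
    """Stand-in for the LLM's tool selection: phrases implying a new
    location/scene trigger a full reset; anything else updates in place."""
    reset_cues = ("take me to", "show me", "go to")
    text = utterance.lower()
    return "reset_scene" if any(cue in text for cue in reset_cues) else "update_scene"

def handle_utterance(state: dict, utterance: str) -> dict:
    """Route one voice utterance to the chosen tool and return the new state."""
    return TOOLS[choose_tool(utterance)](state, utterance)
```

With this flow, "Take me to a calm garden" wipes the scene and its history, while "Add a duck to the pond" appends to the existing scene—mirroring the Shibuya-to-garden-to-duck sequence in the demo.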

**Attachments:** v5