On 1/30, ETHEREA had its biggest event yet: a 1,500-person party celebrating thirty years of technology companies in New York City, hosted by Betaworks, Union Square Ventures, and AlleyCorp on the 42nd floor of a Manhattan skyscraper.
In addition to drawing one of the largest crowds we've seen so far, we projected onto a thirty-two-foot wall, our largest physical installation to date. It was a massive success: the execution was technically flawless and the crowd was super into it. We also pissed off the DJ, which was another learning opportunity that I'll discuss below.
About ETHEREA: ETHEREA is an engine for collective imagination. Pick up the mic, speak what's in your mind's eye, and watch it become shockingly beautiful video in real time. It's so simple that even toddlers understand it. Fluent in dozens of languages, it unites everyone in a shared field of joy, creativity, and exploration. Learn more at withETHEREA.com.
More details below:
1. Technical specs: We were able to achieve ~42fps at 832x480 using 8x RIFE with 2 inference steps on Longlive, powered by a B300 on Verda. I found Verda because it was one of the only vendors that offered a B300. To run Longlive on the B300, Claude had to adapt Scope to use FlashAttention2 instead of SageAttention. Because 832x480 is still pretty small for the canvas we were working with, we placed the stream in a mirrored composition inside a canvas with custom ISF shaders that were reactive to both the music and the pixels in the stream. To minimize latency and make the stream as reactive as possible to participants' speech (I'm not sure this was the ideal solution, but it worked), we pushed a copy of the WebRTC feed through OBS to the rest of our projection mapping pipeline (Synesthesia + Resolume; more on this later).
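A back-of-envelope sketch of what the numbers above imply: with 8x RIFE interpolation, the generator itself only needs to produce a fraction of the displayed frames. The function name here is ours, purely for illustration.

```python
# Rough throughput math for an interpolated video pipeline.
# RIFE at 8x synthesizes intermediate frames between generated
# frames, so displayed fps ~= base generation fps * 8.

INTERP_FACTOR = 8  # 8x RIFE, per the specs above

def effective_fps(base_fps: float, interp_factor: int = INTERP_FACTOR) -> float:
    """Approximate displayed fps after frame interpolation."""
    return base_fps * interp_factor

# To show ~42 fps, the diffusion model only needs to generate
# 42 / 8 = 5.25 frames per second at 832x480 with 2 inference steps.
required_base_fps = 42 / INTERP_FACTOR
```

This is why 2-step inference on a single B300 is enough: the heavy model runs at single-digit fps and interpolation does the rest.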
2. Participant experience: This was a huge hit; I barely had to touch the interface the entire night. Our speech pipeline (Speech --> Deepgram --> Claude --> Scope) has more lag than I want (we are testing a version with Gemini Live right now, which is going pretty well, and I'll report on it as we learn more), yet one host's primary feedback was astonishment at how responsive the system was in such a loud environment. There was a line pretty much all night! I spent a lot of my time wandering around the party pointing at the microphone and telling people they could control the visuals, which was pretty effective.
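The pipeline above (Speech --> Deepgram --> Claude --> Scope) can be sketched as a chain of stages. The stage functions below are hypothetical stand-ins, not the real SDK calls; in production each would be a streaming API call.

```python
# Minimal sketch of a staged speech-to-video pipeline.
# Each stage takes text in and hands text to the next stage.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PipelineStage:
    name: str
    fn: Callable[[str], str]

def run_pipeline(stages: list[PipelineStage], utterance: str) -> str:
    """Pass a participant's utterance through each stage in order."""
    out = utterance
    for stage in stages:
        out = stage.fn(out)
    return out

# Stand-in stages (illustrative only):
stages = [
    PipelineStage("transcribe", lambda audio: audio),                  # Deepgram STT
    PipelineStage("rewrite", lambda t: f"cinematic shot of {t}"),      # Claude prompt rewrite
    PipelineStage("render", lambda p: f"<stream:{p}>"),                # Scope video gen
]
```

Most of the lag lives in the middle stage, which is why swapping in a lower-latency model there is the current experiment.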
3. Style consistency: The first half of the night was a little all over the place stylistically, and definitely not consistent with the event's color palette. Halfway through, we used ETHEREA's style tags feature, which instructs Claude to output prompts in a specific style, to dial in a red-tinted, futuristic aesthetic. At an event a week later sponsored by Hyatt, we started experimenting with LoRAs to achieve the same effect, and are still refining that process. The early lack of consistency was pretty grating to the DJ, which is a good lesson both about adhering to the vibe of an event and about aligning with the DJ beforehand. The DJ in this case was definitely not on the same page as us; in the future we're seeking to collaborate with DJs who are.
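One simple way to picture the style tags feature: every generated scene prompt gets the event's tags appended before it reaches the video model, keeping the whole night in one visual register. The tag values and function below are illustrative assumptions, not ETHEREA's actual implementation.

```python
# Sketch of style-tag enforcement: append a fixed set of style
# descriptors to every raw scene prompt so output stays consistent.
STYLE_TAGS = ["red-tinted", "futuristic", "neon haze"]

def apply_style(prompt: str, tags: list[str] = STYLE_TAGS) -> str:
    """Append the event's style tags to a raw scene prompt."""
    return f"{prompt}, {', '.join(tags)}"
```

A LoRA moves the same constraint from the text side into the model weights, which is why the two approaches are interchangeable for this purpose.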
4. Interaction model: While we have a companion app that allows control via phone, we still haven't found a better alternative to a single microphone, and we're interested in building the microphone into a more interesting installation in the future. One addition we've discussed is a foot pedal that lets people toggle between muted and unmuted states, cutting down on the background noise the transcriber inevitably picks up. This was VERY apparent at the Hyatt event, which was in pretty close quarters.
As we work toward our goal of performing for 10,000 people, we're wondering how to expand the model beyond a single chokepoint. One idea is a collage/mural where different people paint different sections. This will obviously depend on how the technology evolves; you could imagine a single world model as a canvas that many people add to. This is our key design challenge: how do we make a space interactive in a way that scales to massive crowds?
5. Integrity/content moderation: As a new model, we are still learning about Longlive and its idiosyncrasies. One pretty horrifying incident during our testing before the party: a prompt for a gorilla DJ'ing produced a Black man (this was a problem in the early days of generative images, and I was surprised to see it again). The model also defaults to making things sexual pretty quickly, even though Claude does a pretty good job of defending against obviously explicit text prompts. We have amended our text prompting to be quite restrictive in how it portrays humans after an incident at a party a few weeks ago where the model inexplicably produced a naked woman interpolating with a piece of ravioli. I had never seen anything like that before in my life. Given these idiosyncrasies, a pretty common question from people interested in ETHEREA for their use cases (including and especially for children) is how we can introduce more guardrails. This is driving us to prioritize a set of LoRAs that would also give us stylistic consistency.
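One layer of the text-side restriction above can be sketched as a pre-filter: before a prompt reaches the video model, scan it against a blocklist and substitute a safe fallback. The terms and fallback below are illustrative placeholders, not our production list.

```python
# Sketch of a text-side guardrail: block-and-replace before the
# prompt ever reaches the video model.
BLOCKED_TERMS = {"naked", "nude", "explicit"}
SAFE_FALLBACK = "abstract swirling colors"

def guard(prompt: str) -> str:
    """Replace a prompt containing blocked terms with a safe fallback."""
    words = set(prompt.lower().split())
    if words & BLOCKED_TERMS:
        return SAFE_FALLBACK
    return prompt
```

A keyword filter only catches what it names, which is exactly why an LLM layer (Claude rewriting prompts) plus model-side constraints (LoRAs) still matter; this is defense in depth, not a complete solution.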
We'll have footage from the Hyatt party soon, and I'll do a follow-up post with more about what we learned there.
Thanks!