
The convergence of real-time video generation and world models represents a shift in how we'll interact with generative AI systems. The most successful applications will craft intuitive and expressive authorship experiences around increasingly commoditized model capabilities.
The broad adoption of high-ceiling tools like TouchDesigner and ComfyUI illustrates this principle; their node-based interfaces and robust plugin ecosystems unlock incredible customizability and control.
This article explores the future of UX for real-time interactive video and world models, and concludes that use-case-specific controllability is the foundation for success at the application layer.
We're witnessing a fascinating collapse of modality boundaries in real-time AI video and world models. Solutions to technical challenges, from causal generation to frame compression, are converging rapidly, and clear patterns are emerging.
As this convergence accelerates, we will see a new wave of world models that are:
Modern AI models and workflows are incredibly feature-rich, but a workflow is only effective if a user can adequately control it to achieve a use-case-specific goal.
As models improve and modalities collapse, the best authorship experience at the app layer will win.
The best authorship experience is a function of controllability and quality; quality will be commoditized at the model layer, and controllability may eventually be commoditized at the infra layer (though it is likely to remain fragmented). While controllability may be standardized at the app layer, it will never be fully commoditized; there are simply too many market opportunities and too many distinct use cases to serve.
There used to be a clear distinction between creators who produce and audiences who consume, but real-time controllable AI enables a new model: every interaction becomes an opportunity for transformation. Instead of watching a video, playing a game, or viewing content, users fork it, remix it, and make it their own in real-time. This shift manifests across both single-player and multiplayer contexts.
This is an incredible change to how we think about authorship. But if everyone is now a creator — and a workflow has hundreds of implicit and explicit parameters that affect output — how do you expose the right controls for a user to achieve their goals?
Every use case demands a slightly different authorship experience, even when built on the same underlying workflow. Crafting a great authorship experience starts with understanding who is doing the creating, and why they're doing it.
Here are a few examples from domains where real-time AI and world models are being deployed today:
Even autonomous systems need human-designed control interfaces. Examples include:
While all aspects of quality will eventually become commoditized as models improve, controllability will remain a complex, multi-dimensional challenge at all layers of the stack.
For application developers, this presents opportunities to create powerful and differentiated user experiences.
Let's dig deeper into three aspects of controllability that are most relevant to UX: Control Surface, Action Latency, and Workflow Composability.
Control surface refers to how users supply information to a workflow to control its behavior.
Frontier models ultimately ingest data; that data can be supplied by a user in many ways. The choices you make when designing your control surface define your application's expressiveness.
There are many ways you can allow users to control the underlying workflows, including:
Within these modalities, there is a nearly unlimited design space.
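As an illustration, a control surface can be modeled as a typed mapping from UI controls to the underlying workflow parameters they drive. This is a minimal sketch in TypeScript; the `Control` shape and parameter names (`denoise_strength`, `style_prompt`, and so on) are hypothetical, not a real API:

```typescript
// Hypothetical sketch: a control surface as a typed map from UI controls
// to underlying workflow parameters. All names here are illustrative.

type ControlKind = "slider" | "prompt" | "toggle" | "xy-pad";

interface Control {
  kind: ControlKind;
  parameter: string;                        // workflow parameter this control drives
  label: string;                            // what the user sees
  min?: number;
  max?: number;
  defaultValue: number | string | boolean;
}

// The same workflow exposed to two audiences: a pro surface with explicit
// parameters, and a casual surface with a single high-level control.
const proSurface: Control[] = [
  { kind: "slider", parameter: "denoise_strength", label: "Denoise", min: 0, max: 1, defaultValue: 0.6 },
  { kind: "prompt", parameter: "style_prompt", label: "Style", defaultValue: "watercolor" },
  { kind: "toggle", parameter: "temporal_smoothing", label: "Smooth motion", defaultValue: true },
];

const casualSurface: Control[] = [
  { kind: "slider", parameter: "stylization", label: "How stylized?", min: 0, max: 1, defaultValue: 0.5 },
];

// Resolve a surface into the parameter defaults the workflow actually receives.
function defaults(surface: Control[]): Record<string, number | string | boolean> {
  const out: Record<string, number | string | boolean> = {};
  for (const c of surface) out[c.parameter] = c.defaultValue;
  return out;
}

console.log(defaults(proSurface));
```

The same workflow backs both surfaces; only the controls exposed to the user differ, which is the essence of tailoring expressiveness to an audience.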
Action latency refers to the time between a user action and a visible response in the output.
Action latency determines whether your application feels like a powerful tool or a tech demo. Keeping it low requires intentional architecture throughout your entire pipeline: ingest, pre-processing, inference, and transport.
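One way to make this concrete is to tag each user action with a capture timestamp and carry it through every stage, so latency can be attributed to pre-processing, inference, or transport. A minimal sketch, with hypothetical stage names and timings:

```typescript
// Illustrative sketch (not a real API): propagate a capture timestamp with
// each user action through the pipeline, then compute per-stage and total
// action latency when the resulting frame is presented.

interface StageTimestamps {
  actionAt: number;        // user input captured (ms)
  preprocessedAt: number;  // pre-processing done
  inferredAt: number;      // model inference done
  presentedAt: number;     // frame visible to the user
}

interface LatencyBreakdown {
  preprocessMs: number;
  inferenceMs: number;
  transportMs: number;
  totalMs: number;
}

function breakdown(t: StageTimestamps): LatencyBreakdown {
  return {
    preprocessMs: t.preprocessedAt - t.actionAt,
    inferenceMs: t.inferredAt - t.preprocessedAt,
    transportMs: t.presentedAt - t.inferredAt,
    totalMs: t.presentedAt - t.actionAt,
  };
}

// Example: a frame presented 95 ms after the action that caused it,
// with inference dominating the budget.
const sample = breakdown({ actionAt: 0, preprocessedAt: 8, inferredAt: 70, presentedAt: 95 });
console.log(sample.totalMs); // 95
```

A breakdown like this makes it obvious which stage to optimize first when the total budget is blown.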
Gaming is a great example of the importance of action latency. These are a few benchmarks for the relationship between perception and latency:
Different user groups, even within the same use case, have different needs. Professional content creators might want node-based editors with explicit control over every parameter, whereas casual users might want intelligent defaults with optional refinement.
Moreover, the ancillary requirements of each use case (such as content moderation, foreground/background segmentation, and easy recording) often determine the category winner.
Because small changes to the sequencing and configuration of your workflow can significantly impact your ability to meet the needs of a certain user group, it's crucial to think through how precisely a workflow will be configured.
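To illustrate why sequencing matters, a workflow can be treated as an ordered composition of stages: the same stages in a different order produce a different result. This is a sketch under that assumption; the stage names are illustrative, not drawn from any real workflow engine:

```typescript
// Hypothetical sketch: a workflow as an ordered list of stages. Reordering
// the same stages yields a different output, which is why sequencing must be
// matched to the needs of a specific user group.

type Stage = (frame: string) => string;

// Illustrative stages, represented symbolically so the ordering is visible.
const segment: Stage = f => `segment(${f})`;
const stylize: Stage = f => `stylize(${f})`;
const moderate: Stage = f => `moderate(${f})`;

// Compose stages left to right into a single workflow.
function compose(stages: Stage[]): Stage {
  return frame => stages.reduce((acc, stage) => stage(acc), frame);
}

// e.g. stylize only the segmented foreground vs. stylize the whole frame first:
const creatorWorkflow = compose([segment, stylize, moderate]);
const casualWorkflow = compose([stylize, segment, moderate]);

console.log(creatorWorkflow("frame0")); // moderate(stylize(segment(frame0)))
console.log(casualWorkflow("frame0"));  // moderate(segment(stylize(frame0)))
```

The two orderings are built from identical stages, yet serve different goals; that difference is invisible in a feature list and decisive in practice.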
The evolution of controllability won't stop at traditional input devices and UX patterns.
Forward-thinking application developers are experimenting with generative UIs and control patterns such as:
At the hardware and firmware layers, we're starting to see developments that will transform human-computer interaction:
Quality improvements in base models and workflows will continue, but over time they'll become table stakes. The applications that win will be those that build the most expressive, responsive, and flexible control systems around these workflows, tailored to serve a specific use case.
AI has changed many fundamentals of product development, but the heart of a great user experience remains the same: deeply understand a user's intent and craft a set of controls that lets them achieve it.
The applications that recognize this early and architect their stack accordingly will define the interaction paradigms that become industry standards.