Realtime Upscaler: FlashVSR Scope Demo

I’ve been digging into real-time upscaling solutions for AI video, specifically looking for a way to break past the "High Fidelity vs. High Speed" trade-off.

Standard upscalers like Real-ESRGAN (SISR) treat every frame independently, which leads to the notorious flickering and instability. Meanwhile, most open-source VSR solutions that do produce great quality are not auto-regressive and are also heavy to run.

Enter FlashVSR.

FlashVSR is a one-step streaming diffusion framework that achieves real-time, high-quality video super-resolution. It introduces One-Step Streaming Distillation, Locality-Constrained Sparse Attention (LCSA), and a Tiny Conditional Decoder to deliver upscaling with extreme efficiency.

I’ve written a full technical report comparing various upscalers (SISR vs. VSR), have a read here. In this post I wanted to share a demo of FlashVSR running directly inside Daydream Scope.

Demo for running FlashVSR in Scope

Why FlashVSR? Unlike standard image upscalers, FlashVSR uses Video Super Resolution (VSR). It utilizes temporal information across multiple frames to maintain consistency.

  • Stability: Drastically reduces flickering compared to GANs/SISR.
  • Speed: It uses Block-Sparse-Attention (O(N) complexity), making it significantly faster than typical generation models.
  • Quality: It doesn’t just magnify mistakes; it actually corrects them using temporal data.
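The stability point can be illustrated with a toy sketch (this is not FlashVSR itself, just an illustration of the principle): a per-frame (SISR-style) pass adds independent error to every frame, while a temporal (VSR-style) pass that looks at neighbouring frames damps the jitter.

```python
# Toy illustration of why temporal context reduces flicker.
# Hypothetical setup: a static scene where every frame has the same true
# brightness; per-frame processing adds independent noise to each frame.
import random

random.seed(0)

def flicker(frames):
    """Mean absolute frame-to-frame change: a crude flicker metric."""
    return sum(abs(a - b) for a, b in zip(frames, frames[1:])) / (len(frames) - 1)

true_frames = [0.5] * 60

# SISR-style: each frame gets its own independent error.
sisr = [f + random.uniform(-0.1, 0.1) for f in true_frames]

# VSR-style: average each frame with its immediate neighbours.
vsr = []
for i in range(len(sisr)):
    window = sisr[max(0, i - 1): i + 2]
    vsr.append(sum(window) / len(window))

print(flicker(vsr) < flicker(sisr))  # True: temporal averaging lowers flicker
```

FlashVSR's actual mechanism (sparse attention over a temporal window) is far more sophisticated than a moving average, but the intuition is the same: independent per-frame errors are what the eye perceives as flicker.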

Once you have Scope up and running, you can install this right now into your running Scope instance with a single command:
 uv run daydream-scope install git+https://github.com/varshith15/FlashVSR-Pro.git

In the demo video, you'll see the output running at around 15 FPS on an H100 SXM.

  • The Reason: In this specific demo setup, we are decoding WebRTC streams and moving data to the GPU, which adds significant overhead (cutting 31 FPS down to 20-22 FPS).
  • The Fix: When integrated as a Post-Processor in your generation pipeline (e.g., after LongLive or StreamDiffusion), the tensors are already on the GPU. In that environment the overhead is mitigated, and FlashVSR runs at more than 30 FPS.
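A quick back-of-envelope check on the overhead claim above (taking 21 FPS as the midpoint of the observed 20-22 FPS range, which is my assumption, not a measured figure):

```python
# Implied per-frame overhead when 31 FPS of raw throughput drops to ~21 FPS
# once WebRTC decoding and host-to-GPU copies enter the loop.
raw_fps = 31.0        # FlashVSR throughput with tensors already on the GPU
observed_fps = 21.0   # assumed midpoint of the 20-22 FPS demo range

overhead_ms = 1000.0 / observed_fps - 1000.0 / raw_fps
print(round(overhead_ms, 1))  # -> 15.4 ms of decode + copy cost per frame
```

Roughly 15 ms per frame of decode/copy work is what disappears when the upscaler is chained directly after the generator instead of running standalone.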

You can run FlashVSR not only on an H100 SXM but on any machine with at least 15 GB of VRAM. On an RTX 5090 we get about 20 FPS as a post-processor and about 14-15 FPS end-to-end as a standalone pipeline.

The Math: Why Post-Processing is the Unlock

While the demo shows FlashVSR running as a standalone plugin, the real efficiency unlock comes from chaining it directly as a post-processor after your generation pipeline (e.g., LongLive). By keeping the tensors on the GPU, we avoid the expensive encode/decode roundtrips and get a massive performance boost compared to native high-res generation.

Here is the math on why this approach wins:

  • Case 1 (Native High-Res): Generating native 1024x1024 video on an H100 currently caps out at about 6 FPS. It’s computationally heavy and prone to OOM issues.
  • Case 2 (Generation + FlashVSR): Generate at 512x512 (approx. 40ms latency), then upscale to 1024x1024 with FlashVSR (approx. 32ms latency). Total latency: ~72ms per frame, which translates to ~13.9 FPS.
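The arithmetic behind the two cases, using the latency numbers from the post:

```python
# Latency math for Case 2: generate small, then upscale.
gen_ms = 40.0      # 512x512 generation latency per frame
upscale_ms = 32.0  # FlashVSR 512 -> 1024 upscale latency per frame

total_ms = gen_ms + upscale_ms   # sequential: generate, then upscale
fps = 1000.0 / total_ms
print(round(fps, 1))             # -> 13.9 FPS

native_fps = 6.0                 # Case 1: native 1024x1024 on an H100
print(fps / native_fps > 2)      # -> True: more than 2x the native rate
```

Note this assumes the two stages run sequentially per frame; pipelining them across frames would push the effective throughput even closer to the slower stage's rate.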

By generating at lower resolution and using FlashVSR as a post-processor, we effectively get >2x the performance (13.9 FPS vs 6 FPS) for 1024px output, without sacrificing the stability or quality of the final video.

For more details and quality comparisons between different upscalers, check out the detailed report.

Updates:

- Pushed a fix recently; the E2E FPS should now be about 20-22 on an H100 SXM (updated the values in the report).

- Switched to Sparse Sage Attention instead of Block Sparse, and we now get about 22 FPS inference throughput on an RTX 5090, almost 1.5x faster than before.