
Inspired by the Scope Overworld plugin, and wanting to understand the plugin architecture better ahead of its official release, I decided to try writing my own.
I picked the "Open Oasis" model from Decart: it seemed to promise some interesting effects and, at only 500M parameters, looked like it could plausibly run in realtime at a decent FPS.

Scope's official plugin documentation hasn't been released yet, but armed with my trusty friend GPT-5.2 Codex and a couple of examples of existing plugins, I managed to generate something that at least loaded, ran and produced some sort of visual output...

Blocky vibe is right, but not much else
My initial prompt was unsophisticated (I also passed in some context about what Scope is and its repo):
Using ~/go/src/github.com/daydreamlive/scope-overworld and ~/go/src/github.com/daydreamlive/scope_yolo_mask as references, create a Scope plugin in the current directory that creates a Minecraft type effect using github.com/etched-ai/open-oasis (source for this can be found in ~/go/src/github.com/etched-ai/open-oasis)
but after a few iterations on the encoding and decoding steps, I managed to get something that looked reasonable!

Merncrerft
The only problem now was that any WASD or mouse input would trigger a slow-motion glitch into blurriness. A lot of back and forth tweaking ddim_steps, the number of frames generated per call, the number of context frames, etc. only succeeded in making the blur set in faster and more smoothly.
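For reference, these are the knobs I was turning. The names and defaults below are an assumption based on the parameters mentioned above, not the real open-oasis API, but a minimal sketch makes it clear how little "memory" the model actually has:

```python
from dataclasses import dataclass

# Hypothetical config mirroring the sampling parameters I was tweaking.
# The names (ddim_steps, frames_per_call, context_frames) come from the
# experiments above; the structure here is illustrative, not the plugin's code.
@dataclass
class OasisSamplingConfig:
    ddim_steps: int = 16      # diffusion denoising steps per generated frame
    frames_per_call: int = 4  # new frames sampled per model invocation
    context_frames: int = 8   # past frames fed back in as conditioning

    def context_seconds(self, fps: float) -> float:
        # How far back the model can "see". At 30 FPS an 8-frame window
        # covers barely a quarter of a second, so once a glitchy frame
        # enters the window it quickly dominates the model's entire context.
        return self.context_frames / fps
```

This framing explains why the tweaks only changed *how fast* the blur arrived: none of them gives the model a way to recover once bad frames fill the window.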
Deciding to take this back to first principles, I fired up the static version of the model on a Runpod instance, tried to get it to generate a few seconds of video, and got a blurry mess. In a flash of inspiration, I tried seeding it with another Minecraft screenshot and got this result:
It's going to take some more digging to understand exactly what's going on here. My guess is that condensing the model down for this open source version has made it much more brittle: it seems to need the input image to be exactly right, and it compounds any issues as it runs, since the example implementation just loops, feeding a context window of past frames back in to generate the next one.
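That compounding behaviour can be demonstrated with a toy stand-in for the real model. Here the "model" is just a function that averages its context window and adds a small per-step error; every name here is hypothetical, and the point is only to show how a pure autoregressive loop with no correction mechanism drifts:

```python
import random

def predict_next_frame(context: list[float], drift: float = 0.02) -> float:
    # Stand-in for the diffusion model: average the context window,
    # then add a small one-sided error. In the real model the "error"
    # is whatever the network gets slightly wrong on each frame.
    return sum(context) / len(context) + random.uniform(0.0, drift)

def autoregress(seed_frame: float, steps: int, context_frames: int = 8) -> list[float]:
    # The same loop shape as the open-oasis example implementation:
    # each new frame is conditioned only on a sliding window of past frames.
    frames = [seed_frame]
    for _ in range(steps):
        context = frames[-context_frames:]
        frames.append(predict_next_frame(context))
    return frames

random.seed(0)
frames = autoregress(seed_frame=0.0, steps=120)
# Nothing in the loop ever pulls a frame back toward the seed, so the
# per-step error accumulates instead of being corrected.
```

With real frames the accumulated "drift" shows up as the blur above: a slightly-off frame enters the context window, conditions the next prediction, and the loop amplifies it rather than recovering.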