
Okay, this is going to be more like a journal entry than a finished report. I'll be tracking my progress as I work on optimizing the core of Scope. The goal is to get it running efficiently with minimal VRAM usage, without sacrificing quality.
My initial attempts involved NF4 quantization, but the results showed a significant loss in quality (see below).
So, once I realized that Q4 was too low a precision, causing significant quality loss with weight-only quantization, I started exploring alternatives like SageAttention and Triton, along with different caching methods. I also looked into SVDQuant for Q4 quantization. It seems promising, but adapting a project built for 2D optimization to a 3D context feels like it might be beyond my current skill set.
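To get a rough feel for why 4 bits hurt so much with weight-only quantization, here is a small sketch. It is not NF4 (which uses a non-uniform codebook) and it is not Scope's code. It is a plain uniform absmax quantizer applied block by block to random Gaussian "weights", and every name in it (`quantize_roundtrip`, `mean_abs_error`, the block size of 64) is my own assumption for illustration.

```python
import random


def quantize_roundtrip(weights, bits=4, block_size=64):
    """Quantize each block to signed `bits`-bit integers (absmax scaling),
    then dequantize back to floats so the error can be measured."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    restored = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / qmax if absmax > 0 else 1.0
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))  # clamp to the integer grid
            restored.append(q * scale)
    return restored


def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Stand-in for a weight tensor: small Gaussian values (an assumption, not real model weights).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err4 = mean_abs_error(weights, quantize_roundtrip(weights, bits=4))
err8 = mean_abs_error(weights, quantize_roundtrip(weights, bits=8))
print(f"4-bit mean abs error: {err4:.6f}")
print(f"8-bit mean abs error: {err8:.6f}")
```

The 4-bit grid only has 15 usable levels per block, so the round-trip error comes out much larger than with 8 bits. Real formats like NF4 place their levels more cleverly, but the budget of levels is the same.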
My current focus is implementing a good caching system, such as TCache or a similar caching implementation, in Scope. It will have a default on/off toggle for VRAM optimization. I'm also looking into smart VRAM tricks and similar approaches.
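As a note to myself, this is roughly the shape I have in mind for a toggleable cache. It is only a sketch under my own assumptions: `StepCache`, `threshold`, and `expensive_block` are hypothetical names, and the logic (reuse the last output when the input has barely changed) is a simplification, not how TCache or any specific library actually works.

```python
class StepCache:
    """Wrap an expensive function and reuse its last output when the
    new input is close enough to the last input that was computed."""

    def __init__(self, compute_fn, enabled=True, threshold=0.05):
        self.compute_fn = compute_fn
        self.enabled = enabled        # the on/off toggle
        self.threshold = threshold    # max relative change that still counts as "same"
        self._last_input = None
        self._last_output = None
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _relative_change(new, old):
        """Sum of absolute differences, relative to the magnitude of the old input."""
        num = sum(abs(n - o) for n, o in zip(new, old))
        den = sum(abs(o) for o in old) or 1e-12
        return num / den

    def __call__(self, x):
        can_reuse = (
            self.enabled
            and self._last_input is not None
            and len(x) == len(self._last_input)
            and self._relative_change(x, self._last_input) < self.threshold
        )
        if can_reuse:
            self.hits += 1
            return self._last_output  # approximate result, no recomputation
        self.misses += 1
        out = self.compute_fn(x)
        self._last_input = list(x)
        self._last_output = out
        return out


def expensive_block(x):
    """Placeholder for a costly model block."""
    return [v * 2.0 for v in x]


cached = StepCache(expensive_block, enabled=True, threshold=0.05)
cached([1.0, 2.0, 3.0])   # first call: always computed
cached([1.01, 2.0, 3.0])  # tiny change: previous output is reused
cached([2.0, 2.0, 3.0])   # large change: recomputed
print(cached.hits, cached.misses)
```

With `enabled=False` every call goes straight to the wrapped function, which is the behavior I'd want when quality matters more than speed. The threshold is the knob that trades accuracy for skipped work.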