Rotational Quantization moderate compression

Hello,

I recently read the excellent blog post on 8-bit Rotational Quantization, which showcases its effectiveness. The article includes a table comparing the key metrics across the low, moderate, and high compression levels.

However, I noticed in the documentation that the 4-bit “moderate compression” level does not currently seem to be available in Weaviate; only the low (8-bit) and high (1-bit) levels are implemented. I would love to be able to use it in my setup, as it looks like a great sweet spot between space usage and recall.
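For context, here is a minimal NumPy sketch of the general idea behind rotational quantization: apply a random orthogonal rotation to spread information evenly across dimensions, then scalar-quantize each rotated component. This is only an illustration of the technique, not Weaviate's actual implementation; all function names and the per-vector min/max scaling are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(dim, rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix;
    # fixing the signs via diag(R) makes the distribution uniform (Haar).
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q * np.sign(np.diag(r))

def rq_encode(v, rot, bits=8):
    # Rotate, then quantize each component to 2**bits levels over the
    # vector's own [min, max] range (illustrative choice of scaling).
    levels = 2**bits - 1
    r = rot @ v
    lo, hi = r.min(), r.max()
    codes = np.round((r - lo) / (hi - lo) * levels).astype(np.uint8)
    return codes, lo, hi

def rq_decode(codes, lo, hi, rot, bits=8):
    # De-quantize, then undo the rotation (orthogonal: inverse = transpose).
    levels = 2**bits - 1
    r = codes.astype(np.float64) / levels * (hi - lo) + lo
    return rot.T @ r

dim = 128
rot = random_rotation(dim, rng)
v = rng.normal(size=dim)
codes, lo, hi = rq_encode(v, rot)   # 8-bit codes: 1 byte per dimension
v_hat = rq_decode(codes, lo, hi, rot)
print(np.abs(v - v_hat).max())      # small reconstruction error
```

A 4-bit variant would simply pass `bits=4` (16 levels), halving storage again at the cost of larger reconstruction error, which is why the moderate level looks like such a nice middle ground.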

Is the implementation of moderate compression on the roadmap, or are there current technical limitations preventing it?

Thanks in advance for your time and insights!


Hi @Maxence_Oden !!

Welcome back!

I will raise this with our research team, and circle back here.

Thanks for using Weaviate and pushing it :wink:

Happy coding!

I am just learning about this type of quantization - looks super cool - this might be interesting too (by Google Research):
TurboQuant: Redefining AI efficiency with extreme compression

Just gonna plug why I’m into quantization - I’m working on a way to democratize huge OSS LLMs for us GPUPoor folk haha - 4Bit-Forge (Compression: Quantization and Sparsity + CUDA Hella)


hi @fraulty !!

Welcome to our community :hugs:

Our team was just discussing it internally :wink:


@DudaNogueira oh that’s super cool to know! Thank you :blush:- hope to learn a lot more from the community!
