Hello,
I recently read the excellent blog post on 8-bit Rotational Quantization, which showcases its effectiveness. The article includes a table comparing key metrics across low, moderate, and high compression levels.
However, I noticed in the documentation that the 4-bit “moderate compression” option does not currently seem to be available in Weaviate; only the low 8-bit and high 1-bit variants are implemented. I would love to use it in my setup, as it looks like a great sweet spot between space usage and recall.
Is moderate compression on the roadmap, or are there technical limitations currently preventing it?
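For anyone curious why bit width matters here: this is not Weaviate's actual rotational quantizer (which applies a rotation before quantizing), just a toy scalar quantizer to show how the number of bits trades memory against reconstruction error. The names `quantize` and `dequantize` are illustrative, not Weaviate APIs.

```python
import numpy as np

def quantize(vec, bits):
    # Map each float to an unsigned integer code with 2**bits levels,
    # using the vector's own min/max as the quantization range.
    levels = 2 ** bits - 1
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((vec - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # Reconstruct approximate floats from the integer codes.
    return codes * scale + lo

rng = np.random.default_rng(0)
v = rng.standard_normal(128).astype(np.float32)

for bits in (8, 4, 1):  # low / moderate / high compression
    codes, lo, scale = quantize(v, bits)
    err = np.abs(dequantize(codes, lo, scale) - v).mean()
    ratio = 32 / bits  # float32 components -> bits-wide codes
    print(f"{bits}-bit: {ratio:.0f}x compression, mean abs error {err:.4f}")
```

Running this shows the sweet-spot intuition directly: 4-bit gives 8x compression with error well below the 1-bit case, while 8-bit only gives 4x.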
Thanks in advance for your time and insights!
Hi @Maxence_Oden !!
Welcome back!
I will raise this with our research team and circle back here.
Thanks for using Weaviate and pushing it forward!
Happy coding!
I am just learning about this type of quantization - looks super cool - this might be interesting too (by Google Research):
TurboQuant: Redefining AI efficiency with extreme compression
Just gonna plug why I’m into quantization - I’m working on a way to democratize huge OSS LLMs for us GPUPoor folk haha - 4Bit-Forge (Compression: Quantization and Sparsity + CUDA Hella)
hi @fraulty !!
Welcome to our community 
Our team was just discussing it internally 
@DudaNogueira oh that’s super cool to know! Thank you - hope to learn a lot more from the community!