I wanted to know if Weaviate has any plans to support ColBERT. I saw a post about this a little while ago (Weaviate & ColBERTv2? - #3 by Draco); unfortunately, the multiple-named-vector approach mentioned there isn't a great fit, as the number of ColBERT vectors depends on the number of tokens given (i.e., it is not static).
hi @JK_Rider !!
Have you seen this recipe?
I am not sure about ColBERTv2 support itself. I saw our team discussing something on this; I will ping them.
Thanks!
Hi @DudaNogueira ,
Took a look at the recipe. The issue I see is that the recipe assumes the number of vectors generated is static, since you're dealing with an image of a fixed size. With ColBERTv2, the number of vectors per document is based on the number of tokens in the text string, which is usually not static.
Hey both, thanks so much for sharing this notebook @DudaNogueira!
Hey @JK_Rider, could you please point me to a more specific passage where this is mentioned? ColBERT / v2 / PLAID variants will all zero-pad queries and documents to have a fixed input length as far as I understand.
→ I think the key innovation in subsequent ColBERT works is compressing the vectors along the length dimension with forms of low-rank decompositions or maybe PCA – I think the discrete PQ-style methods won out in PLAID.
It does make sense to think the variable-length decoding stuff could make its way into embedding models, but I haven’t seen too many examples of this outside of maybe Cohere Compass.
Quote: The ColBERT v2.0 library transforms a text chunk into a matrix of token-level embeddings. The output is a 128-dimensional vector for each token in a chunk. This results in a two-dimensional matrix, which doesn’t align with the current LangChain interface that outputs a list of floats.
Link: Introduction to ColBERT | RAGStack | DataStax Docs
The passage you quoted refers to the dimensions of the individual vectors, but the issue is the number of vectors. ColBERT creates vectors based on the number of tokens in a document, i.e., it is highly variable.
I can also provide code samples from a Jupyter notebook if that would be of assistance.
Hey @JK_Rider, yes the notebook would be super helpful.
From this reference – point #2 supports my initial assumption that queries and documents are zero-padded to a fixed length:
“2. BERT manages this additional depth by pre-processing documents and queries into uniform lengths with the Wordpiece tokenizer, ideal for batch processing on GPUs.”
So, for example, if a document has 30 tokens, you zero-pad it to 512 tokens. You then apply an attention mask so the gradient only flows to those original 30 tokens, but the input still needs to be 512 long, since BERT models expect a fixed-length input. This is a key distinction between encoder-only versus decoder-only or hybrid encoder-decoder / seq2seq models.
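To make the padding behavior concrete, here is a minimal sketch using the Hugging Face `transformers` tokenizer (the `bert-base-uncased` checkpoint and the 512 max length are just illustrative choices, not tied to a specific ColBERT release):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    "a short query with only a handful of tokens",
    padding="max_length",   # zero-pad up to the fixed input length
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

print(encoded["input_ids"].shape)       # torch.Size([1, 512]) – always the padded length
print(encoded["attention_mask"].sum())  # number of real (non-padding) tokens
```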
Additional note: we are on this and you can expect something soon.
Please follow: HNSW Multi-value vectors (colbert) · Issue #4278 · weaviate/weaviate · GitHub
@CShorten
Ex 1: Using BGE-M3 with Flag Embedding
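A minimal sketch of this approach, assuming the `FlagEmbedding` package's `BGEM3FlagModel` API (the sample documents are placeholders):

```python
from FlagEmbedding import BGEM3FlagModel

# BGE-M3 can return dense, sparse, and ColBERT-style multi-vector embeddings
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

docs = [
    "Weaviate is an open-source vector database.",
    "ColBERT produces one embedding per token, so longer documents yield more vectors.",
]

output = model.encode(
    docs,
    return_dense=False,
    return_sparse=False,
    return_colbert_vecs=True,  # token-level (multi-vector) embeddings
)

# Each document gets its own (num_tokens, dim) matrix; the row count varies per document
for doc, vecs in zip(docs, output["colbert_vecs"]):
    print(f"{doc[:40]!r}... -> {vecs.shape}")
```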
Ex 2: Using JinaColbert with official colbert repo
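And a sketch of the JinaColBERT variant, assuming the official `colbert-ai` package; the `jinaai/jina-colbert-v1-en` checkpoint name, config values, and sample texts here are illustrative:

```python
from colbert.infra import ColBERTConfig
from colbert.modeling.checkpoint import Checkpoint

# Load a ColBERT-style checkpoint; doc_maxlen is the padded document length
config = ColBERTConfig(doc_maxlen=512)
checkpoint = Checkpoint("jinaai/jina-colbert-v1-en", colbert_config=config)

docs = [
    "A short passage.",
    "A noticeably longer passage that will be tokenized into many more pieces than the first one.",
]

# keep_dims=False returns one unpadded (num_tokens, 128) matrix per document
doc_embeddings = checkpoint.docFromText(docs, keep_dims=False)
for emb in doc_embeddings:
    print(emb.shape)  # the first dimension differs per document
```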
If you need any more examples or some more context, let me know.