Late Chunking

Description

Weaviate has introduced this “Late Chunking” methodology: Late Chunking: Balancing Precision and Cost in Long Context Retrieval | Weaviate

The email I received refers to code here: recipes/weaviate-features/services-research/late_chunking_berlin.ipynb at main · weaviate/recipes · GitHub

But my question is: is there a more language-agnostic approach to this, or can you only do it using Python code?

Server Setup Information

I use Weaviate Cloud.

Any additional Information

I currently use PHP curl commands to upsert my embeddings.

hi @SomebodySysop !!

That’s a great question.

I will ask internally. Thanks!

Hi there! Author of the notebook recipe here. Thank you for asking, this is indeed a great question.

The methodology behind late chunking is simply a switch in the order of document embedding and chunking, so it is entirely language-agnostic. Whatever method you currently use to embed individual chunks can be applied to the whole document instead; you then chunk the token embeddings rather than the document text itself, based on token positions instead of markers in the text.
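To make that concrete, here is a minimal sketch of the pooling step in PHP (since you mention PHP; all names here are illustrative, and it assumes your embedding method can give you per-token embeddings for the whole document):

```php
<?php
// Mean-pool a span of token embeddings into one chunk embedding.
// $tokenEmbeddings: per-token vectors for the WHOLE document,
//                   e.g. [[0.1, 0.2, ...], [0.3, 0.1, ...], ...]
// $start, $end:     the chunk's token span (inclusive start, exclusive end)
function poolChunkEmbedding(array $tokenEmbeddings, int $start, int $end): array
{
    $dim = count($tokenEmbeddings[0]);
    $sum = array_fill(0, $dim, 0.0);
    for ($t = $start; $t < $end; $t++) {
        foreach ($tokenEmbeddings[$t] as $i => $value) {
            $sum[$i] += $value;
        }
    }
    $count = $end - $start;
    return array_map(fn($v) => $v / $count, $sum);
}
```

The only difference from a conventional pipeline is that the model sees the whole document before this pooling happens, so each chunk vector carries context from its neighbours.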

One important thing to note, however, is that the embedding model you use must be capable of handling long-context documents. Jina AI, who introduced the method, recommend their own embedding models, which can process documents up to 8192 tokens long: Embedding API. These are available via an API call, but that is outside the Weaviate ecosystem. Jina AI’s v3 embeddings can also handle late chunking natively; I am afraid I cannot vouch for that personally, as I have not used it and am not a developer on it, but it might be a much simpler way of doing things.

So overall, if you follow the general approach outlined in the notebook’s Python code, converted to your language of choice, there should be no issues.

We are also working towards a future implementation of late chunking, and easier ways of handling embeddings within the Weaviate ecosystem, so keep an eye out for updates! I hope that helps, and let me know if there’s any additional information or help I can give :slight_smile:


Thank you very much for the reply. This is what I do not understand:

Right now, let’s say I have a PDF that has 3 pages.

a. extract the text from the PDF
b. chunk the text, either semantically or by “sliding window”
c. individually upsert the chunks to Weaviate as embeddings.

By upsert, I meant that I send all the class properties for that object to:
$this->endpoint . '/v1/objects?consistency_level=ALL'
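A single upsert looks roughly like this (class name, property names, and the apiKey member are just placeholders):

```php
<?php
// Roughly what one of my chunk upserts looks like (names are illustrative):
$payload = json_encode([
    'class'      => 'Document',
    'properties' => [
        'text'        => $chunkText,   // the chunk itself
        'chunk_index' => $chunkIndex,  // order within the source document
        'source'      => $pdfFilename,
    ],
]);

$ch = curl_init($this->endpoint . '/v1/objects?consistency_level=ALL');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json',
                               'Authorization: Bearer ' . $this->apiKey],
    CURLOPT_RETURNTRANSFER => true,
]);
$response = curl_exec($ch);
curl_close($ch);
```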

I am using the latest OpenAI text-embedding-3-large model which, I assume, has an 8192-token context length.

Now, in my current class properties, I identify the order of these chunks so that I am able to do Small to Big retrieval (retrieve adjacent chunks).

I am thinking it would be nice to do this instead with one Weaviate retrieval call rather than the extra steps I currently have to perform.

So my follow up question is, what exactly do I do different from what I am doing now to achieve Late Chunking?

I’m not a Python coder. I’m a PHP guy. But if I can do this through the API, all I need to know are the steps.

So can I ask: are you embedding your chunks separately and then uploading them to your Weaviate collection as vector embeddings, i.e. using Bring Your Own Vectors? Or are you using OpenAI’s embedding service as provided by Weaviate?

To use late chunking at the moment, you’ll need to

  1. Embed your entire document using your own embedding method
  2. Take note of where your chunks start and end, in terms of the token positions
  3. Average across the token embeddings from step 1 to obtain the chunk embeddings
  4. Upload these chunk embeddings to Weaviate using Bring Your Own Vectors

Currently, step 2 is the trickiest part, especially when not using Python, because you’ll need to access the token start/end points, which are specific to the tokenizer that your embedding method uses.
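For example, assuming your tokenizer can give you each token’s character offsets (Hugging Face tokenizers expose this as an offset mapping), the conversion from character-based chunk boundaries to token spans is a simple scan. A rough PHP sketch:

```php
<?php
// Convert a chunk's character span into a token span, given the
// tokenizer's offset mapping: $offsets[$t] = [charStart, charEnd]
// for token $t. The offsets must come from the SAME tokenizer that
// your embedding model uses.
function charSpanToTokenSpan(array $offsets, int $charStart, int $charEnd): array
{
    $tokenStart = null;
    $tokenEnd = count($offsets);
    foreach ($offsets as $t => [$s, $e]) {
        if ($tokenStart === null && $e > $charStart) {
            $tokenStart = $t;   // first token overlapping the chunk
        }
        if ($s >= $charEnd) {
            $tokenEnd = $t;     // first token fully past the chunk
            break;
        }
    }
    return [$tokenStart ?? 0, $tokenEnd]; // [inclusive, exclusive)
}
```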

In that case, I’d highly recommend using the new Jina embedding API, which has late chunking enabled. I have just tested this out, and it works very well and is easy to use.

Here is the link again: Embedding API

You can generate the example code in PHP, as well as many other languages.
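As a minimal PHP sketch (check the Jina docs for the exact parameters; the late_chunking flag is the important part):

```php
<?php
// Call the Jina embedding API with late chunking enabled.
// With 'late_chunking' => true, the strings in 'input' are embedded
// as parts of ONE document and then pooled per chunk on Jina's side.
$payload = json_encode([
    'model'         => 'jina-embeddings-v3',
    'task'          => 'retrieval.passage',
    'late_chunking' => true,
    'input'         => $chunks,   // your pre-chunked document text, in order
]);

$ch = curl_init('https://api.jina.ai/v1/embeddings');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json',
                               'Authorization: Bearer ' . $jinaApiKey],
    CURLOPT_RETURNTRANSFER => true,
]);
$result = json_decode(curl_exec($ch), true);
curl_close($ch);

// $result['data'][$i]['embedding'] is the late-chunked vector for chunk $i.
```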

To be clear, this no longer uses OpenAI’s text embeddings, but it is far easier to use late chunking with. Once you have these embeddings, use Weaviate’s Bring Your Own Vectors approach to add them to your collection, as sketched below.
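Continuing the sketch above, the upsert is the same /v1/objects call you already make, plus a top-level vector field (class and property names are again illustrative):

```php
<?php
// Upsert each chunk with its late-chunked vector (Bring Your Own Vectors).
foreach ($chunks as $i => $chunkText) {
    $payload = json_encode([
        'class'      => 'Document',
        'properties' => ['text' => $chunkText, 'chunk_index' => $i],
        'vector'     => $result['data'][$i]['embedding'], // from the Jina call above
    ]);

    $ch = curl_init($weaviateEndpoint . '/v1/objects?consistency_level=ALL');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json',
                                   'Authorization: Bearer ' . $weaviateApiKey],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    curl_exec($ch);
    curl_close($ch);
}
```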

Hope that helps!


Thank you for taking the time to explain. I have a much better understanding of what’s going on.

This is a direct solution to this query: Retrieving “Adjacent” Chunks for Better Context

That query is also what led me to develop my own “Comprehension Level” retrieval methodology (based on the Small to Big retrieval strategy), which also solves the issue of retrieving adjacent chunks within a specific radius of the core chunk.

I do see that the Jina embedding model does allow for REST API interaction, which would allow me to develop my solution in PHP (Yay!). But, at this point, I am not eager to change my Weaviate class embedding model; I am getting very good results with the OpenAI model.

And, I am getting essentially the same functionality as Late Chunking without altering my current embedding techniques. So for now, I think I’ll put it on hold.

But I do appreciate you taking the time to explain how this works. I think it’s a great solution and hopefully I will be able to incorporate it down the line.


No problem at all, I’m glad you understand better!

In the future, late chunking will hopefully be included in our embedding service (this is not my department, but I’ve heard rumours), so eventually this will be extremely straightforward to implement.

I’ll just point out finally that late chunking only changes the values of the embedded vectors - it doesn’t retrieve adjacent chunks at query time, but the chunks themselves remain ‘aware’, in some capacity, of the chunks that originally neighboured them, even though they are stored separately. So this is slightly different from retrieving the chunks adjacent to a returned chunk after querying, which I imagine you can do by giving all chunks a unique index and then fetching the neighbouring indices once you have retrieved the relevant chunk. But I’ll stop there in case I’ve just confused things further.
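For instance, if each chunk carries a chunk_index property as in your setup, fetching a hit’s neighbours is one extra GraphQL call; a rough PHP sketch (class and property names are illustrative):

```php
<?php
// Fetch the chunks adjacent to a returned hit by filtering on chunk_index.
$lo = $hitIndex - 1;
$hi = $hitIndex + 1;
$query = <<<GQL
{
  Get {
    Document(where: {
      operator: And,
      operands: [
        {path: ["chunk_index"], operator: GreaterThanEqual, valueInt: $lo},
        {path: ["chunk_index"], operator: LessThanEqual, valueInt: $hi}
      ]
    }) { text chunk_index }
  }
}
GQL;

$ch = curl_init($weaviateEndpoint . '/v1/graphql');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => json_encode(['query' => $query]),
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json',
                               'Authorization: Bearer ' . $weaviateApiKey],
    CURLOPT_RETURNTRANSFER => true,
]);
$neighbours = json_decode(curl_exec($ch), true);
curl_close($ch);
```

(In practice you would also filter on the source document so the indices don’t collide across documents.)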

If you have any more questions I’m more than happy to answer them, but for now all sounds good and best of luck with what you’re working on!


Not confusing at all. Precisely what I am now doing.


Fantastic news! Do you envision Late Chunking being implemented in Verba as well, assuming I use jina-embeddings-v2?

Thanks for the notebook. I’m really curious about applying Late Interaction or Late Chunking to a system I’m working on; it seems like the unlock I need.


Really glad you appreciated the notebook and are excited about late chunking - there’s a lot going on in retrieval right now that’s really interesting.

There aren’t currently any plans to implement this into Verba, but this is something I’ve wanted to do since hearing about late chunking!

No promises to be made but it is on my to-do list at some point :grin:
