Additional Chunkers for Verba

Hi, I am new to Verba and to LLMs in general; however, I have spent a lot of time over the last couple of weeks getting up to speed. Verba seemed like a great way to build a POC for some of the ideas I’m trying to sell to my company’s product management team.

I noticed that there is only the one basic “chunker” available. Is there a location where other chunkers can be found? Many of the videos seem to show other chunkers, but the project only has the Token Chunker.
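For context on what a token chunker does, here is a minimal sketch of fixed-size token windows with overlap. It assumes whitespace-separated words stand in for tokens (a real token chunker would typically use a model tokenizer), and `chunk_tokens`, `size` and `overlap` are hypothetical names, not Verba’s actual Token Chunker interface.

```python
def chunk_tokens(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` tokens; consecutive windows share `overlap` tokens.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        chunks.append(" ".join(window))
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

For example, `chunk_tokens("one two three four five six seven eight nine ten", size=4, overlap=1)` gives three chunks, each starting with the last word of the previous one. The overlap keeps text that straddles a boundary present in both neighboring chunks, at the cost of some duplicated text in the index.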

I’m having trouble getting the results I want and would like to experiment with other chunking strategies. I could certainly write some, but for POC work I’d rather not write a chunker just to try an idea, so I wanted to ask whether anyone knows of repos that have chunkers for Verba, or chunkers that could easily be adapted.
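One cheap strategy to compare against fixed token windows is packing whole sentences into a chunk until a word budget is reached, so chunks never cut a sentence in half. The sketch below assumes a naive regex sentence splitter (end punctuation followed by whitespace) and a word-count budget; `split_sentences`, `chunk_sentences` and `max_words` are hypothetical names, not part of Verba.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text: str, max_words: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_words + n > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The greedy packing is deliberately simple; a real sentence splitter would need to handle abbreviations like "e.g." that this regex treats as sentence ends.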

Thanks in advance for any suggestions. So far, I am having a good time with Verba!

Hi @mmike87!

There is a folder where the chunking components live:

Indeed there is only one there :frowning:

If you want to experiment more with chunking, I’d first suggest this doc:

Then I’d also suggest this recipe, which uses LangChain:
GitHub - weaviate/recipes: This repository shares end-to-end notebooks on how to use various features and integrations with Weaviate at the core! (check integrations / llm-frameworks / langchain)

There you can see how to use LangChain’s chunking libraries. And my third suggestion is LangChain’s chunking libraries themselves:

Of course, that will take you a bit away from Verba, but it will give you a good set of tools to tune and experiment with.
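LangChain’s `RecursiveCharacterTextSplitter` is the splitter many of those examples start from: you configure it with `chunk_size` and `chunk_overlap` and call `split_text(text)`. To show the idea without installing anything, here is a simplified pure-Python sketch of the recursive approach: try paragraph breaks first, then line breaks, then spaces, and fall back to hard character cuts only when nothing else fits. It is an approximation, not LangChain’s implementation (no overlap, and separators at chunk boundaries are dropped); `recursive_split` is a hypothetical name.

```python
def recursive_split(text: str, chunk_size: int = 100,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into pieces of at most `chunk_size` characters.

    Tries the coarsest separator first and recurses with finer ones on pieces
    that are still too long. Simplified sketch: no overlap between chunks.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every `chunk_size` characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece is too long on its own: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Preferring coarse separators keeps paragraphs and sentences intact where possible, which is usually why recursive splitting retrieves better than fixed windows on prose; adding overlap, as LangChain’s `chunk_overlap` does, is the natural next step.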

For now, Verba understandably supports only a limited number of splitters/chunkers, models, etc.

Please feel free to submit a feature request on our GitHub:

Let me know if this helps :slight_smile:

Thanks!

Thanks! That is indeed helpful. I haven’t played with LangChain yet, but no better time than the present. Thanks again.
