Hi, I am new to Verba and LLMs in general, but I have spent a lot of time over the last couple of weeks getting up to speed. Verba seemed like a great way to POC some of the ideas I’m trying to sell to my company’s product management team.
I noticed that there is only the one basic “chunker” available. Is there a location where other chunkers can be found? Many of the videos seem to show other chunkers, but the project only has the Token Chunker.
I’m having trouble getting the results I am looking for, and wanted to play with other chunking strategies. I could certainly write some, but for POC work I’d rather not have to write a chunker just to try it, so I thought I’d look to see if anyone knows of any repos that have chunkers for Verba or that could easily be adapted.
Thanks in advance for any suggestions. So far, I am having a good time with Verba!
Hi @mmike87!
There is a folder where the chunking components live:
Indeed, there is only one chunker there.
If you want to play more with chunks, I first suggest this doc:
I also suggest this recipe, which uses LangChain:
GitHub - weaviate/recipes: This repository shares end-to-end notebooks on how to use various features and integrations with Weaviate at the core! (check integrations / llm-frameworks / langchain)
There you can see how to use LangChain’s chunking libs. My third suggestion is LangChain’s own chunking libs:
Of course, that will take you away from Verba a little bit, but it will give you a good set of tools to tune and experiment more.
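If it helps to see what a chunking strategy boils down to before reaching for a library, here is a minimal sketch of two common ones using only the Python standard library: a fixed-size word window with overlap, and grouping by sentences. This is not Verba's or LangChain's implementation. The function names, parameters, and default sizes are hypothetical and only meant for quick experiments, and the sentence splitting is deliberately naive.

```python
import re


def chunk_by_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Fixed-size window over whitespace-separated words, with overlap.

    Hypothetical helper for experimentation; "words" here are a rough
    stand-in for tokens, not real tokenizer output.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so we do not emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


def chunk_by_sentences(text: str, sentences_per_chunk: int = 3) -> list[str]:
    """Group consecutive sentences into chunks.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace, so abbreviations such as "e.g." will be split too.
    """
    if sentences_per_chunk <= 0:
        raise ValueError("sentences_per_chunk must be positive")

    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


if __name__ == "__main__":
    sample = "Verba is a RAG app. It ships one chunker! Want more? Try a few."
    print(chunk_by_words(sample, chunk_size=6, overlap=2))
    print(chunk_by_sentences(sample, sentences_per_chunk=2))
```

Running the same documents through both functions with a few different sizes is a cheap way to see how much the chunk boundaries affect retrieval before committing to a particular library.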
For now, Verba understandably supports only a limited number of splitters/chunkers, models, etc.
Please feel free to voice your feature request on our GitHub:
Let me know if this helps.
Thanks!
Thanks! That is indeed helpful. I haven’t played with LangChain yet, but no better time than the present. Thanks again.