Workflow for tenanted RAG: temporary and long-standing vector storage/querying

I’m looking to build a RAG workflow for file uploads in an internal chat application.

Essentially, the workflow could be either of the following:

  • A user/tenant uploads a file into a thread; the file is vectorised, and the vectors are then queried based on the user’s query.
  • A user/tenant uploads a file to an “assistant”, which can then be queried by other users.

My question is about architecture. I was looking at OpenAI’s vector stores, and they’re essentially what I’m looking for, but I would like to keep the RAG component independent of what they offer, hence looking at Weaviate.

Hence my questions:

  • Would a collection per thread make sense? Or is some kind of partitioning possible within a collection, and would that run into performance issues?
  • Threads are ephemeral, so I intend to expire their vector stores on a regular basis. I was thinking of using collection properties to record when each thread’s store should expire, and then using those to prune collections as needed.

Is anyone running a similar workflow, and do you have any suggestions?

hi @Will_Julian-Vicary !!

Welcome to our community :hugs: !!

Considering the use case you described, I believe a collection per thread makes more sense.

Consider that if you go the route of having one collection with multiple threads, separated by a property (for search/CRUD), then, because your threads are ephemeral, you’ll eventually need to delete all the objects belonging to each thread.

And this is a costly operation. The objects of a single thread will be indexed in the same vector space as the objects of all other threads. So whenever you delete/update objects by filtering on, for example, a thread_id property, Weaviate has to process all of those deletions by creating and managing tombstones, etc.

This multiple-threads-per-collection approach will give you the same problems we had before multi-tenancy.

It is doable. However, it will require keeping a close eye on your scale and infrastructure: how you run these deletions, monitoring tombstone metrics, and maybe fine-tuning the tombstone cleanup cycle settings (check the TOMBSTONE_ env vars).
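To make the tombstone cost concrete, here is a toy model of one shared collection holding objects from many threads. It is an illustrative sketch only, not how Weaviate is implemented internally: a filtered delete turns every matching object into a tombstone, and a background cycle then removes at most `max_per_cycle` tombstones per run, loosely analogous to the cycle settings the TOMBSTONE_ env vars control. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToySharedIndex:
    """Toy model (hypothetical, NOT Weaviate internals) of one collection
    that holds objects from many threads, tagged by thread_id."""

    objects: dict[str, str] = field(default_factory=dict)  # object_id -> thread_id
    tombstones: set[str] = field(default_factory=set)

    def insert(self, object_id: str, thread_id: str) -> None:
        self.objects[object_id] = thread_id

    def delete_thread(self, thread_id: str) -> int:
        # A filtered delete has to find and tombstone every matching object.
        doomed = [oid for oid, tid in self.objects.items() if tid == thread_id]
        for oid in doomed:
            del self.objects[oid]
            self.tombstones.add(oid)
        return len(doomed)

    def cleanup_cycle(self, max_per_cycle: int) -> int:
        # Background cleanup only removes a bounded batch of tombstones per run.
        if max_per_cycle < 1:
            raise ValueError("max_per_cycle must be at least 1")
        batch = list(self.tombstones)[:max_per_cycle]
        self.tombstones.difference_update(batch)
        return len(batch)


def cycles_to_clean(index: ToySharedIndex, max_per_cycle: int) -> int:
    """Count how many cleanup cycles it takes to clear all tombstones."""
    cycles = 0
    while index.tombstones:
        index.cleanup_cycle(max_per_cycle)
        cycles += 1
    return cycles
```

With 1,000 objects in each of two threads, deleting one thread leaves 1,000 tombstones, and a cycle limit of 300 needs four cleanup cycles before the index is tidy again. All of that work scales with the size of the deleted thread.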

With one collection per thread, on the other hand, when you remove that collection its entire vector space is gone at once. No tombstone cleanup is needed.
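A minimal sketch of the collection-per-thread lifecycle, under a few stated assumptions. `CollectionAdmin` is a hypothetical interface; a real implementation would likely wrap the Weaviate Python client (v4), whose `client.collections.delete(name)` drops a collection. Last-activity timestamps are kept in the application's own store (a plain dict here), which is an assumption; the question above considered recording them as collection properties instead. Collection names are assumed to need a leading capital letter and only letters, digits, and underscores, so thread ids are prefixed and sanitised.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone
from typing import Protocol


class CollectionAdmin(Protocol):
    """Hypothetical minimal interface over the vector DB client."""

    def delete_collection(self, name: str) -> None: ...


def collection_name_for_thread(thread_id: str) -> str:
    # Assumed naming rule: leading capital letter, then [0-9A-Za-z_] only.
    return "Thread_" + re.sub(r"[^0-9A-Za-z_]", "_", thread_id)


def prune_expired_threads(
    admin: CollectionAdmin,
    last_active: dict[str, datetime],
    ttl: timedelta,
    now: datetime | None = None,
) -> list[str]:
    """Drop the collection of every thread idle for longer than ttl.

    Returns the names of the collections that were deleted.
    """
    now = now or datetime.now(timezone.utc)
    expired = [tid for tid, seen in last_active.items() if now - seen > ttl]
    deleted: list[str] = []
    for thread_id in expired:
        name = collection_name_for_thread(thread_id)
        # Dropping the whole collection discards its vector space in one step,
        # rather than deleting objects one by one and leaving tombstones.
        admin.delete_collection(name)
        del last_active[thread_id]
        deleted.append(name)
    return deleted
```

A scheduled job could call `prune_expired_threads` periodically with the desired TTL. Note that sanitising can map different ids to the same name (`a-1` and `a_1`), so this assumes ids such as UUIDs that stay unique after hyphens are replaced.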

There is one caveat. :see_no_evil:

Each query can only target the objects of a single thread, in that thread’s collection. If you need to query two threads at the same time, this approach will not suffice.
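To illustrate what that limitation costs, covering two threads means issuing one query per collection and merging the results yourself in application code. The sketch below assumes each per-collection query has already returned `(object_id, distance)` pairs; the names are hypothetical, and it assumes all collections use the same embedding model and distance metric, otherwise the distances are not comparable.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    collection: str
    object_id: str
    distance: float


def merge_top_k(
    per_collection_hits: dict[str, list[tuple[str, float]]], k: int
) -> list[Hit]:
    """Merge per-collection results into a global top-k by smallest distance.

    Assumes every collection used the same vectorizer and distance metric.
    """
    hits = (
        Hit(collection, object_id, distance)
        for collection, results in per_collection_hits.items()
        for object_id, distance in results
    )
    return heapq.nsmallest(k, hits, key=lambda h: h.distance)
```

Each collection still only returns its own top results, so the merged list is only as good as the per-collection limit you request, which is part of why this pattern is a workaround rather than a substitute for querying one shared index.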

Another small caveat… so 1.3 caveats :joy:

When you end up with a lot of collections on a node and it needs restarting, the time before the node can serve requests will increase, as Weaviate needs to load all that data into memory at startup.

Fortunately, we introduced Lazy Shard Loading in 1.23, which prioritizes loading any collection that has a pending request.

Also, a high-availability (HA) setup with multiple nodes can mitigate this.

Let me know if this helps or if you have any further doubts :slight_smile:

Thanks!

Amazing, thanks for the quick feedback, really appreciated; this really helps. It sounds like a collection per “group” makes the most sense. I was concerned there might be size limits on collections, but it seems this isn’t the case with Weaviate!

Got a trial running on cloud and will get stuck in!


Great! Let us know if you need any help :slight_smile:

And thanks for using Weaviate :heart_eyes: