User-specific similarity search

I’m building a QA chat experience tailored to a specific user’s data, using a RetrievalQAChain from LangChain. Say the user has many “events” that I want the chat to be able to answer questions about. I create embeddings from these with OpenAI and store them in Weaviate. That works, but I only want the chat to answer questions about the logged-in user’s “events”.

I see two ways to do that:

  1. Create a Weaviate Class for every user and store their embeddings in their Class. Always perform the similarity search on the user’s Class.
  2. Use a single Weaviate Class for “events” but include a userId Property on each data object and filter on that userId when performing a similarity search.
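To make the two options concrete, here is a minimal sketch in Python of what each one has to produce. The helper names, the "Events" class prefix, and the assumption that userId is stored as a text property are all illustrative and not taken from any real schema. The filter dict follows the shape of Weaviate's GraphQL `where` argument (path / operator / value), and the class name is derived from a hash so that it only contains characters Weaviate accepts in class names.

```python
import hashlib


def user_class_name(user_id: str) -> str:
    """Option 1: one class per user.

    Hypothetical helper. Hashing the user ID keeps the name to
    letters and digits and avoids two different IDs collapsing
    into the same class name after sanitizing.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
    return f"Events_{digest}"


def user_where_filter(user_id: str) -> dict:
    """Option 2: one shared class, filtered by userId.

    Assumes a text property called userId on every data object.
    """
    return {
        "path": ["userId"],
        "operator": "Equal",
        "valueText": user_id,
    }


if __name__ == "__main__":
    print(user_class_name("user-123"))
    print(user_where_filter("user-123"))
```

Either result would then be passed to whatever performs the similarity search: the class name selects which class to query in option 1, and the filter is attached to the query against the shared class in option 2.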

Both approaches seem reasonable to me right now. I wonder if the presence of other users’ data objects in the single “events” Class would mess with the similarity search results regardless of the filter on the userId Property. But aside from that, the differences seem to be mostly in how they would be implemented (both seem easy enough tbh).

Does anyone have any comments/suggestions/considerations regarding either approach?

Thanks!

Please see Multi-Tenancy with millions of tenants – an upcoming feature in v1.20 that should do exactly what you want.

Using classes is a known workaround, but it doesn’t scale beyond roughly 2,000 to 5,000 tenants. Using filters can come with a big performance penalty, depending on how the data is distributed.
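As a rough sketch of where this is heading, and assuming the feature ships with the `multiTenancyConfig` class setting, a multi-tenant class definition could look something like the following. The class name "Event" and the "description" property are placeholders, not part of any real schema. With one tenant per user, no userId property or filter is needed, since each tenant's objects are kept separate.

```python
def multi_tenant_class(class_name: str = "Event") -> dict:
    """Build a class definition with multi-tenancy switched on.

    The class name and the single text property are hypothetical;
    only the multiTenancyConfig block is the point of the example.
    """
    return {
        "class": class_name,
        "multiTenancyConfig": {"enabled": True},
        "properties": [
            {"name": "description", "dataType": ["text"]},
        ],
    }


if __name__ == "__main__":
    print(multi_tenant_class())
```

Each user would then be registered as a tenant of that class, and every write and query would name the tenant it applies to.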

Wow, that’s a well-timed blog post (and feature)! That’s perfect. Thanks so much for the quick reply and the awesome explanation of the problem in that post.

I’m hacking something together at the moment, so using classes will be my interim approach, which should work well for an MVP.

Thanks again!