User-specific similarity search

busbyk · June 16, 2023, 7:39pm

I’m building a QA chat experience that’s tailored to a specific user’s data using a RetrievalQAChain from Langchain. Let’s say the user has many “events” that I want the chat to be able to answer questions about. I create embeddings from these (using OpenAI) and store them in Weaviate. That works, but I only want the chat to answer questions about the logged-in user’s “events”.

I see two ways to do that:

1. Create a Weaviate Class for every user and store their embeddings in their Class. Always perform the similarity search on the user’s Class.
2. Use a single Weaviate Class for “events” but include a userId Property on each data object and filter on that userId when performing a similarity search.

Both approaches seem reasonable to me right now. I wonder if the presence of other users’ data objects in the single “events” Class would mess with the similarity search results regardless of the filter on the userId Property. But aside from that, the differences seem to be mostly in how they would be implemented (both seem easy enough tbh).

Does anyone have any comments/suggestions/considerations regarding either approach?

Thanks!
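
To make the comparison concrete, here is a minimal in-memory sketch of the two approaches. It is a toy brute-force model and not how Weaviate's index works internally: the `EVENTS` data, the three-dimensional embeddings, and the function names are all made up for illustration. It only shows that when the userId filter is applied before ranking, other users' objects cannot appear in the results, so both approaches return the same hits for a given user.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical toy data: (userId, event text, embedding).
EVENTS = [
    ("alice", "dentist appointment", [0.9, 0.1, 0.0]),
    ("alice", "team offsite", [0.1, 0.9, 0.0]),
    ("bob", "dental checkup", [0.95, 0.05, 0.0]),
    ("bob", "marathon", [0.0, 0.1, 0.9]),
]


def search_single_class(query, user_id, k=1):
    """Approach 2: one shared collection, filter on userId, then rank."""
    candidates = [e for e in EVENTS if e[0] == user_id]
    ranked = sorted(candidates, key=lambda e: cosine(query, e[2]), reverse=True)
    return [text for _, text, _ in ranked[:k]]


def search_class_per_user(query, user_id, k=1):
    """Approach 1: one collection per user, search only that collection."""
    per_user = defaultdict(list)
    for uid, text, vec in EVENTS:
        per_user[uid].append((text, vec))
    ranked = sorted(per_user[user_id], key=lambda e: cosine(query, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


query = [1.0, 0.0, 0.0]
alice_filtered = search_single_class(query, "alice")
alice_own_class = search_class_per_user(query, "alice")
bob_filtered = search_single_class(query, "bob")
```

In this toy data, bob's "dental checkup" is the closest vector to the query overall, yet it never shows up for alice in either approach, because it is excluded before ranking.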

etiennedi · June 16, 2023, 9:16pm

Please see Multi-Tenancy with millions of tenants – an upcoming feature in v1.20 that should do exactly what you want.

Using classes is a known workaround, but it doesn’t scale beyond 2,000 to 5,000 tenants. Using filters can come with a big performance penalty depending on how the data is distributed.
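
For reference, the two workarounds could be wired up roughly as follows. This is a sketch under assumptions, not an official recipe: the `Events_<id>` naming scheme and both helper names are hypothetical, and the filter dict follows the general shape of a Weaviate `where` filter (`path`, `operator`, and a typed value key). The exact value key (`valueText` here) may differ depending on the Weaviate version and the property's data type.

```python
import re


def user_class_name(user_id: str) -> str:
    """Class-per-user workaround: derive a class name from a user id.

    Hypothetical scheme: prefix with 'Events_' and replace any character
    outside letters, digits, and underscore. Note that distinct ids such as
    'user-42' and 'user_42' would collide under this scheme.
    """
    safe = re.sub(r"[^0-9A-Za-z_]", "_", user_id)
    return f"Events_{safe}"


def user_where_filter(user_id: str) -> dict:
    """Single-class workaround: a where-style filter on the userId property.

    Assumes userId is stored as a text property; adjust the value key
    if your schema or Weaviate version uses a different type.
    """
    return {"path": ["userId"], "operator": "Equal", "valueText": user_id}


class_name = user_class_name("user-42")
where_filter = user_where_filter("user-42")
```

With the v3 Python client, a dict like `where_filter` would typically be passed to `.with_where(...)` on a `client.query.get(...)` builder alongside the vector search clause, while `class_name` would simply replace the shared class name in the query.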

busbyk · June 16, 2023, 9:39pm

Wow, that’s a well-timed blog post (and feature)! That’s perfect. Thanks so much for the quick reply and awesome explanation of the problem via that post.

I’m hacking something together at the moment, so using classes will be my interim approach, which should work well for an MVP.

Thanks again!
