How to design a schema with reference

shadowlin · March 12, 2024, 4:13am

Description

I need to create a schema that supports vector search with auth restrictions.
My current approach is:
File

…file meta info
auth_code # to decide if the user has the permission to read the file

TextChunk

text
has_file # reference to file

Search runs on the TextChunk collection, and the filter looks like this:
Filter.by_ref("has_file").by_property("auth_code").contains_any([auth_code_list])

There could be 100k-1M files, and the number of text chunks should be 20 to 50 times the number of files. Some users could have permission on almost every file (but results still need to be filtered by auth_code).
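
For a sense of scale, the figures above can be multiplied out. This is only a back-of-the-envelope sketch; the function and constant names are made up for illustration.

```python
# Back-of-the-envelope sizing using the figures from the post.
FILES_MIN, FILES_MAX = 100_000, 1_000_000           # "100k-1M files"
CHUNKS_PER_FILE_MIN, CHUNKS_PER_FILE_MAX = 20, 50   # "20 to 50 times"


def chunk_count_range(files_min, files_max, per_file_min, per_file_max):
    """Return the (lowest, highest) total number of TextChunk objects."""
    return files_min * per_file_min, files_max * per_file_max


low, high = chunk_count_range(
    FILES_MIN, FILES_MAX, CHUNKS_PER_FILE_MIN, CHUNKS_PER_FILE_MAX
)
print(f"TextChunk objects: {low:,} to {high:,}")
# prints "TextChunk objects: 2,000,000 to 50,000,000"
```

So a user with access to almost every file could have tens of millions of chunks passing the auth filter.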

Is this a good schema for this kind of situation?
Should I move the auth_code to the TextChunk level? The reason I currently put it on File is that changing the auth_code of a file then requires only one update, on File.

Server Setup Information

Weaviate Server Version:
Deployment Method:1.24.1
Multi Node? Number of Running Nodes:
Client Language and Version: 4.5.1

Any additional Information

DudaNogueira · March 12, 2024, 5:14pm

Hi @shadowlin !

I think some tests would be the best way to determine the impact of storing this on the File versus on the chunk.

While storing the auth_code on the File model makes changes easier to manage, storing it on the chunk does not require the cross-reference, so it may be faster.
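
To make the trade-off concrete, here is a small in-memory model of the two layouts. It is plain Python, not Weaviate code, and every class and function name in it is hypothetical. It only illustrates that both layouts return the same chunks while differing in how many objects must be rewritten when a file's auth_code changes; it says nothing about actual query speed in Weaviate.

```python
from dataclasses import dataclass


@dataclass
class File:
    file_id: str
    auth_code: str


@dataclass
class ChunkWithRef:
    text: str
    has_file: str    # id of the parent File (stands in for the cross-reference)


@dataclass
class ChunkDenormalized:
    text: str
    file_id: str
    auth_code: str   # copied from the parent File


def filter_via_reference(chunks, files_by_id, allowed_codes):
    """Layout 1: resolve each chunk's file, then check the file's auth_code."""
    allowed = set(allowed_codes)
    return [c.text for c in chunks if files_by_id[c.has_file].auth_code in allowed]


def filter_denormalized(chunks, allowed_codes):
    """Layout 2: check the auth_code stored directly on the chunk."""
    allowed = set(allowed_codes)
    return [c.text for c in chunks if c.auth_code in allowed]


def chunk_or_file_writes_on_auth_change(chunks_in_file, denormalized):
    """Objects to rewrite when one file's auth_code changes."""
    return chunks_in_file if denormalized else 1


files = [File("f1", "team-a"), File("f2", "team-b")]
files_by_id = {f.file_id: f for f in files}
ref_chunks = [
    ChunkWithRef("a1", "f1"),
    ChunkWithRef("a2", "f1"),
    ChunkWithRef("b1", "f2"),
]
flat_chunks = [
    ChunkDenormalized(
        text=c.text,
        file_id=c.has_file,
        auth_code=files_by_id[c.has_file].auth_code,
    )
    for c in ref_chunks
]

via_ref = filter_via_reference(ref_chunks, files_by_id, ["team-a"])
flat = filter_denormalized(flat_chunks, ["team-a"])
print(via_ref, flat)
```

With 20 to 50 chunks per file, moving auth_code to the chunk turns one write per permission change into 20 to 50 writes, in exchange for a filter that needs no reference lookup.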

shadowlin · March 13, 2024, 1:28am

So what are the limitations of references?

Filter.by_ref("has_file").by_property("auth_code").contains_any([auth_code_list])

If the filter above matches 10M or even more eligible text chunks, could it greatly impact performance?

Is there any guide on how to use references properly?

DudaNogueira · March 13, 2024, 9:16pm

I don’t know the exact details of why and how it would impact performance, but considering that you will have lots of auth_codes, the more you have, the more values the filter will need to match before it can start selecting the files you want to hit.

I am not aware of any best practices for cross-references. I believe these would be interesting scenarios to compare:

Cross reference of Auth codes
Store the cross references on a File property, and filter directly on it.
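
A comparison like this could be timed with a small harness along the following lines. This is only a sketch: the two query functions are stand-ins that do dummy work, and in a real test each would wrap the actual Weaviate query for one scenario. All names here are hypothetical.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20):
    """Call run_query `repeats` times and return the median latency in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def compare(strategies, repeats=20):
    """Map each strategy name to its median latency in milliseconds."""
    return {name: median_latency_ms(fn, repeats) for name, fn in strategies.items()}


# Stand-ins only: replace the bodies with the real queries under test.
def query_with_reference_filter():
    sum(range(10_000))


def query_with_direct_property_filter():
    sum(range(1_000))


results = compare(
    {
        "reference filter": query_with_reference_filter,
        "direct property filter": query_with_direct_property_filter,
    },
    repeats=5,
)
for name, ms in results.items():
    print(f"{name}: {ms:.3f} ms (median)")
```

Running it against a dataset of realistic size, and with auth code lists of realistic length, matters more than the harness itself, since filter selectivity is the variable in question.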

Unfortunately, I don’t have an answer for that.
