How to use references for filtering efficiently?

shadowlin · March 7, 2024, 10:47am

Description

What does this warning mean?

{"level":"warning","msg":"Number of found nested reference results exceeds configured QUERY_MAXIMUM_RESULTS. This may result in search performance degradation or even out of memory errors.","nested_reference_results":11922,"query_maximum_results":10000,"time":"2024-03-07T10:45:46Z"}

Server Setup Information

Weaviate Server Version: 1.24.1
Deployment Method: docker
Multi Node? Number of Running Nodes: 1
Client Language and Version: python 4.5.1

Any additional Information

DudaNogueira · March 7, 2024, 12:46pm

Hi @shadowlin !

I believe your query involves a cross-reference, and resolving those references (across all the queried objects) brings in more objects than QUERY_MAXIMUM_RESULTS allows.

Here is the code where this warning comes from:

github.com

weaviate/weaviate/blob/5f86b6edcb5bfd8f20997d832ed3fa93726cb91f/adapters/repos/db/inverted/searcher_ref_filter.go#L75


      
          
          	ids, err := r.fetchIDs(ctx)
          	if err != nil {
          		return nil, errors.Wrap(err, "nested request to fetch matching IDs")
          	}
          
          	if len(ids) > r.classSearcher.GetQueryMaximumResults() {
          		r.logger.
          			WithField("nested_reference_results", len(ids)).
          			WithField("query_maximum_results", r.classSearcher.GetQueryMaximumResults()).
          			Warnf("Number of found nested reference results exceeds configured QUERY_MAXIMUM_RESULTS. " +
          				"This may result in search performance degradation or even out of memory errors.")
          	}
          
          	return r.resultsToPropValuePairs(ids)
          }
          
          func (r *refFilterExtractor) paramsForNestedRequest() (dto.GetParams, error) {
          	return dto.GetParams{
          		Filters:   r.innerFilter(),
          		ClassName: r.filter.On.Child.Class.String(),

Let me know if that helps.
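
For readers who do not follow Go, here is a rough Python sketch of what the excerpt appears to do: run a nested request for the IDs of referenced objects that match the inner filter, log a warning if there are more of them than QUERY_MAXIMUM_RESULTS, and then use those IDs to filter the outer class. Plain dicts stand in for the collections, and every function name here is hypothetical; this is not Weaviate's actual implementation.

```python
import logging

logger = logging.getLogger("ref_filter_sketch")

QUERY_MAXIMUM_RESULTS = 10_000  # value reported in the warning above


def fetch_matching_ids(files, auth_code):
    """Nested request: IDs of referenced objects matching the inner filter."""
    return [fid for fid, props in files.items() if props["auth_code"] == auth_code]


def resolve_ref_filter(files, auth_code, maximum=QUERY_MAXIMUM_RESULTS):
    """Fetch the matching IDs and warn (but do not fail) above the limit."""
    ids = fetch_matching_ids(files, auth_code)
    if len(ids) > maximum:
        logger.warning(
            "nested reference results (%d) exceed QUERY_MAXIMUM_RESULTS (%d)",
            len(ids),
            maximum,
        )
    return ids


def filter_chunks(chunks, matching_file_ids):
    """Outer filter: keep chunks whose reference points at a matching ID."""
    wanted = set(matching_file_ids)
    return [cid for cid, props in chunks.items() if props["belong_to_file"] in wanted]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the thread.
    files = {"f1": {"auth_code": "a"}, "f2": {"auth_code": "a"}, "f3": {"auth_code": "b"}}
    chunks = {"c1": {"belong_to_file": "f1"}, "c2": {"belong_to_file": "f3"}}
    ids = resolve_ref_filter(files, "a", maximum=1)  # logs the warning
    print(filter_chunks(chunks, ids))
```

If that reading is right, the number in the warning counts the referenced objects matched by the inner filter, not the objects returned by the outer query.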

shadowlin · March 7, 2024, 6:53pm

I am using a schema like this:

File
  auth_code

TextChunk
  text
  belong_to_file (reference to File)

Is it a good idea to filter with Filter.by_ref("belong_to_file").by_property("auth_code")?
A lot of text chunks could qualify after the filter (100k, or even millions).

Dirk · March 8, 2024, 12:57pm

Hi @shadowlin!

Is it a good idea to filter with Filter.by_ref("belong_to_file").by_property("auth_code")? A lot of text chunks could qualify after the filter (100k, or even millions).

That could be very slow and memory-intensive.

What are you trying to achieve? Maybe there is a more efficient way to do that.

shadowlin · March 11, 2024, 7:54am

I want to filter text chunks by auth_code, and the auth_code is stored at the file level.
I think another way is to give every text chunk its own auth_code field, but then whenever a file's auth_code changes I would have to update every text chunk of that file.
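
To make the trade-off concrete, here is a small Python sketch of that denormalized layout. Plain dicts stand in for the File and TextChunk collections, and the helper names are hypothetical; no Weaviate calls are involved. Filtering becomes a direct property check on the chunk, while changing a file's auth_code costs one update per chunk of that file.

```python
def set_file_auth_code(files, chunks, file_id, new_auth_code):
    """Change a file's auth_code and copy it onto every chunk of that file.

    Returns the number of chunks that had to be rewritten.
    """
    files[file_id]["auth_code"] = new_auth_code
    updated = 0
    for props in chunks.values():
        if props["belong_to_file"] == file_id:
            props["auth_code"] = new_auth_code
            updated += 1
    return updated


def chunks_with_auth_code(chunks, auth_code):
    """Filter on the chunk's own copy of auth_code; no reference lookup needed."""
    return [cid for cid, props in chunks.items() if props.get("auth_code") == auth_code]


if __name__ == "__main__":
    # Hypothetical toy data.
    files = {"f1": {"auth_code": "a"}, "f2": {"auth_code": "b"}}
    chunks = {
        "c1": {"text": "...", "belong_to_file": "f1", "auth_code": "a"},
        "c2": {"text": "...", "belong_to_file": "f1", "auth_code": "a"},
        "c3": {"text": "...", "belong_to_file": "f2", "auth_code": "b"},
    }
    print(set_file_auth_code(files, chunks, "f1", "b"))  # chunks rewritten
    print(chunks_with_auth_code(chunks, "b"))
```

Whether this pays off depends on how often auth_code changes compared with how often chunks are queried.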

Dirk · March 14, 2024, 4:29am

Is auth_code used to identify different customers on your side that shouldn't be mixed? If yes, multi-tenancy might be a good alternative to filtering.

shadowlin · March 14, 2024, 1:24pm

I would like to use multi-tenancy, but the auth design needs to mix them: some users have permission to access almost all of the files, so I can't do that :<
