Exact Query Filter

Description

I am using Weaviate to search on a property in my schema. I defined a string property doc_id in the schema and do an exact search with filters=wq.Filter.by_property("doc_id").equal(doc_id), but it behaves like an approximate match instead of an exact (equal) match. Searching for doc_id = "MY-DOC" also returns "44-MY-DOC", and the same happens with containsAll and containsAny. How can I resolve this?

Hi @Rajat_m7!

That’s expected because of how tokenization is configured. If you didn’t set a specific tokenization for that property, it defaults to word.

Check the Weaviate documentation on tokenization for more details.

This means that:

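- 44-MY-DOC becomes three tokens: 44, my, and doc (word tokenization also lowercases)
- MY-DOC becomes two tokens: my and doc

Both tokens of MY-DOC are also present in 44-MY-DOC, so an equal filter for MY-DOC matches both objects.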
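To make the difference concrete, here is a small pure-Python model of the two tokenizations. It is a simplified approximation (ASCII only, with hypothetical helpers word_tokens, field_tokens and equal_matches), not Weaviate's actual implementation, and it assumes an equal filter on a text property matches when every token of the filter value appears among the stored value's tokens.

```python
import re


def word_tokens(value: str) -> list[str]:
    # Simplified model of `word` tokenization: keep alphanumeric runs, lowercase them,
    # and treat every other character (spaces, hyphens, ...) as a separator.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", value)]


def field_tokens(value: str) -> list[str]:
    # Simplified model of `field` tokenization: the whole value, with surrounding
    # whitespace trimmed, is a single token (case is preserved).
    return [value.strip()]


def equal_matches(stored: str, query: str, tokenize) -> bool:
    # Simplified model of an equal filter on a text property: the object matches
    # when every token of the filter value is among the stored value's tokens.
    stored_tokens = set(tokenize(stored))
    return all(t in stored_tokens for t in tokenize(query))


print(equal_matches("44-MY-DOC", "MY-DOC", word_tokens))   # True: {44, my, doc} contains {my, doc}
print(equal_matches("44-MY-DOC", "MY-DOC", field_tokens))  # False: "44-MY-DOC" != "MY-DOC"
```

With word tokenization the hyphen acts as a separator, which is exactly why the shorter ID is found inside the longer one.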

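To search the way you describe, set doc_id to use field tokenization, so that 44-MY-DOC is indexed as a single 44-MY-DOC token instead of three separate ones. With field tokenization, equal, containsAny and containsAll all compare whole doc_id values. Note that tokenization can't be changed on an existing property, so you will need to create the collection with this setting and re-import your data.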

Let me know if this helps!

Thanks!


Hi @DudaNogueira,
I have multiple doc IDs to search for in hybrid mode, which I am currently doing with:

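```python
import weaviate.classes.query as wq

query_property = "text"

# Runs inside an async function; chunks_collection comes from an async Weaviate client.
# One hybrid query per doc_id, restricted to that document's chunks.
response = await chunks_collection.query.hybrid(
    query=query,
    vector=vector,
    alpha=0.5,
    limit=10,
    query_properties=[query_property],
    return_metadata=wq.MetadataQuery(distance=True, score=True, explain_score=True),
    filters=wq.Filter.by_property("doc_id").equal(doc_id),
)
```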

There are parallel calls, one for each doc_id. With this many parallel calls (400 doc_ids at once), the Weaviate node shows high CPU usage and stops responding.
Is there a way to optimise parallel calls on the Weaviate node, or to use batch queries to reduce the number of parallel calls?