Description
I am using Weaviate to search on a property in my schema. I defined a string property named 'doc_id' in the schema. An exact search using filters=wq.Filter.by_property("doc_id").equal(doc_id)
behaves like an approximate match instead of an exact (equals) match. Searching for doc_id = "MY-DOC" also returns "44-MY-DOC", and it doesn't work with ContainsAll or ContainsAny either. How can I resolve this?
hi @Rajat_m7 !!
That’s expected because of how the tokenization is configured. If you didn’t set a specific tokenization for that property, it defaults to word.
Check here for more on tokenization:
This means that:
44-MY-DOC will become three tokens: 44, MY and DOC
MY-DOC will become two tokens: MY and DOC
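To make the effect concrete, here is a minimal pure-Python sketch of the two tokenization modes. It is a simplified model and not Weaviate's actual implementation: the helper names (tokenize_word, tokenize_field, equal_matches) are made up for illustration, and it assumes that an equal filter on a tokenized text property matches when every token of the search value is present in the stored value.

```python
import re


def tokenize_word(value: str) -> list[str]:
    """Simplified 'word' tokenization: split on non-alphanumeric
    characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", value) if tok]


def tokenize_field(value: str) -> list[str]:
    """Simplified 'field' tokenization: the whole trimmed value is one token."""
    return [value.strip()]


def equal_matches(query: str, stored: str, tokenizer) -> bool:
    """Assumed matching rule: every query token must appear among the
    tokens of the stored value."""
    stored_tokens = set(tokenizer(stored))
    return all(tok in stored_tokens for tok in tokenizer(query))


if __name__ == "__main__":
    for tokenizer in (tokenize_word, tokenize_field):
        print(
            tokenizer.__name__,
            tokenizer("44-MY-DOC"),
            equal_matches("MY-DOC", "44-MY-DOC", tokenizer),
        )
```

With the word model, both tokens of MY-DOC are found inside 44-MY-DOC, so the filter matches. With the field model, the two values are different single tokens, so it does not.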
To search the way you described, you need to set the doc_id property to use the field tokenization, so you will end up with a single 44-MY-DOC token instead of three separate ones.
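As a rough sketch with the Python v4 client, the property definition could look like the following. The collection name "Chunks" and the helper create_chunks_collection are assumptions for illustration, and the "text" property is taken from the hybrid query later in this thread. As far as I know, tokenization cannot be changed on an existing property, so this would likely mean creating a new collection and re-importing the data.

```python
from weaviate.classes.config import DataType, Property, Tokenization

# doc_id is indexed as one token, so equal("MY-DOC") should no longer
# match "44-MY-DOC".
doc_id_property = Property(
    name="doc_id",
    data_type=DataType.TEXT,
    tokenization=Tokenization.FIELD,
)


def create_chunks_collection(client, name: str = "Chunks"):
    """Create a collection whose doc_id uses field tokenization.

    `client` is an already connected weaviate.WeaviateClient; the
    collection name is hypothetical.
    """
    return client.collections.create(
        name=name,
        properties=[
            doc_id_property,
            # Free text keeps the default (word) tokenization.
            Property(name="text", data_type=DataType.TEXT),
        ],
    )
```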
Let me know if this helps!
Thanks!
Hi @DudaNogueira ,
I have multiple doc IDs to search for in hybrid mode, which I am doing using:
query_property = "text"
response: GenerativeSearchReturnType = await chunks_collection.query.hybrid(
    query=query,
    vector=vector,
    alpha=0.5,
    limit=10,
    query_properties=[query_property],
    return_metadata=wq.MetadataQuery(distance=True, score=True, explain_score=True),
    filters=wq.Filter.by_property("doc_id").equal(doc_id),
)
There is one parallel call for each doc_id. With so many parallel calls (400 doc_ids at once), the Weaviate node shows high CPU usage and stops responding.
Is there a way to optimise parallel calls on the Weaviate node, or to use batch queries to reduce the number of parallel calls?
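One client-side option, independent of any server setting, is to cap how many of these queries are in flight at once. Below is a minimal sketch using asyncio.Semaphore. The names search_all, search_one and max_concurrency are hypothetical, search_one stands in for a coroutine that wraps the hybrid call above for a single doc_id, and the default limit of 16 is an arbitrary starting point rather than a recommended value.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def search_all(
    doc_ids: Iterable[str],
    search_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = 16,
) -> dict[str, T]:
    """Run search_one for every doc_id with at most max_concurrency
    requests in flight at the same time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(doc_id: str) -> tuple[str, T]:
        # Wait for a free slot before sending the request.
        async with semaphore:
            return doc_id, await search_one(doc_id)

    pairs = await asyncio.gather(*(bounded(doc_id) for doc_id in doc_ids))
    return dict(pairs)
```

All 400 tasks are still created up front, but only max_concurrency of them hit the node at any moment, so the limit can be tuned until CPU usage stays manageable.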