Filtering
I would like to know where I can find documentation on how to apply a filter to this retrieval call:
response = dataset.query.hybrid(
    query=query,
    vector=query_vector,
    return_metadata=wvc.query.MetadataQuery(certainty=True),
    fusion_type=HybridFusion.RELATIVE_SCORE,
    limit=k,
)
The idea is to filter on the metadata with “and” / “or” logic, using both in the same filter.
Here is how my collection is populated:
"""Function to add the data to the collection"""
chunk_embedding = list()
vector_list = self.make_embedded_chunks(chunks)
for i, chunk in enumerate(chunks):
    chunk_embedding.append(
        wvc.data.DataObject(
            properties={
                "content": chunk.page_content,
                "source": chunk.metadata["source"],
                "file_name": chunk.metadata["file_name"],
                "extension": chunk.metadata["extension"],
                "type": chunk.metadata["type"],
            },
            vector=vector_list[i],
        )
    )
I already looked at these two links, but they didn’t help me:
Hybrid search | Weaviate - Vector Database → no indication of how to combine multiple “and” / “or” conditions in the filter
Conditional filters | Weaviate - Vector Database → no indication of “or” / “and” operators, and I could not tell whether they work with hybrid search
If there is also a way to apply a filter when getting the collection, that would be perfect:
dataset = self.client.collections.get(self.collection_name.value)
Thanks in advance for your help
hi @engelsl !!
Let me know if this is what you are looking for:
import weaviate
from weaviate import classes as wvc

client = weaviate.connect_to_local()
collection = client.collections.create(
    "Engelsl",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()
)
dataset = [
    {
        "content": "content1",
        "source": "source1",
        "type": "same",
    },
    {
        "content": "content2",
        "source": "source2",
        "type": "same",
    },
    {
        "content": "content3",
        "source": "source3",
        "type": "other",
    },
]
for item in dataset:
    collection.data.insert(item)
Doing a multi-condition hybrid search:
from weaviate import classes as wvc

response = collection.query.hybrid(
    query="this is a test",
    return_metadata=wvc.query.MetadataQuery(certainty=True),
    fusion_type=wvc.query.HybridFusion.RELATIVE_SCORE,
    filters=(
        (
            wvc.query.Filter.by_property("content").equal("content1")
            | wvc.query.Filter.by_property("content").equal("content2")
        )
        | wvc.query.Filter.by_property("type").equal("other")
    ),
    limit=10,
)
for o in response.objects:
    print(o.properties)
If you want to fetch objects and apply a filter, that is also possible:
from weaviate import classes as wvc

response = collection.query.fetch_objects(
    return_metadata=wvc.query.MetadataQuery(certainty=True),
    filters=(
        (
            wvc.query.Filter.by_property("content").equal("content1")
            | wvc.query.Filter.by_property("content").equal("content2")
        )
        | wvc.query.Filter.by_property("type").equal("other")
    ),
    limit=10,
)
for o in response.objects:
    print(o.properties)
Thanks a lot @DudaNogueira
That was exactly what I was wondering. I guess ‘|’ is used to represent “or”, so if I want “and” I need to use ‘&’.
I have one more question: how is the filter applied when it is passed to the hybrid retriever? Is it a basic filter applied at the end of the ANN search to accelerate the KNN, or is it something more complex?
Thanks again
Hi!
AFAIK, when you pass filters to hybrid search, those filters are used for both the BM25 search and the vector search. After the results for each type of search are retrieved, they get fused.
I don’t believe the results are filtered only afterwards, as that would result in unnecessary calculations.
Let me know if this helps!
Thanks!
Hi @DudaNogueira, you are right.
The filters are applied before the vector or keyword search runs.
This way, we can skip calculating distances for objects that should be filtered out.
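To make the pre-filtering idea concrete, here is a minimal toy sketch in plain Python. It is a brute-force search over three made-up objects, not how Weaviate implements filtering internally; the function names (`prefiltered_search`, `cosine_distance`) and the vectors are assumptions chosen for illustration. It only shows that objects rejected by the filter never reach the distance calculation.

```python
import math

# Toy objects as (properties, vector) pairs. The vectors are made up for illustration.
objects = [
    ({"content": "content1", "type": "same"}, [1.0, 0.0]),
    ({"content": "content2", "type": "same"}, [0.9, 0.1]),
    ({"content": "content3", "type": "other"}, [0.0, 1.0]),
]


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def prefiltered_search(query_vector, predicate, limit):
    """Brute-force search that applies the filter BEFORE computing distances.

    Hypothetical helper, not a Weaviate API. Returns the matching properties
    (closest first) and the number of distance computations performed.
    """
    distance_calls = 0
    scored = []
    for props, vector in objects:
        if not predicate(props):
            continue  # filtered out: no distance is computed for this object
        distance_calls += 1
        scored.append((cosine_distance(query_vector, vector), props))
    scored.sort(key=lambda pair: pair[0])
    return [props for _, props in scored[:limit]], distance_calls


results, distance_calls = prefiltered_search(
    [1.0, 0.0], lambda props: props["type"] == "same", limit=10
)
print([r["content"] for r in results], distance_calls)
```

With the `type == "same"` filter, only two of the three objects have a distance computed; the third is skipped entirely.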
Hey @engelsl, that is correct: & is used for AND, while | is used for OR.
You can see some examples in the docs here
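One thing to watch when mixing the two operators in the same filter: in Python, & binds more tightly than |, so parentheses decide how the conditions are grouped. The sketch below uses a hypothetical stand-in class (`Cond`, which is not a Weaviate class) that overloads & and | and records the resulting expression as text. The property values ("pdf", "docx", "report") are made up for illustration.

```python
class Cond:
    """Hypothetical stand-in for a filter condition; NOT part of the Weaviate client."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # `a & b` builds an AND node
        return Cond(f"({self.text} AND {other.text})")

    def __or__(self, other):
        # `a | b` builds an OR node
        return Cond(f"({self.text} OR {other.text})")


# Made-up conditions on the "extension" and "type" properties.
is_pdf = Cond("extension = pdf")
is_docx = Cond("extension = docx")
is_report = Cond("type = report")

# Without parentheses, Python evaluates `&` first.
unparenthesized = is_pdf | is_docx & is_report

# With parentheses, the OR is grouped first, then ANDed with the type condition.
parenthesized = (is_pdf | is_docx) & is_report

print(unparenthesized.text)
print(parenthesized.text)
```

The two expressions group differently, so it is safest to always parenthesize each sub-condition explicitly when combining AND and OR.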