BYOV Hybrid search metadata

Description

I am using locally generated embeddings (all-MiniLM-L6-v2) as vectors. The dataset is a subset of Amazon products. The hybrid search runs fine and the results look reasonable, but I do not get any metadata: all the fields are set to None. Does anybody know why, or how I can fix this?

Index code:

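# `wc` is assumed to be the config classes module; `client` is an already connected Weaviate client.
import weaviate.classes.config as wc

client.collections.create(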
    name='amazon_products',
    properties=[
        wc.Property(name='title', data_type=wc.DataType.TEXT),
        wc.Property(name='imageUrl', data_type=wc.DataType.TEXT),
        wc.Property(name='productUrl', data_type=wc.DataType.TEXT),
        wc.Property(name='stars', data_type=wc.DataType.NUMBER),
        wc.Property(name='reviews', data_type=wc.DataType.INT),
        wc.Property(name='price', data_type=wc.DataType.NUMBER),
        wc.Property(name='listPrice', data_type=wc.DataType.NUMBER),
        wc.Property(name='categoryName', data_type=wc.DataType.TEXT),
        wc.Property(name='isBestSeller', data_type=wc.DataType.BOOL),
        wc.Property(name='boughtLastMonth', data_type=wc.DataType.INT)
    ],
    vectorizer_config=wc.Configure.Vectorizer.none(),
    vector_index_config=wc.Configure.VectorIndex.hnsw(
        distance_metric=wc.VectorDistances.DOT,
        ef=64,
        ef_construction=128,
        max_connections=32
    )
)

Query:

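# Setup assumed from the description above: collection handle and local encoder.
import weaviate.classes as wvc
from sentence_transformers import SentenceTransformer
amaz = client.collections.get('amazon_products')
encoder = SentenceTransformer('all-MiniLM-L6-v2')

response = amaz.query.hybrid(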
    query='What are the latest gadgets available for home automation?',
    vector=encoder.encode('What are the latest gadgets available for home automation?').tolist(),
    alpha=0.5,
    include_vector=True,
    fusion_type=wvc.query.HybridFusion.RELATIVE_SCORE,
    limit=5
)

Response:

Zigbee Contact Sensor, Door and Window Monitor, Home Automation, Works with Home Assistant, SmartThings, Aeotec, Hubitat or Echo Devices with Build-in Zigbee Hub,hub Required
Object(uuid=_WeaviateUUIDInt('597bdbdc-5c83-4cee-a53b-bc483e0b31b7'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None)

Mattel Say What
Object(uuid=_WeaviateUUIDInt('5e739bd2-ace3-4615-928e-40006de347fd'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None)

Universal Smart Ceiling Fan Remote Control with Dimmer,Smart Home Devices That Compatible with Alexa and Google Assistant,3 Speed WiFi Ceiling Fan Wall Control,Timing Fan and Light Switch(White)
Object(uuid=_WeaviateUUIDInt('95581903-8e0b-48fd-af0b-f30559b58556'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None)

Home Security Cameras Outdoor, 5MP Super HD Dual Band 5Ghz WiFi Camera with Pan Tilt Zoom, Smart Home Baby Camera for Pet, 2-Way Audio, Sound/Motion Alerts, Work with Alexa & Google
metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None)

Hi @Joey_Visbeen!

You need to explicitly request the metadata you want returned by passing a MetadataQuery as return_metadata, for example:

response = articles.query.near_text(
    query="fashion",
    limit=5,
    return_metadata=wvc.query.MetadataQuery(distance=True)
)
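Applied to the hybrid query from the question, this could look roughly like the sketch below. It assumes that score and explain_score are the fields of interest for a hybrid search; the helper names hybrid_with_scores and format_hits are made up for illustration and are not part of the Weaviate client.

```python
def hybrid_with_scores(collection, query_text, query_vector, limit=5):
    """Run the hybrid query from the question, but request score metadata."""
    # Imported lazily so that format_hits below works without the client installed.
    from weaviate.classes.query import HybridFusion, MetadataQuery

    return collection.query.hybrid(
        query=query_text,
        vector=query_vector,
        alpha=0.5,
        fusion_type=HybridFusion.RELATIVE_SCORE,
        limit=limit,
        # Without this argument every field of MetadataReturn stays None.
        return_metadata=MetadataQuery(score=True, explain_score=True),
    )


def format_hits(response):
    """Return (title, score) pairs for each object in a query response."""
    return [
        (obj.properties.get("title"), obj.metadata.score)
        for obj in response.objects
    ]
```

With the names from the question, a call would be hybrid_with_scores(amaz, text, encoder.encode(text).tolist()), followed by format_hits(response) to inspect the titles next to their scores.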

Check here for more info:

Let me know if this helps 🙂

@DudaNogueira Yes, thank you! Just curious, what was the reason behind making this an explicit parameter? Why not always return these values?


I believe the reasoning was that the client already returns all the properties by default, so also returning all the metadata would add data to the result that, depending on the case, is unnecessary.

In the v3 Python client, which uses GraphQL under the hood, you always had to specify the properties and/or metadata you wanted returned.

In order to avoid unnecessary data being sent between client and server, you should always specify the exact data you want returned.

For example, in some use cases the distance is never used, so there is no need to return it, while in others it is necessary.
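As a rough illustration of requesting only what is needed, the sketch below runs a pure vector search and asks for two properties plus whatever metadata the caller passes in. The function name search_minimal and the choice of title and price are assumptions made for the example; near_vector, return_properties and return_metadata are the client parameters.

```python
def search_minimal(collection, query_vector, metadata_query, limit=5,
                   properties=("title", "price")):
    """Vector search that returns only the named properties and metadata."""
    return collection.query.near_vector(
        near_vector=query_vector,
        limit=limit,
        # Only these properties are sent back instead of every field.
        return_properties=list(properties),
        # e.g. MetadataQuery(distance=True) from weaviate.classes.query
        return_metadata=metadata_query,
    )
```

If the distance is not needed, passing None as metadata_query should leave the metadata fields unset, as in the output from the original question.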

🙂
