Multi-vector search skips documents that are missing some of the target vectors

Hi team,

I’m experimenting with named self-provided vectors and multi-vector search in Weaviate, but I’ve run into a problem.

I have a case where some of my documents don’t contain all of the named vectors. When I query with near_vector and TargetVectors.manual_weights(...), any document that is missing one of the target vectors is skipped entirely. I expected those documents to remain searchable and simply get a 0 contribution for the missing vector(s).

Minimal reproducible code

import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.tenants import Tenant
from weaviate.classes.query import TargetVectors, MetadataQuery

# 1. Connect
client = weaviate.connect_to_local(port=9203)

# 2. Create collection with 3 named vectors
if client.collections.exists("sample"):
    client.collections.delete("sample")
client.collections.create(
    name="sample",
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="info", data_type=DataType.TEXT),
        Property(name="desc", data_type=DataType.TEXT),
    ],
    vector_config=[
        Configure.Vectors.self_provided(name="vector1", vector_index_config=Configure.VectorIndex.hnsw()),
        Configure.Vectors.self_provided(name="vector2", vector_index_config=Configure.VectorIndex.hnsw()),
        Configure.Vectors.self_provided(name="vector3", vector_index_config=Configure.VectorIndex.hnsw()),
    ],
    multi_tenancy_config=Configure.multi_tenancy(enabled=True)
)

# 3. Add tenant
collection = client.collections.get("sample")
collection.tenants.create([Tenant(name="tenantA")])
tenant_collection = collection.with_tenant("tenantA")

# 4. Insert data with missing vectors in some docs
v = [0.2355] * 1024

tenant_collection.data.insert(
    properties={"text": "First text", "mvs": "mvs1"},
    vector={"vector1": v, "vector2": v}
)
tenant_collection.data.insert(
    properties={"text": "Second text", "mvs": "mvs2"},
    vector={"vector1": v, "vector2": v, "vector3": v}
)
tenant_collection.data.insert(
    properties={"text": "Third text", "mvs": "mvs3"},
    vector={"vector1": v}
)

# 5. Query across vector1 + vector2
response = tenant_collection.query.near_vector(
    near_vector={
        "vector1": v,
        "vector2": v
    },
    limit=20,
    target_vector=TargetVectors.manual_weights({
        "vector1": 30,
        "vector2": 30
    }),
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties, o.metadata.distance)

# 6. Close the connection
client.close()

Problem

  • Documents that don’t contain vector2 are not included in the results at all.

  • My expectation: such documents should still participate in the search, with their missing vector treated as a 0 score contribution (instead of being excluded).

Question

  • Is this the intended behavior?

  • If yes, is there a config option or roadmap feature to allow “graceful fallback” for missing vectors (treat as 0 contribution instead of excluding the doc)?



Hey @nikhil1728,

I believe my colleague @Dirk has already answered this in the Slack channel:

  • Is this the intended behavior?

Yes, this is the intended behaviour. The problem with simply using 0 when a vector is missing is that documents with very few named vectors would then tend to rank at the top.
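To make that concrete, here is a minimal sketch in plain Python. It assumes the combined distance under manual weights is the weighted sum of the per-vector distances; the distances and the combined_distance helper are invented for illustration and are not Weaviate's implementation.

```python
# Hypothetical per-document distances for the two target vectors.
# None marks a named vector that the document does not have.
docs = {
    "first": {"vector1": 0.10, "vector2": 0.12},
    "third": {"vector1": 0.15, "vector2": None},
}
weights = {"vector1": 30, "vector2": 30}


def combined_distance(distances, weights, missing=0.0):
    """Weighted sum of distances; a missing vector contributes `missing`."""
    total = 0.0
    for name, weight in weights.items():
        d = distances.get(name)
        total += weight * (missing if d is None else d)
    return total


# Lower combined distance ranks higher.
ranking = sorted(docs, key=lambda doc_id: combined_distance(docs[doc_id], weights))
print(ranking)
```

Here "third" is a worse match on vector1 than "first", yet it sorts ahead of it, because its missing vector2 adds nothing to the sum.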

  • If yes, is there a config option or roadmap feature to allow “graceful fallback” for missing vectors (treat as 0 contribution instead of excluding the doc)?

Not yet. You could create an issue on our roadmap, but I cannot make any promises.
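In the meantime, one possible client-side workaround (a sketch, not an official recommendation) is to query each target vector separately and merge the results yourself, charging a fixed penalty distance for a missing vector instead of 0. The merge_with_penalty helper and the missing_distance value below are hypothetical; only the merging step is shown, in plain Python.

```python
from typing import Dict, List, Tuple


def merge_with_penalty(
    per_vector: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    missing_distance: float,
) -> List[Tuple[str, float]]:
    """Combine per-vector distances client-side.

    per_vector maps target vector name -> {object id: distance}.
    An object absent from one result set is charged `missing_distance`
    for that vector instead of being dropped.
    """
    all_ids = set()
    for hits in per_vector.values():
        all_ids.update(hits)

    scored = []
    for obj_id in all_ids:
        total = sum(
            weight * per_vector.get(name, {}).get(obj_id, missing_distance)
            for name, weight in weights.items()
        )
        scored.append((obj_id, total))

    # Sort by combined distance, then by id so ties are deterministic.
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Made-up distances mirroring the repro: "third" has no vector2.
per_vector = {
    "vector1": {"first": 0.0, "second": 0.0, "third": 0.0},
    "vector2": {"first": 0.0, "second": 0.0},
}
merged = merge_with_penalty(
    per_vector, {"vector1": 30, "vector2": 30}, missing_distance=1.0
)
print(merged)
```

Each inner dict could be filled by running near_vector with a single target_vector (for example target_vector="vector1") and reading o.uuid and o.metadata.distance from the response. Note that an object can also be absent from a result set because of the limit rather than a missing vector, so the per-vector limit needs to be generous for this to be meaningful.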

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
