Attribute missing when querying near vector

Description

Hi everyone,
I’m getting started with Weaviate, so you’ll have to excuse any (likely) obvious errors on my part.

I’ve followed the guide on setting up a collection with my own vectors: a collection with two properties (name and image) and the corresponding CLIP embeddings of both properties, which I prepared ahead of time (as multimodal vectorizers are not available on WCS).

import weaviate.classes.config as wc

client.collections.create(
    name="EmojiDB",
    properties=[
        wc.Property(name="name", data_type=wc.DataType.TEXT),
        wc.Property(name="image", data_type=wc.DataType.BLOB),
    ],
    # Vectors are supplied by hand, so no vectorizer module is configured
    vectorizer_config=wc.Configure.Vectorizer.none(),
    generative_config=wc.Configure.Generative.openai()
)

Populating the database seemed to work properly, using the corresponding string for the text and a base64-encoded string for the image, except for an error that shows up repeatedly:

ERROR:asyncio:Exception in callback PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()
handle: <Handle PollerCompletionQueue._handle_events(<_UnixSelecto...e debug=False>)()>
Traceback (most recent call last):
...

I’m using Google Colab for testing, so I assumed this might be more of a compute-related warning.

I went ahead and performed a vector search with query.near_vector, and managed to retrieve objects whose names are somewhat similar to my prompt (using the same CLIP vectorizer as for the embeddings I provided).

Yet, I’m unable to access the image attribute, which is nowhere to be seen on the QueryReturn response object:

QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('ddfbc272-c9fe-4a61-a0fb-061e958a8fbb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.7795770168304443, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'name': 'white down pointing backhand index'}, references=None, vector={}, collection='EmojiDB'), Object(uuid=_WeaviateUUIDInt('61a18bec-b56f-46f0-8c43-57e3fc8c89fa'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.7795770168304443, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'name': 'white down pointing backhand index'}, references=None, vector={}, collection='EmojiDB'), Object(uuid=_WeaviateUUIDInt('9e4a3158-7048-4a7b-8339-3979621b1add'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.7795770168304443, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'name': 'white down pointing backhand index'}, references=None, vector={}, collection='EmojiDB'), Object(uuid=_WeaviateUUIDInt('949dbd44-aa89-47fa-8157-f85ce778463d'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.7795770168304443, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'name': 'white down pointing backhand index'}, references=None, vector={}, collection='EmojiDB'), Object(uuid=_WeaviateUUIDInt('f68156cc-6300-4724-8869-1e352325f94a'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.7863796949386597, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'name': 'sign of the horns'}, references=None, vector={}, collection='EmojiDB')])

The vector attribute is also empty, which doesn’t seem to be the expected behavior.

I went to the WCS Query App to take a look at the database, and it does output a record containing the encoded image attribute (although it only fetches a single record):

{
  Get {
    EmojiDB {
      name
      image
    }
  }
}

I’m a bit confused as to why these discrepancies are occurring. Does the error above have something to do with the image not loading correctly?

Perhaps the dataset is somewhat bulky? It has a total of 2749 30x30 images.

Below is what the populating script looks like:

import base64

emojiDB = client.collections.get("EmojiDB")

try:
    with emojiDB.batch.dynamic() as batch:
        for index, rows in df.iterrows():
            # Read each image and encode it as a base64 string for the BLOB property
            img_path = f"emojisFolder/{index}.png"
            with open(img_path, "rb") as file:
                poster_b64 = base64.b64encode(file.read()).decode("utf-8")

            collection_object = {
                "name": df.iloc[index, 1],
                "image": poster_b64,
            }
            vector = emojis_embeddings[index]

            batch.add_object(
                properties=collection_object,
                vector=vector,
            )

    if len(emojiDB.batch.failed_objects) > 0:
        print(f"Failed to import {len(emojiDB.batch.failed_objects)} objects")

finally:
    failed_objs_a = client.batch.failed_objects
    failed_refs_a = client.batch.failed_references
    client.close()

Hopefully this is enough information for someone to spot my error(s).

Server Setup Information

  • Weaviate Server Version: Weaviate Cloud Services
  • Deployment Method: Cloud WCS Instance
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: weaviate_client-4.5.7-py3-none-any

Hi! Do you want to get the image returned?

Can you share the Google Colab? That would help me reproduce this scenario.

Thanks!

Hi Duda,
Thank you for the help. Indeed, the idea was to get the image back (along with the name).
Here is the Colab Notebook; I’ve cleaned it up a bit and removed confidential info. Here is also a sample (only 50 rows) of the dataset.

Hi @Javi,

The Python client excludes blobs from the default response, as blobs are usually huge.

But there is a quick solution: you can list the properties you want back with return_properties=["name", "image"], like this:

import weaviate.classes.query as wq

# Perform query
response = emojiDB.query.near_vector(
    near_vector=query_vector,  # A list of floating point numbers
    limit=5,
    return_metadata=wq.MetadataQuery(distance=True),
    return_properties=["name", "image"],  # blobs are only returned when listed explicitly
)
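The empty vector={} in the output from the question likely has a similar cause: as far as I know, the v4 Python client only returns vectors when you ask for them with include_vector=True. Here is a minimal sketch, assuming a collection handle like the emojiDB one from the question. The helper name near_vector_with_blob is made up for illustration and is not part of the client:

```python
def near_vector_with_blob(collection, query_vector, limit=5):
    """Hypothetical helper: near-vector search returning the blob and the vectors."""
    response = collection.query.near_vector(
        near_vector=query_vector,
        limit=limit,
        return_properties=["name", "image"],  # blobs must be requested explicitly
        include_vector=True,  # vectors are omitted unless requested
    )
    # Pair each object's properties with its vector(s)
    return [(obj.properties, obj.vector) for obj in response.objects]
```

You would call it as near_vector_with_blob(emojiDB, query_vector) and read the base64 string from the "image" key of each properties dict.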

Recommendation

As a side note:
For apps dealing with images, I recommend storing the image path, so that instead of retrieving the image from the database, you can get it straight from the path.

This makes your queries faster (as you don’t need to send big images as part of your query result) and also saves you time decoding the base64 back to the image. So, that is a win-win.
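To make the trade-off concrete, here is a rough sketch of both routes using only the Python standard library. The function names and the "image_path" property are assumptions for illustration; the collection in this thread only has "name" and "image":

```python
import base64
from pathlib import Path


def encode_image(path):
    """Read an image file and return it as a base64 string, as sent to a BLOB property."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def decode_image(b64_string, out_path):
    """Decode a base64 string taken from a query result and write it to disk."""
    out = Path(out_path)
    out.write_bytes(base64.b64decode(b64_string))
    return out


def load_image_from_path(properties):
    """Alternative route: read the bytes from a stored path.

    Assumes a hypothetical TEXT property called "image_path".
    """
    return Path(properties["image_path"]).read_bytes()
```

With the blob route you would call decode_image(obj.properties["image"], "out.png") for each result; with the path route the query result stays small and the file is read directly from disk.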

Hi @sebawite,
Thank you so much! I’ve been away for the whole day… I will test this as soon as I get back, but it seems to be the solution. Good learning, also about the image path.

Sure thing @Javi
Enjoy your Weaviate journey!