Unable to return object property when datatype is Blob

Description

I am using the img2vec-neural vectorizer to embed images in my collection. As stated in the docs, I have defined a property with the Blob datatype to hold the base64 representation of the image and referenced that property in the image_fields argument.

return self.client.collections.create(
    name=IMAGES_COLLECTION_NAME,
    vectorizer_config=wvc.Configure.Vectorizer.img2vec_neural(image_fields=["image"]),
    vector_index_config=wvc.Configure.VectorIndex.hnsw(distance_metric=wvc.VectorDistance.COSINE),
    properties=[
        wvc.Property(name="name", data_type=wvc.DataType.TEXT, skip_vectorization=True),
        wvc.Property(name="image", data_type=wvc.DataType.BLOB),
    ],
)

I then use the insert method to create and add objects to my collection.

image_properties = {
    "text": image.text,
    "image": image.image_b64,
}
image_uuid = generate_uuid5(image_properties)
self._images.data.insert(properties=image_properties, uuid=image_uuid)

Finally, when performing a similarity search, I get accurate results in the sense that similar image objects are indeed found; however, I cannot return the image property.

reference_images = self._images.query.near_image(
    near_image=image.image_b64,
    limit=3,
    return_metadata=wvc.MetadataQuery(distance=True),
).objects

Without return_properties defined, it should return all properties, but instead it only returns name. Even when I do define it explicitly,

return_properties=["name", "image"]

the image property just comes back as None.

So I’m wondering if there is a limitation, given how blob types are stored, that prevents them from being served in queries. I also tried duplicating the image property (calling it image2 for simplicity), stored it as a blob but chose to skip vectorization, and it too could not be returned in a query. So it seems to be a limitation of the blob type itself rather than an issue specific to blob properties that are vectorized.

Essentially, I assumed this would behave as text2vec-transformers does: when I query the collection, it returns the property that was vectorized (say, text). The docs don’t seem to provide a direct example demonstrating that this is possible, but the definition of the blob type states “When serving, the data is base64 encoded (so it is safe to serve as json)”, which seems to imply that I should be able to access the data somehow.
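For illustration, this is roughly what I was hoping to do once the blob is returned; the loop and the file extension below are just placeholders from my own setup, not something taken from the docs:

import base64

# Assuming the near_image query above returns objects with the "image" blob
# populated, the base64 string should decode straight back to the image bytes.
for obj in reference_images:
    b64_string = obj.properties["image"]        # base64-encoded blob value
    image_bytes = base64.b64decode(b64_string)  # original image bytes
    with open(f"{obj.uuid}.jpg", "wb") as f:    # ".jpg" is just a placeholder
        f.write(image_bytes)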

Any help or insight would be appreciated. Thanks!

Server Setup Information

  • Weaviate Server Version: 1.23.2
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: N/A
  • Client Language and Version: Python v4

Any additional Information

This Weaviate article seems to sidestep this issue by including a filepath to the image, which has been my current solution up until this point, but for my use case I would ultimately just want the images stored in and accessible through Weaviate.
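For context, this is roughly my current workaround (the file paths and names are just placeholders from my own setup): I keep the files on disk, store the filepath as a text property, and only base64-encode an image when I need it for a near_image query.

import base64
from pathlib import Path

def image_to_b64(path: str) -> str:
    # Read an image file from disk and return its base64 string,
    # which is the format near_image (and the blob property) expects.
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")

# e.g. query with an image from disk instead of a stored blob
query_b64 = image_to_b64("images/query.jpg")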

Hi @birdboy ! Welcome to our community :hugs:

You shouldn’t be using Server 1.23.2 with Python client 4.4.4.

Here we have a compatibility matrix:

With that said, can you confirm whether this issue still happens with the latest 1.23.9 version of the Weaviate server?

Let me know if this is the case, then I can try replicating this.

But probably, a newer version of the server will fix this.
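If it helps, here is a quick way to double-check which client and server versions you are actually running (assuming a local connection):

import weaviate

client = weaviate.connect_to_local()
print(weaviate.__version__)           # installed Python client version
print(client.get_meta()["version"])   # Weaviate server version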

Thanks!

Hi @birdboy,

The Python client v4 doesn’t return blob properties by default, as these are potentially large values.
However, if you run a query with return_properties, I would expect values for the selected properties to be present.

Is this how you run your query?

reference_images = self._images.query.near_image(
    near_image=image.image_b64,
    limit=3,
    return_properties=["name", "image"],
    return_metadata=wvc.MetadataQuery(distance=True),
).objects

Also, did you try upgrading Weaviate to 1.23.9 or later?

Thanks for the replies! I upgraded to 1.23.9; however, it unfortunately did not change the current behavior.

@sebawita, that is good to know regarding blobs not being returned by default but yes, that is how I am running my query (with return_properties explicitly included).

Also, when checking the types of the properties when they are returned, I see name as <class 'str'>, which is expected, but image comes back as <class 'v1.properties_pb2.Value'>. Is this expected and I’m just not parsing the results properly, or is it more an indication that the property is not being recognized?
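For reference, this is roughly how I am checking those types (continuing from the reference_images query above):

for obj in reference_images:
    for key, value in obj.properties.items():
        # name prints as <class 'str'>, image as <class 'v1.properties_pb2.Value'>
        print(key, type(value))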

Thanks again!

Hi @birdboy !

I was able to get the blob returned. Here is the code I used, based on yours:

import weaviate
from weaviate import classes as wvc
from weaviate.util import generate_uuid5

client = weaviate.connect_to_local()

# Recreate the collection from scratch with the same schema as yours
client.collections.delete("Teste")
c = client.collections.create(
    "Teste",
    vectorizer_config=wvc.config.Configure.Vectorizer.img2vec_neural(image_fields=["image"]),
    vector_index_config=wvc.config.Configure.VectorIndex.hnsw(distance_metric=wvc.config.VectorDistances.COSINE),
    properties=[
        wvc.config.Property(name="name", data_type=wvc.config.DataType.TEXT, skip_vectorization=True),
        wvc.config.Property(name="image", data_type=wvc.config.DataType.BLOB),
    ],
)

# Confirm the schema was created as expected
c.config.get().properties

# Two base64-encoded test images (truncated here)
b64 = "iVBORw0KGgoAAAANSUh......"

b64_2 = "iVBOR......"

image_properties = {
    "name": "Some text",
    "image": b64,
}
image_uuid = generate_uuid5(image_properties)
c.data.insert(properties=image_properties, uuid=image_uuid)

# The blob comes back when it is explicitly requested in return_properties
c.query.fetch_objects(return_properties=["name", "image"], include_vector=True).objects[0].properties

# {'image': 'iVBORw0KGgoAAAANSU.......',
# 'name': 'Some text'}

reference_images = c.query.near_image(
    near_image=b64_2,
    limit=3,
    return_properties=["name", "image"],
    return_metadata=wvc.query.MetadataQuery(distance=True),
    include_vector=True,
).objects

print(reference_images[0].properties)

Note: some imports have changed since your client version; I adapted the code accordingly.

Let me know if that helps :slight_smile:


Hi @DudaNogueira ,

So I tried running the code you provided and am still unable to get the image data to return.

Here’s what I see

{'image': , 'name': 'Some text'}

Here is my docker-compose file in case there is some issue with it, though it is just the file generated by the configurator tool:

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.23.9
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      IMAGE_INFERENCE_API: 'http://i2v-neural:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'img2vec-neural'
      ENABLE_MODULES: 'img2vec-neural'
      CLUSTER_HOSTNAME: 'node1'
  i2v-neural:
    image: semitechnologies/img2vec-pytorch:resnet50
    environment:
      ENABLE_CUDA: '0'
volumes:
  weaviate_data:
...

Oh, and this is probably clear from the docker-compose file, but I am running CPU-only since I’m on an M2 MacBook (I’m doubtful this has an impact on this particular issue, however).

UPDATE:

Thanks again for the help troubleshooting this. Oddly, I was able to get it to work by not specifying the image property in the schema. I still define it here:

vectorizer_config=wvc.config.Configure.Vectorizer.img2vec_neural(image_fields=["image"])

but not in the properties list:

wvc.config.Property(name="image", data_type=wvc.config.DataType.BLOB)

After some investigating, it seems to work because auto-schema is inferring that the image property is of type text. I have also tried manually setting the datatype to text, and that works as well. Interestingly, it does not seem to change my image query results, but regardless, the docs explicitly advise against this:

dataType - the data type of the property. For use in imageFields, must be set to blob.

So while this is an interesting development, it is not my desired workaround.
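For reference, this is roughly the collection definition that currently works for me; the image property is simply left out of properties so that auto-schema infers it (the collection name here is just a placeholder):

# Works, but relies on auto-schema inferring "image" as text rather than blob
client.collections.create(
    name="Images",
    vectorizer_config=wvc.config.Configure.Vectorizer.img2vec_neural(image_fields=["image"]),
    vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
        distance_metric=wvc.config.VectorDistances.COSINE
    ),
    properties=[
        wvc.config.Property(name="name", data_type=wvc.config.DataType.TEXT, skip_vectorization=True),
        # no explicit "image" property -> auto-schema creates it as TEXT on first insert
    ],
)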