Description
I am using the `img2vec-neural` vectorizer to embed images in my collection. As stated in the docs, I have defined a property with the `BLOB` data type to hold the base64 representation of the image, and referenced that property in the `image_fields` argument.
```python
return self.client.collections.create(
    name=IMAGES_COLLECTION_NAME,
    vectorizer_config=wvc.Configure.Vectorizer.img2vec_neural(image_fields=["image"]),
    vector_index_config=wvc.Configure.VectorIndex.hnsw(distance_metric=wvc.VectorDistance.COSINE),
    properties=[
        wvc.Property(name="name", data_type=wvc.DataType.TEXT, skip_vectorization=True),
        wvc.Property(name="image", data_type=wvc.DataType.BLOB),
    ],
)
```
I then use the `insert` method to create and add objects to my collection.
```python
image_properties = {
    "text": image.text,
    "image": image.image_b64,
}
image_uuid = generate_uuid5(image_properties)
self._images.data.insert(properties=image_properties, uuid=image_uuid)
```
Finally, when performing a similarity search, I get accurate results in the sense that similar image objects are indeed found; however, I cannot return the `image` property.
```python
reference_images = self._images.query.near_image(
    near_image=image.image_b64,
    limit=3,
    return_metadata=wvc.MetadataQuery(distance=True),
).objects
```
Without `return_properties` defined, the query should return all properties, but it instead only returns `name`. Even when I define it explicitly with `return_properties=["name", "image"]`, the `image` property just comes back as `None`.
So I’m wondering if there is a limitation in how blob types are stored that prevents them from being served in queries. I also tried duplicating the `image` property (calling it `image2` for simplicity), stored it as a blob but chose to skip vectorization, and it too could not be returned in a query. So it seems to be a limitation of the blob type itself, rather than an issue specific to blob properties that are vectorized.
Essentially, I assumed this would behave as `text2vec-transformers` does: when I query the collection, it returns the property that was vectorized, say `text`. The docs don’t seem to provide a direct example demonstrating that this is possible, but the definition of the blob type states "When serving, the data is base64 encoded (so it is safe to serve as json)", which seems to imply that I should be able to access the data somehow.
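If the blob really is served as a base64 string, then getting the actual image back should just be a matter of decoding that string; a minimal sketch of what I would expect to do with a returned `image` property (Weaviate-independent, with the sample bytes purely illustrative):

```python
import base64

def blob_to_bytes(b64_string: str) -> bytes:
    """Decode a base64-encoded blob property back to raw image bytes."""
    return base64.b64decode(b64_string)

# Round trip: bytes -> base64 (as stored in the BLOB property) -> bytes
raw = b"\x89PNG\r\n\x1a\n"  # fake PNG header, for illustration only
b64 = base64.b64encode(raw).decode("ascii")
assert blob_to_bytes(b64) == raw
```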
Any help or insight would be appreciated. Thanks!
Server Setup Information
- Weaviate Server Version: 1.23.2
- Deployment Method: Docker
- Multi Node? Number of Running Nodes: N/A
- Client Language and Version: Python v4
Any additional Information
This Weaviate article seems to sidestep this issue by including a filepath to the image, which has been my solution up to this point, but for my use case I ultimately just want the images stored in, and accessible through, Weaviate itself.
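For context, the filepath workaround boils down to reading the file from disk and base64-encoding it whenever the image is needed; a sketch of the helper I use for that (the function name is my own):

```python
import base64
from pathlib import Path

def file_to_b64(path: str) -> str:
    """Read an image file and return its base64 string, as used for a BLOB property."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

Having the blob served directly from Weaviate would let me drop this extra filesystem dependency.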