Issue while inserting an image in Weaviate

Hi, I'm trying to build a visual search application. With the JS client everything works fine: I haven't faced any issues and it was easy to implement. With the Python v4 client, however, I get the error below. I convert each image to base64 and use an OpenCLIP ViT model.

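    import base64
    import os

    from weaviate import WeaviateClient
    from weaviate.classes.config import Configure, DataType, Multi2VecField, Property

    # `weaviate_client` is an already-connected client and `WeaviateConfig` holds
    # the app's settings; neither is shown in this post. The same CLASS_NAME is
    # used for the existence check and for creation.
    if not weaviate_client.collections.exists(WeaviateConfig.CLASS_NAME):
        weaviate_client.collections.create(
            WeaviateConfig.CLASS_NAME,
            properties=[
                Property(name="sku", data_type=DataType.TEXT),
                Property(name="image", data_type=DataType.BLOB),
            ],
            vectorizer_config=Configure.Vectorizer.multi2vec_clip(
                image_fields=[Multi2VecField(name="image", weight=0.9)],
                text_fields=[Multi2VecField(name="sku", weight=0.1)],
            ),
        )


    def process_batch(source, client: WeaviateClient):
        collection = client.collections.get(WeaviateConfig.CLASS_NAME)
        with collection.batch.dynamic() as batch:
            for img_path in source:
                # The SKU is the name of the directory that contains the image.
                sku = os.path.basename(os.path.dirname(img_path))
                b64_image = toBase64(img_path)
                if b64_image:
                    properties = {"sku": sku, "image": b64_image, "path": img_path}
                    batch.add_object(properties=properties)
        print(collection.batch.results)


    def toBase64(path):
        # Read the raw file bytes and return them as a base64-encoded string.
        with open(path, 'rb') as file:
            return base64.b64encode(file.read()).decode('utf-8')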

Error message:

    2024-07-04 22:09:29 ERROR: Something went wrong while vectorizing data.
    2024-07-04 22:09:29 Traceback (most recent call last):
    2024-07-04 22:09:29 File "/app/app.py", line 54, in read_item
    2024-07-04 22:09:29 result = await clip.vectorize(payload)
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/app/clip.py", line 288, in vectorize
    2024-07-04 22:09:29 return await asyncio.wrap_future(self.executor.submit(self.clip.vectorize, payload))
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    2024-07-04 22:09:29 result = self.fn(*self.args, **self.kwargs)
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/app/clip.py", line 156, in vectorize
    2024-07-04 22:09:29 image_files = [_parse_image(image) for image in payload.images]
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/app/clip.py", line 156, in
    2024-07-04 22:09:29 image_files = [_parse_image(image) for image in payload.images]
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/app/clip.py", line 298, in _parse_image
    2024-07-04 22:09:29 img = Image.open(io.BytesIO(image_bytes))
    2024-07-04 22:09:29 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2024-07-04 22:09:29 File "/usr/local/lib/python3.11/site-packages/PIL/Image.py", line 3339, in open
    2024-07-04 22:09:29 raise UnidentifiedImageError(msg)
    2024-07-04 22:09:29 PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0xffff6eef4bd0>
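`PIL.UnidentifiedImageError` is raised when Pillow, here running inside the CLIP inference container, cannot recognize the decoded bytes as any image format it supports. That suggests at least one value sent in `image` is not a valid image, for example a non-image file that ended up in `source`, or an empty or truncated file. A minimal local pre-check is sketched below; it assumes the images are JPEG, PNG, GIF, WebP, or BMP and only inspects each file's leading signature bytes. `looks_like_image` and `filter_images` are hypothetical helpers, not part of the Weaviate client.

```python
import os

# Leading "magic" bytes for a few common image formats (assumed set; extend as needed).
_SIGNATURES = {
    "jpeg": [b"\xff\xd8\xff"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "gif": [b"GIF87a", b"GIF89a"],
    "bmp": [b"BM"],
}


def looks_like_image(path):
    """Return the detected format name, or None if the header is not recognized."""
    with open(path, "rb") as file:
        header = file.read(12)
    # WebP files start with "RIFF", a 4-byte size field, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    for fmt, prefixes in _SIGNATURES.items():
        if any(header.startswith(prefix) for prefix in prefixes):
            return fmt
    return None


def filter_images(paths):
    """Split paths into (valid, rejected) lists based on their file signature."""
    valid, rejected = [], []
    for path in paths:
        # Missing paths and directories are rejected without being opened.
        if os.path.isfile(path) and looks_like_image(path):
            valid.append(path)
        else:
            rejected.append(path)
    return valid, rejected
```

For example, `valid, rejected = filter_images(source)` followed by `process_batch(valid, weaviate_client)` would skip suspicious files and let you print `rejected` to see which ones they were. A matching signature does not prove the whole file decodes; if Pillow is installed locally, opening each file with `PIL.Image.open(path)` and calling `.verify()` is a stricter check.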

An example of how to insert an image would be very helpful.

Hi @moaabid!

Welcome to our community :hugs:

We do have a recipe for CLIP!

For instance, this is what you'll find in that notebook:

    import base64

    # Helper function to convert a file to its base64 representation
    def toBase64(path):
        with open(path, 'rb') as file:
            return base64.b64encode(file.read()).decode('utf-8')

    # List of source images
    source = ["cat1.jpg", "cat2.jpg", "cat3.jpg",
              "dog1.jpg", "dog2.jpg", "dog3.jpg",
              "meerkat1.jpg", "meerkat2.jpg", "meerkat3.jpg"]

    # `animals` is the collection object obtained earlier in the notebook (not shown in this excerpt)
    with animals.batch.dynamic() as batch:
        for name in source:
            print(f"Adding {name}")
            # Build the path to the image file
            path = "./source/image/" + name
            # Object to store in Weaviate
            properties = {
                "name": name,
                "path": path,
                "image": toBase64(path),  # Weaviate uses the base64 representation of the file to generate a vector.
            }
            batch.add_object(
                properties=properties,
            )
    print(animals.batch.results)
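The question's `process_batch` takes the SKU from each image's parent directory, but the post does not show how `source` is built. Assuming a layout such as `<root>/<sku>/<image file>`, a minimal sketch with `pathlib` that collects only files whose extension is in a hypothetical `IMAGE_EXTENSIONS` set, so stray files like `.DS_Store` or `notes.txt` never reach Weaviate, could look like this. `collect_image_paths` and `sku_for` are illustrative names, not part of any library.

```python
from pathlib import Path

# Assumed set of accepted extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}


def collect_image_paths(root):
    """Return sorted paths to image files laid out as <root>/<sku>/<file>."""
    paths = []
    # "*/*" matches entries exactly one directory level below root.
    for path in Path(root).glob("*/*"):
        # Skip sub-directories and files without an accepted image extension.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            paths.append(str(path))
    return sorted(paths)


def sku_for(img_path):
    """Same rule as process_batch: the SKU is the parent directory's name."""
    return Path(img_path).parent.name
```

Something like `process_batch(collect_image_paths("./images"), weaviate_client)` (the `./images` root is a placeholder) would then only send files with an accepted extension. Filtering by extension is cheap but trusts file names; it can be combined with a content check if files may be mislabeled.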

Let me know if this helps!

Thanks!

Thanks for sharing it.
