Hi, I'm trying to build a visual search application. I tried the JS client and it works fine: no issues, and it was easy to implement. With the Python v4 client, however, I'm facing the issue below. I convert each image to base64 and use an OpenCLIP ViT model.
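from weaviate.classes.config import Configure, DataType, Multi2VecField, Property

# Create the collection only if it does not exist yet.
if not weaviate_client.collections.exists(settings.CLASS_NAME):
    weaviate_client.collections.create(
        WeaviateConfig.CLASS_NAME,
        properties=[
            Property(name="sku", data_type=DataType.TEXT),
            Property(name="image", data_type=DataType.BLOB),
        ],
        # Vectorize mostly on the image, with a small weight on the SKU text.
        vectorizer_config=Configure.Vectorizer.multi2vec_clip(
            image_fields=[Multi2VecField(name="image", weight=0.9)],
            text_fields=[Multi2VecField(name="sku", weight=0.1)],
        ),
    )
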
import base64
import os

from weaviate import WeaviateClient


def process_batch(source, client: WeaviateClient):
    collection = client.collections.get(WeaviateConfig.CLASS_NAME)
    with collection.batch.dynamic() as batch:
        for img_path in source:
            # The SKU is the name of the directory that contains the image.
            sku = os.path.basename(os.path.dirname(img_path))
            b64_image = toBase64(img_path)
            if b64_image:
                properties = {"sku": sku, "image": b64_image, "path": img_path}
                batch.add_object(
                    properties=properties,
                )
    print(collection.batch.results)


def toBase64(path):
    # Read the file as raw bytes and return a plain base64 string.
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')

Error Message -
2024-07-04 22:09:29 ERROR: Something went wrong while vectorizing data.
2024-07-04 22:09:29 Traceback (most recent call last):
2024-07-04 22:09:29   File "/app/app.py", line 54, in read_item
2024-07-04 22:09:29     result = await clip.vectorize(payload)
2024-07-04 22:09:29              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/app/clip.py", line 288, in vectorize
2024-07-04 22:09:29     return await asyncio.wrap_future(self.executor.submit(self.clip.vectorize, payload))
2024-07-04 22:09:29            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
2024-07-04 22:09:29     result = self.fn(*self.args, **self.kwargs)
2024-07-04 22:09:29              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/app/clip.py", line 156, in vectorize
2024-07-04 22:09:29     image_files = [_parse_image(image) for image in payload.images]
2024-07-04 22:09:29                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/app/clip.py", line 156, in <listcomp>
2024-07-04 22:09:29     image_files = [_parse_image(image) for image in payload.images]
2024-07-04 22:09:29                    ^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/app/clip.py", line 298, in _parse_image
2024-07-04 22:09:29     img = Image.open(io.BytesIO(image_bytes))
2024-07-04 22:09:29           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 22:09:29   File "/usr/local/lib/python3.11/site-packages/PIL/Image.py", line 3339, in open
2024-07-04 22:09:29     raise UnidentifiedImageError(msg)
2024-07-04 22:09:29 PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0xffff6eef4bd0>
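
The traceback fails inside `Image.open(io.BytesIO(image_bytes))`, which means the bytes that reach the CLIP container are not recognized as an image. One thing that can be checked on the client side is whether every path in `source` really is an image file. This is only an assumption about a possible cause, not a confirmed diagnosis: a stray non-image file (for example `.DS_Store`) would be encoded and sent just like a real image. Below is a minimal, stdlib-only sketch of such a check. The helper names `sniff_image_format` and `check_blob` are hypothetical and not part of the Weaviate client, and the signature list covers only a few common formats.

```python
import base64
from typing import Optional

# Leading "magic" bytes of a few common image formats (not exhaustive).
_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if `data` starts like a known image, else None."""
    for name, magic in _SIGNATURES.items():
        if data.startswith(magic):
            return name
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_blob(b64_image: str) -> Optional[str]:
    """Strictly decode a base64 string and sniff the decoded bytes."""
    try:
        # validate=True rejects any character outside the base64 alphabet,
        # e.g. a "data:image/png;base64," prefix.
        raw = base64.b64decode(b64_image, validate=True)
    except ValueError:
        return None  # not a plain base64 string
    return sniff_image_format(raw)
```

In `process_batch`, objects for which `check_blob(b64_image)` returns `None` could be skipped and their paths logged instead of being added to the batch, which would at least show whether specific files trigger the error.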
If there is an example of how to insert images with the Python v4 client, that would be very helpful.