I have this very simple Python code:
def main():
    client = init_weaviate_client()
    try:
        collection = client.collections.get(os.getenv("COP_COPERTINE_COLLNAME"))
        extractor = ManifestoGPTExtractor()
        for obj in collection.iterator():
            extractor.process_object(collection, obj)
    finally:
        client.close()
I have a breakpoint on the process_object line, and when I inspect the obj variable I see that it does NOT have an “editionImageStr” property, which should hold a 250-300K Base64 image.
I am positive that ALL objects in the collection DO HAVE that property, as shown by a snapshot of one of them fetched through Postman from the same collection.
I cannot find any details on why iterator() is apparently “losing” that BLOB property. I found a post by @sebawita about a search function:
The Python client v4, doesn’t return blob properties by default, as these are potentially large values.
However, if you run a query with return_properties, I would expect values for the selected properties to be present.
But I cannot find a way to make this work with iterator(). Might this “bug” be related?
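To make the question concrete, here is a minimal sketch of what I would expect to work. It assumes that iterator() accepts the same return_properties argument as the query methods, which I have not confirmed for 4.10.2. The helper name iter_with_blob and its parameters are hypothetical, made up only for this illustration.

```python
import base64


def iter_with_blob(collection, blob_property="editionImageStr",
                   other_properties=("editionId",)):
    """Yield (properties, decoded_bytes) for every object in the collection.

    Assumption: collection.iterator() accepts return_properties, and the
    blob value is only returned when it is named there explicitly.
    """
    wanted = list(other_properties) + [blob_property]
    for obj in collection.iterator(return_properties=wanted):
        encoded = obj.properties.get(blob_property)
        # Blob values arrive as Base64 text; decode only when present.
        image = base64.b64decode(encoded) if encoded else None
        yield obj.properties, image
```

In main() the loop would then read `for props, image in iter_with_blob(collection): ...`, with image being None whenever the blob was not returned.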
Here is the collection definition:
COPERTINE_COLL_CONFIG = {
    "class": COP_COPERTINE_COLLNAME,
    "description": "Collection of Il Manifesto newspaper covers",
    "vectorizer": "none",
    "properties": [
        Property(
            name="editionId",
            data_type=DataType.TEXT,
            description="Unique identifier for the edition",
            tokenization="field",
            index_searchable=False
        ),
        Property(
            name="editionDateIsoStr",
            data_type=DataType.DATE,
            description="Publication date of the edition"
        ),
        Property(
            name="editionImageStr",
            data_type=DataType.BLOB,
            description="Base64 encoded image string"
        ),
        Property(
            name="captionAIStr",
            data_type=DataType.TEXT,
            description="Image caption as recognized by the AI model"
        ),
        Property(
            name="imageAIDeStr",
            data_type=DataType.TEXT,
            description="Image description as generated by the AI model"
        ),
        Property(
            name="modelAIName",
            data_type=DataType.TEXT,
            description="AI model name",
            tokenization="field",
            index_searchable=False
        ),
    ]
}
I do need to iterate on this collection and process the images attached to each object in that property. Thanks
Server Setup Information
- Weaviate Server Version: 1.25.4
- Deployment Method: docker
- Multi Node? No
- Client Language and Version: Python 4.10.2
- Multitenancy?: No