Description
I have the following code for creating a collection. Creating the collection succeeds, but populating it gives an error.
def create_weaviate_collection(client: weaviate.Client, collection_name: str):
    """
    Creates a Weaviate collection with the specified schema using the v4 client.
    Args:
        client: Weaviate client.
        collection_name: Name of the collection to create.
    """
    try:
        # Create the collection without vectorizer config initially
        collection = client.collections.create(
            name=collection_name,
            description="Collection of ophthalmology articles",
        )
        # Now, update the vectorizer configuration
        collection.config.update(
            vector_config={
                "text2vec-transformers": {
                    "modelName": EMBEDDING_MODEL,
                    "vectorizeClassName": False,
                    "poolingMode": "mean",
                }
            }
        )
        # Add properties
        collection.config.add_property(
            name="source",
            data_type=weaviate.DataType.TEXT,
            description="Source of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="title",
            data_type=weaviate.DataType.TEXT,
            description="Title of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="authors",
            data_type=weaviate.DataType.TEXT,
            description="Authors of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="affiliations",
            data_type=weaviate.DataType.TEXT,
            description="Affiliations of the authors",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="content",
            data_type=weaviate.DataType.TEXT,
            description="Content of the article",
            skip_vectorization=False,
        )
        print(f"Collection '{collection_name}' created successfully.")
    except weaviate.exceptions.WeaviateBaseError as e:
        print(f"Error creating collection: {e}")
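For comparison, the v4 Python client usually lets the vectorizer and the properties be declared in the same `collections.create` call instead of being patched in afterwards. The following is a minimal sketch, not a confirmed fix. It assumes a weaviate-client 4.x release that accepts the `vectorizer_config` argument, and it assumes `text2vec-huggingface` is the intended vectorizer, since that is the only module enabled in the server setup further down. `PROPERTY_SPECS`, `create_collection_in_one_call`, and the `model` parameter are hypothetical names introduced for illustration.

```python
from typing import List, Tuple

# (property name, description, skip_vectorization), mirroring the properties above.
PROPERTY_SPECS: List[Tuple[str, str, bool]] = [
    ("source", "Source of the article", True),
    ("title", "Title of the article", True),
    ("authors", "Authors of the article", True),
    ("affiliations", "Affiliations of the authors", True),
    ("content", "Content of the article", False),
]


def create_collection_in_one_call(client, collection_name: str, model: str):
    """Create the collection with vectorizer and properties in a single request.

    Assumes `client` is a connected v4 client (weaviate.WeaviateClient).
    """
    # Imported lazily so this module loads even without weaviate-client installed.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name=collection_name,
        description="Collection of ophthalmology articles",
        # Assumption: the HuggingFace module is the one enabled on the server.
        vectorizer_config=Configure.Vectorizer.text2vec_huggingface(model=model),
        properties=[
            Property(
                name=name,
                data_type=DataType.TEXT,
                description=description,
                skip_vectorization=skip,
            )
            for name, description, skip in PROPERTY_SPECS
        ],
    )
```

Declaring everything up front means the schema is either created as specified or rejected as a whole, which makes a misconfigured vectorizer easier to spot than a sequence of separate update calls.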
And the following code populates the collection:
def populate_weaviate(client: weaviate.Client, collection_name: str, data: List[Dict]):
    """
    Populates a Weaviate collection with data using the v4 client (client batching).
    Args:
        client: Weaviate client.
        collection_name: Name of the collection to populate.
        data: List of dictionaries, where each dictionary represents an object.
    """
    try:
        batch_size = 10
        with client.batch.fixed_size(batch_size=batch_size) as batch:
            for i, obj_data in enumerate(data):
                try:
                    batch.add_object(
                        properties=obj_data,
                        collection=collection_name,
                    )
                    if (i + 1) % batch_size == 0:
                        print(f"Indexed {i + 1} objects")
                except weaviate.exceptions.WeaviateBaseError as e:
                    print(f"Error adding object: {e}")
                    print(f"Data that caused the error: {obj_data}")
                    client.close()
                    # continue
        failed_objects = client.batch.failed_objects
        if failed_objects:
            print(f"Number of failed imports: {len(failed_objects)}")
            print(f"First failed object: {failed_objects[0]}")
        print("Finished indexing all objects.")
        client.close()
    except weaviate.exceptions.WeaviateBaseError as e:
        print(f"Error getting or batching collection: {e}")
I get the following error:
{'message': 'Failed to send all objects in a batch of 1', 'error': "WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: unmarshal error response body: Not Found')"}
{'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
Number of failed imports: 1
First failed object: ErrorObject(message="WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: unmarshal error response body: Not Found')", object_=BatchObject(collection='Oogheelkunde_Collection', properties={'source': 'Van de Industrie\nLancet', 'title': 'Some interesting subject', 'authors': 'Paolo Lanzetta', 'affiliations': 'collapse\n\n1Department of Medicine-Ophthalmology', 'content': 'Background:\xa0blablabla\nMethods:\xa0just choose one\nFindings:\xa0 a black hole\nInterpretation:\xa0insert failed.'}, references=None, uuid='049245e0-280e-4c00-9514-78fc39a9c8cc', vector=None, tenant=None, index=0, retry_count=0), original_uuid=None)
I can see that the properties source, title, authors, affiliations, and content are all present in the failed object.
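When more than one object fails, grouping `failed_objects` by error message shows quickly whether every failure shares a single cause, as the one above does. This is a minimal sketch that assumes only that each entry exposes a `message` attribute, as the `ErrorObject` in the output does; `summarize_failures` is a hypothetical helper name.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_failures(failed_objects: Iterable[Any]) -> Dict[str, int]:
    """Count failed batch objects per distinct error message."""
    # Only the `message` attribute is read; nothing else about the entries is assumed.
    return dict(Counter(str(obj.message) for obj in failed_objects))
```

It could be called as `summarize_failures(client.batch.failed_objects)` after the batch context exits, printing each message with its count.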
Server Setup Information
weaviate:
  command:
  - --host
  - 0.0.0.0
  - --port
  - '8080'
  - --scheme
  - http
  image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
  ports:
  - 8080:8080
  - 50051:50051
  volumes:
  - weaviate_data:/var/lib/weaviate
  restart: on-failure:0
  environment:
    QUERY_DEFAULTS_LIMIT: 25
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
    HUGGINGFACE_APIKEY: '<redacted>'
    PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
    DEFAULT_VECTORIZER_MODULE: 'text2vec-huggingface'
    ENABLE_MODULES: 'text2vec-huggingface'
    CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
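One thing this configuration makes easy to check is whether the vectorizer module named in the collection code is enabled on the server: the code above references `text2vec-transformers`, while `ENABLE_MODULES` lists only `text2vec-huggingface`. The sketch below compares the two. It is a hedged illustration rather than a diagnosis; `missing_modules` is a hypothetical helper, and it treats `ENABLE_MODULES` as a comma-separated list.

```python
from typing import Iterable, List


def missing_modules(enable_modules: str, requested: Iterable[str]) -> List[str]:
    """Return the requested module names that are absent from ENABLE_MODULES."""
    # ENABLE_MODULES is treated as a comma-separated string, e.g. "a,b".
    enabled = {name.strip() for name in enable_modules.split(",") if name.strip()}
    return [name for name in requested if name not in enabled]


# Values taken from the compose file and the collection code in this post.
print(missing_modules("text2vec-huggingface", ["text2vec-transformers"]))
```

A non-empty result means the collection code names a module the server was not started with.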
- Weaviate Server Version: 1.25.4 (from the image tag)
- Deployment Method: docker
- Multi Node? 1
- Client Language and Version:
- Multitenancy?: NO
- python: 3.11.9
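The server-side details in this list can also be read from the running instance through the REST `/v1/meta` endpoint, which reports the server version and the loaded modules. This is a minimal sketch, assuming the server is reachable at `http://localhost:8080` as the port mapping suggests; `fetch_meta` and `describe_meta` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def describe_meta(meta: dict) -> str:
    """Format the version and module names from a /v1/meta response body."""
    version = meta.get("version", "unknown")
    modules = sorted(meta.get("modules", {}))
    return f"Weaviate {version}; modules: {', '.join(modules) or 'none'}"


def fetch_meta(base_url: str = "http://localhost:8080") -> dict:
    """Fetch /v1/meta from a Weaviate instance (the base URL is an assumption)."""
    with urlopen(f"{base_url}/v1/meta", timeout=10) as response:
        return json.load(response)
```

Running `print(describe_meta(fetch_meta()))` against the container would show which vectorizer modules the server actually loaded.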