Description
I have the following code for creating a collection.
Creating the collection succeeds, but populating it fails with an error.
def create_weaviate_collection(client: weaviate.Client, collection_name: str):
    """
    Creates a Weaviate collection with the specified schema using the v4 client.
    Args:
        client: Weaviate client.
        collection_name: Name of the collection to create.
    """
    try:
        # Create the collection without vectorizer config initially
        collection = client.collections.create(
            name=collection_name,
            description="Collection of ophthalmology articles",
        )
        # Now, update the vectorizer configuration
        collection.config.update(
            vector_config={
                "text2vec-transformers": {
                    "modelName": EMBEDDING_MODEL,
                    "vectorizeClassName": False,
                    "poolingMode": "mean",
                }
            }
        )
        # Add properties
        collection.config.add_property(
            name="source",
            data_type=weaviate.DataType.TEXT,
            description="Source of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="title",
            data_type=weaviate.DataType.TEXT,
            description="Title of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="authors",
            data_type=weaviate.DataType.TEXT,
            description="Authors of the article",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="affiliations",
            data_type=weaviate.DataType.TEXT,
            description="Affiliations of the authors",
            skip_vectorization=True,
        )
        collection.config.add_property(
            name="content",
            data_type=weaviate.DataType.TEXT,
            description="Content of the article",
            skip_vectorization=False,
        )
        print(f"Collection '{collection_name}' created successfully.")
    except weaviate.exceptions.WeaviateBaseError as e:
        print(f"Error creating collection: {e}")
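For comparison, the v4 Python client can also take the vectorizer and the properties in the `create` call itself, which avoids the separate `config.update` and `add_property` steps. The following is a minimal sketch, not a confirmed fix. It assumes weaviate-client 4.x, where `vectorizer_config` and `Configure.Vectorizer.text2vec_huggingface` are available (newer releases may prefer `vector_config`). It also assumes that the `text2vec-huggingface` module enabled in the server setup is the intended one. The function name and the `model` parameter (standing in for `EMBEDDING_MODEL`) are hypothetical.

```python
from typing import Any

# (name, description, skip_vectorization) for each property in the post.
PROPERTY_SPECS = [
    ("source", "Source of the article", True),
    ("title", "Title of the article", True),
    ("authors", "Authors of the article", True),
    ("affiliations", "Affiliations of the authors", True),
    ("content", "Content of the article", False),
]


def create_collection_in_one_call(client: Any, collection_name: str, model: str) -> Any:
    """Create the collection with vectorizer and properties in a single request."""
    # Imported here so this module loads even where weaviate-client is absent.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name=collection_name,
        description="Collection of ophthalmology articles",
        # Assumption: the server-side module is text2vec-huggingface,
        # as listed under ENABLE_MODULES in the compose file.
        vectorizer_config=Configure.Vectorizer.text2vec_huggingface(
            model=model,
            vectorize_collection_name=False,
        ),
        properties=[
            Property(
                name=name,
                data_type=DataType.TEXT,
                description=description,
                skip_vectorization=skip,
            )
            for name, description, skip in PROPERTY_SPECS
        ],
    )
```

Keeping the property list as data makes it easy to see that only `content` is vectorized.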
And the following code to populate the collection:
def populate_weaviate(client: weaviate.Client, collection_name: str, data: List[Dict]):
    """
    Populates a Weaviate collection with data using the v4 client (client batching).
    Args:
        client: Weaviate client.
        collection_name: Name of the collection to populate.
        data: List of dictionaries, where each dictionary represents an object.
    """
    try:
        batch_size = 10
        with client.batch.fixed_size(batch_size=batch_size) as batch:
            for i, obj_data in enumerate(data):
                try:
                    batch.add_object(
                        properties=obj_data,
                        collection=collection_name,
                    )
                    if (i + 1) % batch_size == 0:
                        print(f"Indexed {i + 1} objects")
                except weaviate.exceptions.WeaviateBaseError as e:
                    print(f"Error adding object: {e}")
                    print(f"Data that caused the error: {obj_data}")
                    client.close()
                    # continue
        failed_objects = client.batch.failed_objects
        if failed_objects:
            print(f"Number of failed imports: {len(failed_objects)}")
            print(f"First failed object: {failed_objects[0]}")
            print("Finished indexing all objects.")
        client.close()
    except weaviate.exceptions.WeaviateBaseError as e:
        print(f"Error getting or batching collection: {e}")
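The batch can also be scoped to the collection, so that `collection=` does not have to be repeated per object and the client is not closed while the batch is still open. A minimal sketch, assuming weaviate-client 4.x, where `client.collections.get`, `collection.batch.fixed_size` and `collection.batch.failed_objects` are available. The function name `populate_collection` is hypothetical, and closing the client is left to the caller.

```python
from typing import Any, Dict, List


def populate_collection(client: Any, collection_name: str, data: List[Dict]) -> list:
    """Batch-import `data` and return the objects that failed to import."""
    collection = client.collections.get(collection_name)
    # The context manager sends any remaining objects when it exits.
    with collection.batch.fixed_size(batch_size=10) as batch:
        for obj_data in data:
            batch.add_object(properties=obj_data)
    # Read the failures only after the batch context has exited.
    return list(collection.batch.failed_objects)
```

Returning the failed objects instead of printing them lets the caller decide whether to retry, log, or abort.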
I get the following error:
{'message': 'Failed to send all objects in a batch of 1', 'error': "WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: unmarshal error response body: Not Found')"}
{'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
Number of failed imports: 1
First failed object: ErrorObject(message="WeaviateInsertManyAllFailedError('Every object failed during insertion. Here is the set of all errors: unmarshal error response body: Not Found')", object_=BatchObject(collection='Oogheelkunde_Collection', properties={'source': 'Van de Industrie\nLancet', 'title': 'Some interesting subject', 'authors': 'Paolo Lanzetta', 'affiliations': 'collapse\n\n1Department of Medicine-Ophthalmology', 'content': 'Background:\xa0blablabla\nMethods:\xa0just choose one\nFindings:\xa0 a black hole\nInterpretation:\xa0insert failed.'}, references=None, uuid='049245e0-280e-4c00-9514-78fc39a9c8cc', vector=None, tenant=None, index=0, retry_count=0), original_uuid=None)
The properties source, title, authors, affiliations and content are all present in the failed object, as expected.
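The same check can be run over the whole input before batching, rather than by reading a single failed object. A minimal sketch in plain Python, assuming each item in `data` is a flat dictionary keyed by property name; the helper name `find_property_mismatches` is hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, List, Set, Tuple

# Property names taken from the collection definition in this post.
EXPECTED_PROPERTIES = frozenset(
    {"source", "title", "authors", "affiliations", "content"}
)


def find_property_mismatches(
    data: Iterable[Dict],
    expected: AbstractSet[str] = EXPECTED_PROPERTIES,
) -> List[Tuple[int, Set[str], Set[str]]]:
    """Return (index, missing, unexpected) for each object whose keys differ."""
    mismatches = []
    for index, obj in enumerate(data):
        keys = set(obj)
        missing = set(expected - keys)
        unexpected = keys - expected
        if missing or unexpected:
            mismatches.append((index, missing, unexpected))
    return mismatches
```

An empty result means every object carries exactly the expected keys, which narrows the problem down to something other than the shape of the data.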
Server Setup Information
weaviate:
  command:
  - --host
  - 0.0.0.0
  - --port
  - '8080'
  - --scheme
  - http
  image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
  ports:
  - 8080:8080
  - 50051:50051
  volumes:
  - weaviate_data:/var/lib/weaviate
  restart: on-failure:0
  environment:
    QUERY_DEFAULTS_LIMIT: 25
    AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
    HUGGINGFACE_APIKEY: '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'
    PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
    DEFAULT_VECTORIZER_MODULE: 'text2vec-huggingface'
    ENABLE_MODULES: 'text2vec-huggingface'
    CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
- Weaviate Server Version: 1.25.4 (per the image tag above)
- Deployment Method: docker
- Multi Node? 1
- Client Language and Version:
- Multitenancy?: NO
- python: 3.11.9
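One thing that can be checked from the information above is whether the vectorizer named in the collection code is among the modules the server enables. The code refers to `text2vec-transformers`, while the compose file lists only `text2vec-huggingface`. Below is a minimal sketch of that comparison in plain Python. The helper names are hypothetical, and whether this mismatch is the cause of the `Not Found` error is not established here.

```python
from typing import Dict, List


def enabled_modules(environment: Dict[str, str]) -> List[str]:
    """Split the comma-separated ENABLE_MODULES value into module names."""
    raw = environment.get("ENABLE_MODULES", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def vectorizer_is_enabled(vectorizer: str, environment: Dict[str, str]) -> bool:
    """Check whether a vectorizer module name appears in ENABLE_MODULES."""
    return vectorizer in enabled_modules(environment)


# Values copied from the compose file and the collection code in this post.
environment = {
    "DEFAULT_VECTORIZER_MODULE": "text2vec-huggingface",
    "ENABLE_MODULES": "text2vec-huggingface",
}
print(vectorizer_is_enabled("text2vec-transformers", environment))  # False
print(vectorizer_is_enabled("text2vec-huggingface", environment))   # True
```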
