Batch import does not import full dataframe

Description

Hello all,

I am using the Weaviate Python client (v4) and want to upload a DataFrame with 5000 rows to a collection in my Weaviate database (free trial). I don’t get any errors when doing so, but when I run this code to count the uploaded objects, I get only 3716:


collection = client.collections.get("CongressionalRecord_Test")
response = collection.aggregate.over_all(total_count=True)

print(response.total_count)

This is my code to upload my df:

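import pandas as pd
import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

# `client` is assumed to be an already-connected Weaviate client (connection code not shown)

year = 'test'

df = pd.read_excel('speeches_comp_' + year + '.xlsx')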

person = df.iloc[:,0]
speeches = df.iloc[:,1]
files = df.iloc[:,2]
keywords = df.iloc[:,3]

client.collections.create(
    "CongressionalRecord_Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="speech", data_type=DataType.TEXT),
        Property(name="file", data_type=DataType.TEXT),
        Property(name="person", data_type=DataType.TEXT),
        Property(name="keywords", data_type=DataType.TEXT),
    ]
)


collection = client.collections.get("CongressionalRecord_Test")

with collection.batch.dynamic() as batch:
    for i, (speech_name, file_name, person_name, keywords_name) in enumerate(zip(speeches, files, person, keywords)):
        print(i)
        properties = {
            "speech": speech_name,
            "file": file_name,
            "person": person_name,
            "keywords": keywords_name
        }
        batch.add_object(properties=properties)

Am I doing something wrong? Why isn’t the full df uploaded?

Thanks in advance and kind regards

Hi @mfieldhouse !

Welcome to our community :hugs:

A batch import does not raise an exception when individual objects fail, so you will need to add some error handling to find out which objects failed and why.

Here is how:

import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()

try:
    # ===== First batch import block =====
    with client.batch.rate_limit(requests_per_minute=600) as batch:  # or <collection>.batch.rate_limit()
        pass  # Batch import objects/references
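    # Read these after the `with` block has exited
    failed_objs_a = client.batch.failed_objects  # Get failed objects from the first batch import
    failed_refs_a = client.batch.failed_references  # Get failed references from the first batch import
    print(f"{len(failed_objs_a)} objects and {len(failed_refs_a)} references failed")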

finally:
    client.close()
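
If many objects fail, it can help to group the failures by error message before looking at individual rows. Here is a minimal sketch of that idea. It is not part of the Weaviate client: `summarize_failures` and `FakeError` are hypothetical names, and the error messages in the demo are invented. It assumes each entry in `failed_objects` exposes a `message` attribute, and falls back to `str()` if it does not.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeError:
    """Hypothetical stand-in for one entry of `failed_objects` (demo only)."""
    message: str


def summarize_failures(failed_objects):
    """Count failed objects per error message, most frequent first."""
    counts = Counter(
        # Assumption: each entry has a `.message`; otherwise use its string form.
        getattr(err, "message", str(err))
        for err in failed_objects
    )
    return counts.most_common()


# Demo with invented errors; these messages are placeholders, not real Weaviate output.
demo_errors = [
    FakeError("rate limit reached"),
    FakeError("rate limit reached"),
    FakeError("invalid text property"),
]
summary = summarize_failures(demo_errors)
for message, count in summary:
    print(f"{count} object(s) failed: {message}")
```

In real use you would call `summarize_failures(collection.batch.failed_objects)` once the `with` block has exited, instead of passing the demo list.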

Also, check the Weaviate logs for errors.

Let me know if this helps :slight_smile: