[Question] After migrating data, something is wrong: data is lost

with collection.batch.fixed_size(batch_size=200) as batch:
    for data_row in data_rows:
        batch.add_object(
            properties=data_row,
        )

Hi, I’m from SellerSprite, and I’m currently building an app for Amazon image search. Thank you for this excellent open-source vector database. However, something went wrong after I migrated data from a standalone instance to a cluster.

There are over 11M objects. The standalone instance reports 500G of storage for them, but after copying, the cluster shows only 10% disk usage.

I compared the total object counts on both sides:

.aggregate.over_all(total_count=True).total_count

The counts are the same. But if I restart with docker compose down; docker compose up -d, the cluster shows only 3M objects.
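For anyone repeating this check, here is a minimal sketch of the count comparison. It assumes `source_collection` and `target_collection` are already-connected collection handles from the Weaviate Python client; the helper name `compare_counts` is hypothetical and only for illustration. Running it once right after the copy and again after the restart shows how many objects went missing.

```python
def compare_counts(source_collection, target_collection):
    """Return (source_count, target_count, missing) for two collections.

    Both arguments are assumed to expose the same
    .aggregate.over_all(total_count=True).total_count call used above.
    """
    source_count = source_collection.aggregate.over_all(total_count=True).total_count
    target_count = target_collection.aggregate.over_all(total_count=True).total_count
    # A positive "missing" value means the target holds fewer objects.
    return source_count, target_count, source_count - target_count
```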

So where are the remaining 8M objects? In memory? How can I fix this?

hi @Jack_Dim !!

Welcome to our community :hugs:

Have you checked our migration guide?

You should keep the IDs of the objects the same from the source to the target cluster.

That way you can retry the ingestion, and if an object has already been ingested, it will be kept, thanks to its deterministic ID.
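As a rough sketch of that idea (not a definitive implementation): the copy loop can pass each source object's ID along when adding it to the target. The function name `copy_with_ids` is hypothetical. The sketch also assumes both arguments are connected Python client collection handles, that each object has a single vector stored under the key "default", and that failed objects are reported on `batch.failed_objects` after the batch context closes.

```python
def copy_with_ids(source_collection, target_collection, batch_size=200):
    """Copy all objects from source to target, reusing the source IDs.

    Returns (submitted, failed): the number of objects handed to the batch
    and the number the client reported as failed.
    """
    submitted = 0
    with target_collection.batch.fixed_size(batch_size=batch_size) as batch:
        # include_vector=True so the stored vectors are copied, not recomputed.
        for obj in source_collection.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector["default"],  # assumption: one unnamed vector
                uuid=obj.uuid,  # same ID on both sides makes retries safe
            )
            submitted += 1
    # Assumption: failures from the last batch run are collected here.
    failed = target_collection.batch.failed_objects
    return submitted, len(failed)
```

Because the IDs match on both sides, running the function a second time should not create duplicates, and a non-zero failed count is a signal to retry before trusting the totals.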

Let me know if this helps!

Ok, thank you.
Most of the time, the data is lost when I restart Docker. Sometimes, if I wait a long time after the copy finishes and then restart Docker, the data is not lost.

So, it’s weird.

I found the problem.

    volumes:
      - /db-500g/weaviate-node-3:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 500
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'

The files written inside the Docker container are not synced to /db-500g/weaviate-node-3, so the file sizes are not the same.

So, how can I force it to sync?

hi !

Not sure I understood the question.

Do you mean there is a difference in size between your nodes’ mount points for Weaviate data?

If that’s the case, it can happen due to a few factors:
Multi-tenancy: you may have bigger tenants on different nodes.
Out-of-sync collections: due to how eventual consistency works, your nodes may be out of sync. You can enable async repairs, and they will sync data between nodes:

Let me know if this is the case here.

Thanks!