Docker with --rm needs empty PERSISTENCE_DATA_PATH

I found out that if you reuse the same data directory with a newly created Docker container, the client no longer works.

I use this Python code for testing:

import weaviate

# Connect to the local Weaviate instance (host port 7776 is mapped to the
# container's 8080 in the docker commands below)
client = weaviate.connect_to_local(host="127.0.0.1", port=7776)
print(client.is_ready())  # True if the server is up and ready
client.close()

With a new container and a new (empty) data directory, the result is: True

With the same, restarted container and the previously used data directory, the result is: True

With a new container and a previously used data directory, the result is: False

hi @ksrev !!

Welcome to our community :hugs:

Could you detail the steps you used to get that outcome?

Also, are there any error logs from the server in the situation where it is not working?

Thanks!

Hello @DudaNogueira,

there are no special steps required. What I’ve discovered is that when using persistent storage (either a Docker volume or a bound local directory), the database only works correctly with the original container it was initialized with.

It appears that some crucial data is stored outside the designated persistent data directory. As a result, simply reusing the same volume with a new container — even if it’s based on the same image version — leads to issues. The code I posted in my initial message will then return False.

To reproduce the issue, just delete the original container, spin up a new one with the same image version, and mount the previously used volume. The problem should reoccur.
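
For illustration, a minimal reproduction might look like this (container names, host path, and image tag are taken from the docker commands posted further down; adjust to your setup):

# Remove the original container; the data directory survives on the host
docker stop weaviate_a
docker rm weaviate_a

# Start a fresh container with the same image version and the same volume
docker run \
  --name weaviate_b \
  -v /data/raid250/weaviate_data:/var/lib/weaviate \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -p 7776:8080 \
  -p 50051:50051 \
  semitechnologies/weaviate:1.29.2

# The Python check from my first post now prints False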

I haven’t seen any error messages or hints indicating what exactly went wrong. But to be fair, I’m not an expert when it comes to digging into Docker containers.

I compared both containers and eventually noticed a difference. The second container — the one I created later and attached to the data directory of the first — shows recurring error messages in its Docker log.

{"build_git_commit":"927897e","build_go_version":"go1.22.12","build_image_tag":"v1.29.2","build_wv_version":"1.29.2","level":"info","msg":"attempting to join","remoteNodes":{"70cf1899cd62":"172.17.0.4:8300"},"time":"2025-03-31T18:28:23Z"}
{"build_git_commit":"927897e","build_go_version":"go1.22.12","build_image_tag":"v1.29.2","build_wv_version":"1.29.2","level":"info","msg":"attempted to join and failed","remoteNode":"172.17.0.4:8300","status":8,"time":"2025-03-31T18:28:23Z"}
{"build_git_commit":"927897e","build_go_version":"go1.22.12","build_image_tag":"v1.29.2","build_wv_version":"1.29.2","level":"info","msg":"attempting to join","remoteNodes":{"70cf1899cd62":"172.17.0.4:8300"},"time":"2025-03-31T18:28:24Z"}
{"build_git_commit":"927897e","build_go_version":"go1.22.12","build_image_tag":"v1.29.2","build_wv_version":"1.29.2","level":"info","msg":"attempted to join and failed","remoteNode":"172.17.0.4:8300","status":8,"time":"2025-03-31T18:28:24Z"}
…

Here are the docker commands for the containers:

docker run \
  --name weaviate_a \
  -v /data/raid250/weaviate_data:/var/lib/weaviate \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -p 7776:8080 \
  -p 50051:50051 \
  semitechnologies/weaviate:1.29.2

docker run \
  --name weaviate_b \
  -v /data/raid250/weaviate_data:/var/lib/weaviate \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -p 7776:8080 \
  -p 50051:50051 \
  semitechnologies/weaviate:1.29.2

weaviate_a works well.
weaviate_b does not work.

When you stop weaviate_b and start weaviate_a, it works well again.

Hi!

That’s expected. Weaviate uses a lock mechanism to take ownership of the data and manage it.

You can’t run two instances using the same persistence path.

Each node should have its own data path, and the nodes will sync the data between them.
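
A minimal sketch of that layout, reusing the commands from above (the per-node host directories and the shifted host ports on the second node are assumptions for illustration; the cluster-join configuration the nodes would additionally need to actually replicate data to each other is omitted here):

docker run \
  --name weaviate_node1 \
  -v /data/raid250/weaviate_node1:/var/lib/weaviate \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -p 7776:8080 \
  -p 50051:50051 \
  semitechnologies/weaviate:1.29.2

docker run \
  --name weaviate_node2 \
  -v /data/raid250/weaviate_node2:/var/lib/weaviate \
  -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
  -p 7777:8080 \
  -p 50052:50051 \
  semitechnologies/weaviate:1.29.2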

Here is more info on replication in Weaviate:

Let me know if this helps.

Thanks!

Hello,

but ultimately I don’t want to run two instances at the same time. What if I set up a new instance for an update or some other reason? I don’t want to vectorize all my data again; I want to be able to reuse my old data directory.

greetings

If you want to migrate your data from a single-node cluster to a multi-node cluster,
you will need to spin up the new cluster first, create the collection accordingly (making sure to set the replication factor), and then migrate your data over.
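
As a minimal sketch of that flow with the v4 Python client, assuming a hypothetical collection named "Articles", the ports from this thread for the old instance, and made-up ports for the new cluster (the rest of the collection configuration, such as properties and vectorizer, is elided):

import weaviate
from weaviate.classes.config import Configure

# Old single-node instance and new cluster; ports here are assumptions
old_client = weaviate.connect_to_local(host="127.0.0.1", port=7776, grpc_port=50051)
new_client = weaviate.connect_to_local(host="127.0.0.1", port=7777, grpc_port=50052)

# Recreate the collection on the new cluster, this time with a replication factor
new_client.collections.create(
    name="Articles",  # hypothetical collection name
    replication_config=Configure.replication(factor=3),
)

old_col = old_client.collections.get("Articles")
new_col = new_client.collections.get("Articles")

# Stream objects (with their stored vectors) out of the old instance and
# batch-insert them into the new cluster, keeping the original UUIDs
with new_col.batch.dynamic() as batch:
    for obj in old_col.iterator(include_vector=True):
        batch.add_object(
            properties=obj.properties,
            vector=obj.vector["default"],
            uuid=obj.uuid,
        )

old_client.close()
new_client.close()

Because the iterator returns each object together with its stored vector, nothing has to be re-vectorized on the way over.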