I want to retain my Weaviate data even if I restart my Docker container

Hi!
I am running Weaviate in a Docker container.
Here is my Docker Compose service definition:

netnanny-weaviate:
    container_name: netnanny-weaviate
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: '10'
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
    volumes:
      - ./weaviate_data:/var/lib/weaviate
    networks:
      - netnanny 

If I set

 PERSISTENCE_DATA_PATH: './data'

and

volumes:
      - weaviate_data:/var/lib/weaviate

I lose the data stored in the “weaviate_data” folder when I restart my Docker container.

And if I set

PERSISTENCE_DATA_PATH: '/var/lib/weaviate'

and

volumes:
      - weaviate_data:/var/lib/weaviate

I get this error:

{"build_git_commit":"258edad","build_go_version":"go1.22.11","build_image_tag":"1.25.30","build_wv_version":"1.25.30","level":"info","msg":"attempting to join","remoteNodes":["172.27.0.2:8300"],"time":"2025-02-20T07:02:09Z"}
{"build_git_commit":"258edad","build_go_version":"go1.22.11","build_image_tag":"1.25.30","build_wv_version":"1.25.30","level":"info","msg":"attempted to join and failed","remoteNode":"172.27.0.2:8300","status":8,"time":"2025-02-20T07:02:09Z"}

Could you please let me know what I am doing wrong? My only goal is to keep the data when I restart my Docker container.
Thank you

hi @Jotheraj_kori !!

Welcome to our community :hugs:

PERSISTENCE_DATA_PATH tells Weaviate where to store its data.

Since you are running Docker, you need to make sure that this path is mounted as a volume.
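As a rough illustration of that rule, here is a small Python sketch that takes a Compose service definition (already parsed into a dict, for example with a YAML library) and checks whether PERSISTENCE_DATA_PATH falls under one of the container-side volume targets. The helper name data_path_is_persisted is hypothetical, it only understands the short "source:target" volume syntax and a dict-style environment section, and it conservatively reports relative paths such as './data' as not persisted, because they are resolved against the process's working directory rather than against the mount target.

```python
import posixpath


def data_path_is_persisted(service: dict) -> bool:
    """Return True if PERSISTENCE_DATA_PATH sits inside a mounted volume target.

    Hypothetical helper: handles only the short "source:target[:mode]" volume
    syntax and a dict-style `environment` section.
    """
    data_path = service.get("environment", {}).get("PERSISTENCE_DATA_PATH")
    if not data_path or not posixpath.isabs(data_path):
        # Missing or relative path: conservatively treat it as not persisted.
        return False
    data_path = posixpath.normpath(data_path)
    for spec in service.get("volumes", []):
        parts = spec.split(":")
        if len(parts) < 2:
            continue  # entry without an explicit source; skipped by this sketch
        target = posixpath.normpath(parts[1])
        # The data path must equal the mount target or be nested below it.
        if data_path == target or data_path.startswith(target.rstrip("/") + "/"):
            return True
    return False


working = {
    "environment": {"PERSISTENCE_DATA_PATH": "/var/lib/weaviate"},
    "volumes": ["./weaviate_data:/var/lib/weaviate"],
}
relative = {
    "environment": {"PERSISTENCE_DATA_PATH": "./data"},
    "volumes": ["weaviate_data:/var/lib/weaviate"],
}
print(data_path_is_persisted(working))   # True
print(data_path_is_persisted(relative))  # False
```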

This Docker Compose file:

services:
  netnanny-weaviate:
      container_name: netnanny-weaviate
      image: semitechnologies/weaviate:latest
      ports:
        - "8080:8080"
        - "50051:50051"
      environment:
        QUERY_DEFAULTS_LIMIT: '10'
        AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
        PERSISTENCE_DATA_PATH: /var/lib/weaviate
      volumes:
        - ./weaviate_data:/var/lib/weaviate

worked for me :thinking:

Notice that Weaviate's data will now be stored in a folder called weaviate_data, right next to your docker-compose.yaml.
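The difference between ./weaviate_data:/var/lib/weaviate and weaviate_data:/var/lib/weaviate is that Compose treats a source starting with "." or "/" as a host path (a bind mount, with relative paths resolved against the directory of the Compose file), while a bare name refers to a named volume managed by Docker, which lives in Docker's own storage rather than in a folder next to your Compose file. The Python sketch below, using a hypothetical helper describe_volume_source, classifies a short-syntax volume entry that way; it assumes a POSIX host and ignores "~" expansion.

```python
import posixpath


def describe_volume_source(spec: str, compose_dir: str) -> tuple[str, str]:
    """Classify the source of a short-syntax Compose volume entry.

    Hypothetical helper: returns ("bind", absolute_host_path) for host paths
    and ("named", volume_name) for Docker-managed named volumes.
    """
    source = spec.split(":", 1)[0]
    if source.startswith((".", "/")):
        # Host path: relative paths are resolved against the Compose file's directory.
        return "bind", posixpath.normpath(posixpath.join(compose_dir, source))
    # Anything else is treated as the name of a Docker-managed volume.
    return "named", source


print(describe_volume_source("./weaviate_data:/var/lib/weaviate", "/home/me/app"))
# ('bind', '/home/me/app/weaviate_data')
print(describe_volume_source("weaviate_data:/var/lib/weaviate", "/home/me/app"))
# ('named', 'weaviate_data')
```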

Let me know if that helps!

Thanks!


Hi @DudaNogueira

Thank you for your response.
I have modified the code as you suggested.
Here is my code:

netnanny-weaviate:
    container_name: netnanny-weaviate
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: '20' 
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
    volumes:
      - ./weaviate_data:/var/lib/weaviate 
    networks:
      - netnanny 

But I am still facing an issue.
When I check the container logs, this is what I see: “attempted to join and failed”

Thank you!

Oh! I see. That’s only an info message.

Because this is a single-node cluster, you can ignore it :slight_smile:
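Weaviate writes its logs as one JSON object per line, so a quick way to separate informational messages like this one from real problems is to filter on the "level" field. Below is a minimal Python sketch, assuming you have saved the output of docker logs netnanny-weaviate as text; the helper problem_messages is hypothetical, and the sample lines are trimmed copies of the two log lines posted above.

```python
import json


def problem_messages(log_text: str, ignore_levels=("debug", "info")) -> list[str]:
    """Return the "msg" of every JSON log line whose level is not ignored."""
    problems = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if entry.get("level") not in ignore_levels:
            problems.append(entry.get("msg", ""))
    return problems


sample = """\
{"level":"info","msg":"attempting to join","remoteNodes":["172.27.0.2:8300"]}
{"level":"info","msg":"attempted to join and failed","remoteNode":"172.27.0.2:8300","status":8}
"""
print(problem_messages(sample))  # [] because both join messages are informational
```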

With this Docker Compose file, were you able to retain the data?

Hi @DudaNogueira, sorry, I was busy with some other tasks, so I couldn’t reply earlier.
Whenever I see the warning “attempted to join and failed” and call my API to save data to Weaviate, the save fails.
I am getting the error below.

When that warning does not appear and I call the API to save data to Weaviate, it works.

Here is my code to load the data into Weaviate:

import logging
import os

import weaviate
import weaviate.classes as wvc

logger = logging.getLogger(__name__)


async def load_network_anomalies(rag_data_list):
    # Hypothetical wrapper name: the original snippet ran inside an async function.
    # cx_playground_generate_embeddings is my own embedding helper, not part of Weaviate.
    weaviate_client = None  # so `finally` never touches an unbound name
    try:
        weaviate_client = weaviate.connect_to_custom(
            http_host=os.getenv('WEAVIATE_SERVER'),
            http_port=int(os.getenv('WEAVIATE_PORT', '8080')),  # env vars are strings; ports must be ints
            http_secure=False,
            grpc_host=os.getenv('WEAVIATE_SERVER'),
            grpc_port=int(os.getenv('WEAVIATE_GRPC_PORT', '50051')),
            grpc_secure=False)

        collection_name = "NetworkAnomaly"

        class_schema = {
            "class": collection_name,
            "properties": [
                {"name": "anomaly_name", "dataType": ["text"]},
                {"name": "anomaly_details", "dataType": ["text"]},
                {"name": "remediation", "dataType": ["text"]},
                {"name": "recommendation", "dataType": ["text"]}
            ]
        }

        # Drop and recreate the collection so every load starts from a clean state.
        if weaviate_client.collections.exists(collection_name):
            logger.info("Weaviate schema contains NetworkAnomaly collection. Hence, deleting it!")
            weaviate_client.collections.delete(collection_name)

        collection = weaviate_client.collections.create_from_dict(class_schema)
        logger.info("Weaviate collection created")

        # Build one DataObject per row, each carrying its precomputed vector.
        data_objects = []
        for index, row in enumerate(rag_data_list, start=1):
            logger.info(f"Processing row# {index}")
            properties = {
                "anomaly_name": row['anomaly_name'],
                "anomaly_details": row['anomaly_details'],
                "remediation": row['remediation'],
                "recommendation": row['recommendation']
            }
            combined_text = f"{row['anomaly_name']} {row['anomaly_details']} {row['remediation']} {row['recommendation']}"

            # CX Playground embeddings (my own async helper).
            vector_embeddings = await cx_playground_generate_embeddings(combined_text)
            data_objects.append(wvc.data.DataObject(properties=properties, vector=vector_embeddings))

        response = collection.data.insert_many(data_objects)
        # insert_many reports per-object failures instead of raising, so check them.
        if response.has_errors:
            logger.error(f"Some objects failed to insert: {response.errors}")
            return False

        logger.info("Loading data into Weaviate Vector store is complete")
        return True

    except Exception as e:
        logger.error(f"Failed to update data: error {str(e)}")
        return False

    finally:
        if weaviate_client is not None:
            weaviate_client.close()

And here are my environment variables:

    - WEAVIATE_SERVER=netnanny-weaviate
    - WEAVIATE_PORT=8080
    - WEAVIATE_GRPC_PORT=50051
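Since the writes only failed while the node was still logging join attempts, one defensive option is to wait until Weaviate reports itself ready before loading data. Weaviate exposes a readiness endpoint at /v1/.well-known/ready that answers HTTP 200 once the node can serve requests (the v4 Python client also offers client.is_ready()). The sketch below polls that endpoint with only the standard library, building the URL from the environment variables above; the function name wait_for_weaviate, the injectable probe parameter and the timeout values are assumptions for illustration.

```python
import os
import time
import urllib.request


def _http_ready(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts and HTTP errors (e.g. 503) all mean "not ready yet".
        return False


def wait_for_weaviate(timeout_s: float = 60.0, interval_s: float = 2.0, probe=_http_ready) -> bool:
    """Poll the readiness endpoint until it succeeds or the timeout expires."""
    host = os.getenv("WEAVIATE_SERVER", "localhost")
    port = int(os.getenv("WEAVIATE_PORT", "8080"))
    url = f"http://{host}:{port}/v1/.well-known/ready"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

You could call wait_for_weaviate() at the start of the loading function and skip the load (or raise) when it returns False.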

Thank you!

You should not use semitechnologies/weaviate:latest. The best practice is to pin a specific version; otherwise the image can upgrade or downgrade your cluster without you wanting it to.
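To catch an unpinned image before it causes a surprise upgrade, you could check the image reference in your Compose file. The hypothetical is_pinned helper below treats a reference as pinned when it carries a digest (name@sha256:...) or an explicit tag other than latest; a reference with no tag at all counts as unpinned, because Docker then defaults to latest. The version 1.25.30 in the example is simply the one reported in the logs above, not a recommendation.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference names a fixed tag or a digest."""
    if "@" in image:
        return True  # digest references identify one exact image
    # A ":" before the last "/" belongs to a registry port, not to the tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag: Docker falls back to "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


print(is_pinned("semitechnologies/weaviate:latest"))   # False
print(is_pinned("semitechnologies/weaviate:1.25.30"))  # True
```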

Is this a new cluster or are you upgrading it?

Can you try a new Docker Compose file generated here?

Hi @DudaNogueira,

Thank you so much! The issue has finally been resolved.
Specifying the exact version and deleting the old data helped me fix the problem.

Thanks again!


Hi @Jotheraj_kori !!

Glad to hear that!

Happy coding!