Start Weaviate Embedded with an OpenAI API key and persistent storage

client = weaviate.connect_to_embedded(headers={"X-OpenAI-Api-Key": OPENAI_API_KEY})

client = weaviate.WeaviateClient(
    embedded_options=EmbeddedOptions(
        additional_env_vars={
            "ENABLE_MODULES": "backup-filesystem,text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai",
            "BACKUP_FILESYSTEM_PATH": "/tmp/backups",
            "persistence_data_path": "./weaviate_data"
        }
    )
    # Add additional options here (see Python client docs for syntax)
)

This code doesn't seem to work. How do I start an embedded client with persistent local storage?

Hi!

Here is how you can run Weaviate Embedded using a local folder.

PS: I was not able to use the filesystem module while using Embedded :thinking:

First, create a working directory, write the script to app.py, and run it:

mkdir -p /tmp/duda/data
cd /tmp/duda
cat > app.py <<'EOF'
import weaviate
from datetime import datetime

# Start Weaviate Embedded and keep its data in ./data (relative to the directory you run it from)
client = weaviate.connect_to_embedded(
    version="latest",
    persistence_data_path="./data",
    environment_variables={
        "ENABLE_MODULES": "text2vec-openai,text2vec-cohere,text2vec-huggingface,ref2vec-centroid,generative-openai,qna-openai",
    },
)

try:
    # Create the collection on the first run; reuse it on later runs
    if not client.collections.exists("Test"):
        collection = client.collections.create("Test")
    else:
        collection = client.collections.get("Test")

    # Timestamp-based ID, so each run inserts a distinguishable object
    time_id = datetime.now().strftime("%Y%m%d%H%M%S%f")
    print("TIMESTAMP", time_id)

    # Insert one object
    collection.data.insert({"text": "this is a test " + time_id})
finally:
    # Stop the embedded Weaviate process cleanly
    client.close()
EOF
python3 app.py
find .

You should see output similar to the following; the last lines (from find .) show the files written under ./data:

{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-11-02T13:08:18-03:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-11-02T13:08:18-03:00"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-11-02T13:08:18-03:00"}
{"level":"info","msg":"open cluster service","servers":{"Embedded_at_8079":53526},"time":"2024-11-02T13:08:18-03:00"}
{"address":"192.168.28.127:53527","level":"info","msg":"starting cloud rpc server ...","time":"2024-11-02T13:08:18-03:00"}
{"level":"info","msg":"starting raft sub-system ...","time":"2024-11-02T13:08:18-03:00"}
{"address":"192.168.28.127:53526","level":"info","msg":"tcp transport","tcpMaxPool":3,"tcpTimeout":10000000000,"time":"2024-11-02T13:08:18-03:00"}
{"level":"info","msg":"loading local db","time":"2024-11-02T13:08:18-03:00"}
{"level":"info","msg":"database has been successfully loaded","n":0,"time":"2024-11-02T13:08:18-03:00"}
{"level":"info","metadata_only_voters":false,"msg":"construct a new raft node","name":"Embedded_at_8079","time":"2024-11-02T13:08:18-03:00"}
{"action":"raft","index":0,"level":"info","msg":"raft initial configuration","servers":"[]","time":"2024-11-02T13:08:18-03:00"}
{"last_snapshot_index":0,"last_store_applied_index":0,"last_store_log_applied_index":0,"level":"info","msg":"raft node constructed","raft_applied_index":0,"raft_last_index":0,"time":"2024-11-02T13:08:18-03:00"}
{"action":"raft","follower":{},"leader-address":"","leader-id":"","level":"info","msg":"raft entering follower state","time":"2024-11-02T13:08:18-03:00"}
{"action":"bootstrap","error":"could not join a cluster from [192.168.28.127:53526]","level":"warning","msg":"failed to join cluster, will notify next if voter","servers":["192.168.28.127:53526"],"time":"2024-11-02T13:08:20-03:00","voter":true}
{"action":"bootstrap","candidates":[{"Suffrage":0,"ID":"Embedded_at_8079","Address":"192.168.28.127:53526"}],"level":"info","msg":"starting cluster bootstrapping","time":"2024-11-02T13:08:20-03:00"}
{"action":"bootstrap","level":"info","msg":"notified peers this node is ready to join as voter","servers":["192.168.28.127:53526"],"time":"2024-11-02T13:08:20-03:00"}
{"action":"raft","last-leader-addr":"","last-leader-id":"","level":"warning","msg":"raft heartbeat timeout reached, starting election","time":"2024-11-02T13:08:20-03:00"}
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":2,"time":"2024-11-02T13:08:20-03:00"}
{"action":"raft","level":"info","msg":"raft election won","tally":1,"term":2,"time":"2024-11-02T13:08:20-03:00"}
{"action":"raft","leader":{},"level":"info","msg":"raft entering leader state","time":"2024-11-02T13:08:20-03:00"}
{"level":"warning","msg":"Multiple vector spaces are present, GraphQL Explore and REST API list objects endpoint module include params has been disabled as a result.","time":"2024-11-02T13:08:20-03:00"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50050","time":"2024-11-02T13:08:20-03:00"}
{"address":"192.168.28.127:53526","level":"info","msg":"current Leader","time":"2024-11-02T13:08:20-03:00"}
{"level":"info","msg":"starting migration from old schema","time":"2024-11-02T13:08:20-03:00"}
{"level":"info","msg":"legacy schema is empty, nothing to migrate","time":"2024-11-02T13:08:20-03:00"}
{"level":"info","msg":"migration from the old schema has been successfully completed","time":"2024-11-02T13:08:20-03:00"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2024-11-02T13:08:20-03:00"}
{"level":"warning","msg":"prop len tracker file data/test/hWgfm22MHue6/proplengths does not exist, creating new tracker","time":"2024-11-02T13:08:21-03:00"}
{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-11-02T13:08:21-03:00","wait_for_cache_prefill":false}
{"level":"info","msg":"Created shard test_hWgfm22MHue6 in 4.399958ms","time":"2024-11-02T13:08:21-03:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-11-02T13:08:21-03:00","took":284458}
TIMESTAMP 20241102130821147464
{"action":"restapi_management","level":"info","msg":"Shutting down... ","time":"2024-11-02T13:08:21-03:00"}
{"action":"restapi_management","level":"info","msg":"Stopped serving weaviate at http://127.0.0.1:8079","time":"2024-11-02T13:08:21-03:00"}
{"action":"bootstrap","level":"info","msg":"node reporting ready, node has probably recovered cluster from raft config. Exiting bootstrap process","time":"2024-11-02T13:08:21-03:00"}
{"action":"telemetry_push","level":"info","msg":"telemetry started","payload":"\u0026{MachineID:1c2a58cf-33c5-43f8-a24f-67dd7f6a9bb3 Type:INIT Version:1.25.6 NumObjects:0 OS:darwin Arch:arm64 UsedModules:}","time":"2024-11-02T13:08:21-03:00"}
{"action":"telemetry_push","level":"info","msg":"telemetry terminated","payload":"\u0026{MachineID:1c2a58cf-33c5-43f8-a24f-67dd7f6a9bb3 Type:TERMINATE Version:1.25.6 NumObjects:0 OS:darwin Arch:arm64 UsedModules:}","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing raft FSM store ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"shutting down raft sub-system ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"transferring leadership to another server","time":"2024-11-02T13:08:21-03:00"}
{"error":"cannot find peer","level":"error","msg":"transferring leadership","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing raft-net ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing log store ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing data store ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing loaded database ...","time":"2024-11-02T13:08:21-03:00"}
{"action":"load_all_shards","level":"error","msg":"failed to load all shards: context canceled","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing raft-rpc client ...","time":"2024-11-02T13:08:21-03:00"}
{"level":"info","msg":"closing raft-rpc server ...","time":"2024-11-02T13:08:21-03:00"}
.
./app.py
./data
./data/test
./data/test/hWgfm22MHue6
./data/test/hWgfm22MHue6/main.hnsw.commitlog.d
./data/test/hWgfm22MHue6/main.hnsw.commitlog.d/1730563701
./data/test/hWgfm22MHue6/indexcount
./data/test/hWgfm22MHue6/lsm
./data/test/hWgfm22MHue6/lsm/property__id
./data/test/hWgfm22MHue6/lsm/property__id/segment-1730563701144520000.db
./data/test/hWgfm22MHue6/lsm/objects
./data/test/hWgfm22MHue6/lsm/objects/segment-1730563701143308000.db
./data/test/hWgfm22MHue6/lsm/property_text_searchable
./data/test/hWgfm22MHue6/lsm/property_text_searchable/segment-1730563701157584000.db
./data/test/hWgfm22MHue6/lsm/property_text
./data/test/hWgfm22MHue6/lsm/property_text/segment-1730563701157384000.db
./data/test/hWgfm22MHue6/version
./data/test/hWgfm22MHue6/proplengths
./data/schema.db
./data/modules.db
./data/migration1.19.filter2search.state
./data/raft
./data/raft/snapshots
./data/raft/raft.db
./data/classifications.db
./data/migration1.19.filter2search.skip.flag
./data/migration1.22.fs.hierarchy
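If you prefer to keep the WeaviateClient + EmbeddedOptions style from the original question, here is a minimal sketch of what appears to be the fix, assuming the v4 Python client: persistence_data_path is an EmbeddedOptions field rather than an entry in additional_env_vars, and the OpenAI header goes on the client as additional_headers. The helper name start_embedded and reading the key from the OPENAI_API_KEY environment variable are assumptions for illustration, and backup-filesystem is left out because of the error reported in this thread.

```python
import os

# Modules from the question; backup-filesystem is left out because it reportedly
# errors under Embedded (see the replies in this thread).
ENABLED_MODULES = [
    "text2vec-openai",
    "text2vec-cohere",
    "text2vec-huggingface",
    "ref2vec-centroid",
    "generative-openai",
    "qna-openai",
]


def start_embedded(data_path: str = "./weaviate_data"):
    """Hypothetical helper: start Weaviate Embedded with on-disk storage and the OpenAI header.

    ASSUMPTION: the API key is read from the OPENAI_API_KEY environment variable.
    """
    # Imported inside the function so this file can be loaded without the client installed
    import weaviate
    from weaviate.embedded import EmbeddedOptions

    client = weaviate.WeaviateClient(
        embedded_options=EmbeddedOptions(
            # The storage folder is an EmbeddedOptions field, not an environment variable
            persistence_data_path=data_path,
            additional_env_vars={"ENABLE_MODULES": ",".join(ENABLED_MODULES)},
        ),
        # Request headers belong to the client, not to EmbeddedOptions
        additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
    )
    # A WeaviateClient constructed directly is not connected until connect() is called
    client.connect()
    return client
```

Usage: client = start_embedded(), then client.close() when you are done (for example in a finally block, as in app.py above). weaviate.connect_to_embedded(headers=..., persistence_data_path=..., environment_variables=...) should be an equivalent shortcut.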

Thanks a lot. What do you mean by "I was not able to use the filesystem module while using Embedded"? That you can't upload data, or that you can't specify the storage location? If not, where is the data saved? In memory? Thanks for your feedback.

Hi!

I meant the backup-filesystem module: after configuring it, I got an error when triggering a backup.

The data itself is written to disk, not kept in memory: by default to ~/.local/share/weaviate, or to whatever path you pass as persistence_data_path (in the example above, the ./data folder).
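To confirm that nothing lives only in memory, here is a minimal sketch with two hypothetical helpers: one lists the collection folders on disk using only the standard library, and one reopens the embedded instance and counts the objects in "Test" (the count should grow by one each time app.py runs). The folder layout (a lower-cased folder per collection, plus an internal raft folder) is inferred from the find listing in this thread for Weaviate 1.25.6 and may differ in other versions.

```python
from pathlib import Path


def list_collections_on_disk(data_path: str = "~/.local/share/weaviate") -> list:
    """Return the collection folders Weaviate Embedded has written under data_path.

    ASSUMPTION: layout as in the find listing above, e.g. data/test for the "Test"
    collection next to schema.db and an internal raft/ folder.
    """
    root = Path(data_path).expanduser()
    if not root.is_dir():
        return []
    # Skip the internal "raft" folder; the remaining folders hold collection shards
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.name != "raft")


def count_test_objects(data_path: str = "./data") -> int:
    """Reopen the embedded instance and count the objects in the "Test" collection.

    ASSUMPTION: app.py has already been run at least once with the same data path.
    """
    import weaviate  # lazy import: list_collections_on_disk() needs only the stdlib

    client = weaviate.connect_to_embedded(version="latest", persistence_data_path=data_path)
    try:
        result = client.collections.get("Test").aggregate.over_all(total_count=True)
        return result.total_count
    finally:
        client.close()
```

Run from /tmp/duda, list_collections_on_disk("./data") would be expected to return ["test"] after the script above has run.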

Let me know if this helps :slight_smile: