Hello folks,
We are evaluating whether Weaviate is a good fit for our use cases.
Our datasets are large: anywhere from 100 million to a couple of billion vectors, generated within a few hours, with more than 768 dimensions.
Local storage is therefore not feasible for us at this scale.
So I have a couple of questions for the community.
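For a rough sense of scale, here is a back-of-the-envelope sketch of the raw vector footprint. It assumes float32 components (4 bytes each), takes "a couple of billion" as 2 billion, and ignores index, object data, and replication overhead; the function name is only illustrative.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_component: int = 4) -> int:
    """Size of the vector components alone (assumption: float32, no index or metadata)."""
    return num_vectors * dimensions * bytes_per_component


TIB = 1024 ** 4  # bytes per tebibyte

# Lower and upper ends of the dataset range mentioned above, at 768 dimensions.
for count in (100_000_000, 2_000_000_000):
    size = raw_vector_bytes(count, 768)
    print(f"{count:,} vectors x 768 dims: {size / TIB:.2f} TiB")
```

Since our dimensions are above 768, these figures are a lower bound on the raw data size.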
- Can we use an S3- or MinIO-style object storage endpoint as the persistent storage directory for Weaviate (as Milvus does, for reference)?
(or)
- I read about EKS somewhere in the documentation. Could we instead use NFS-based persistent storage for Kubernetes cluster deployments?
Which one is recommended for performance and data consistency?
Also, how does Weaviate behave if we use NFS as PERSISTENCE_DATA_PATH when I deploy multiple replicas? How are reads and writes load balanced?
Will all the pods write to the same NFS data path (e.g. my persistent data path is set to "/mnt/weavieatedb/data/"), or do we need to point each pod to a separate data folder?
Pod0 ==> ("/mnt/weavieatedb/data_0/")
Pod1 ==> ("/mnt/weavieatedb/data_1/")
PodN ==> ("/mnt/weavieatedb/data_n/")
We have a huge NFS share (presented from a storage box), so performance and throughput should not be a concern.
- Can we use one PVC created from the NFS share and reference it in values.yaml when deploying via the Helm chart?
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wv-storage
  namespace: wvdb # Match the namespace
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 500Gi
  volumeName: wv-nfs-pv
And my PersistentVolume is this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wv-nfs-pv
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /wvdb # Path on NFS server
    server: mystorageserver # NFS server hostname or IP
In my values.yaml:
storage:
  fullnameOverride: "wv-storage"
  size: 500Gi
  storageClassName: ""
When I deploy the Helm chart, I can see only one pod. Is Weaviate not a distributed architecture o
Every 2.0s: kubectl get pods -n wvdb sn1-r6515-h01-05: Fri Oct 11 07:36:39 2024
NAME         READY   STATUS    RESTARTS   AGE
weaviate-0   1/1     Running   0          100s