We want to tune our Weaviate cluster for the best performance. One of the Weaviate environment variables is PERSISTENCE_HNSW_MAX_LOG_SIZE, and we want to know how to set it given our vector size (512 dimensions).
The official documentation gives this explanation:
Note that some database-level parameters are available to configure HNSW indexing behavior.
PERSISTENCE_HNSW_MAX_LOG_SIZE is a database-level parameter that sets the maximum size of the HNSW write-ahead-log. The default value is 500MiB.
Increase this value to improve efficiency of the compaction process, but be aware that this will increase the memory usage of the database. Conversely, decreasing this value will reduce memory usage but may slow down the compaction process.
Preferably, the PERSISTENCE_HNSW_MAX_LOG_SIZE should be set to a value close to the size of the HNSW graph.
So here are some questions:
What does "size of the HNSW graph" actually mean?
And how can we estimate it for about 47 million vectors of 512 dimensions?
This setting was introduced with this PR (which has more information).
It prevents large numbers of uncompacted HNSW commit logs from building up, which slows down startup. The problem it addresses is many small commit logs that hit the default limit (500MiB, per the documentation quoted above) without ever being combined.
This setting becomes especially relevant if you not only ingest but also delete a lot of data.
This is about disk size, not memory, so I believe the size they mention is the disk usage on a node.
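For a rough number, here is a back-of-envelope sketch in Python. It assumes that the HNSW graph (and therefore its commit log) holds the links between vectors rather than the vectors themselves, so its size scales with the number of vectors and the connections per vector, not with the 512 dimensions. The average of 64 connections per vector and the 8 bytes per connection are my own assumptions for illustration, not measured values, and the function names are made up for this example.

```python
# Back-of-envelope estimate only. The two ASSUMED_* constants are
# illustrative guesses, not values confirmed for any particular cluster.

NUM_VECTORS = 47_000_000
DIMENSIONS = 512

ASSUMED_AVG_CONNECTIONS = 64       # assumption: average links per vector
ASSUMED_BYTES_PER_CONNECTION = 8   # assumption: storage cost of one link
BYTES_PER_FLOAT32 = 4              # a float32 component takes 4 bytes


def estimate_hnsw_graph_bytes(num_vectors, avg_connections, bytes_per_connection):
    """Estimate the size of the graph links; independent of vector dimensions."""
    return num_vectors * avg_connections * bytes_per_connection


def estimate_raw_vector_bytes(num_vectors, dimensions, bytes_per_component):
    """Estimate the size of the raw vectors, for comparison with the graph."""
    return num_vectors * dimensions * bytes_per_component


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


graph_bytes = estimate_hnsw_graph_bytes(
    NUM_VECTORS, ASSUMED_AVG_CONNECTIONS, ASSUMED_BYTES_PER_CONNECTION
)
vector_bytes = estimate_raw_vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_FLOAT32)

print(f"estimated graph links: {to_gib(graph_bytes):.1f} GiB")
print(f"raw float32 vectors:   {to_gib(vector_bytes):.1f} GiB")
```

Under these assumptions the graph comes out at tens of GiB for the whole dataset, well above the 500MiB default. If the data is spread across several shards or nodes, the figure for each one would be correspondingly smaller. It is worth checking the estimate against the actual disk usage of the HNSW files on a node before settling on a value.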