How to choose the correct `PERSISTENCE_HNSW_MAX_LOG_SIZE`

d_khlebokazov · May 14, 2025, 5:47am

We want to tune our Weaviate cluster for the best performance. One of the Weaviate environment variables is `PERSISTENCE_HNSW_MAX_LOG_SIZE`, and we would like to know how to set it given our vector size (512).

From the official documentation we get this explanation:

Database parameters for HNSW

Note that some database-level parameters are available to configure HNSW indexing behavior.

PERSISTENCE_HNSW_MAX_LOG_SIZE is a database-level parameter that sets the maximum size of the HNSW write-ahead-log. The default value is 500MiB.

Increase this value to improve efficiency of the compaction process, but be aware that this will increase the memory usage of the database. Conversely, decreasing this value will reduce memory usage but may slow down the compaction process.

Preferably, `PERSISTENCE_HNSW_MAX_LOG_SIZE` should be set to a value close to the size of the HNSW graph.

So here are some questions:

What does "the size of the HNSW graph" actually mean?
And how can we estimate it if we have about 47 million vectors of size 512?

DudaNogueira · May 16, 2025, 4:50pm

Hi @d_khlebokazov !!

This setting was introduced with this PR, which has more information:

github.com/weaviate/weaviate

Allow setting HNSW Max log size ("condensing limit") dynamically

stable/v1.24 ← set-hnsw-condensing-limit-dynamically

opened 07:59AM - 09 May 24 UTC

etiennedi

+55 -1

### What's being changed:

* The condensing requires memory, so it defaults to a fairly small value
* However, on massive machines, where the memory is available, we are currently not making use of it
* As a result we leave a lot of redundancy in the HNSW logs "uncondensed" which leads to longer startup times
* This PR introduces an env var `PERSISTENCE_HNSW_MAX_LOG_SIZE` to override the value
* The default value of 500MiB is unchanged
* The variable accepts:
  * unset -> use default
  * raw integers -> interpreted as bytes
  * IEC units, e.g. `400KiB`
  * SI units, e.g. `400KB`

### Review checklist

- [ ] Documentation has been updated, if necessary. Link to changed documentation:
- [ ] Chaos pipeline run or not necessary. Link to pipeline:
- [x] All new code is covered by tests where it is reasonable.
- [ ] Performance tests have been run or not necessary.
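To make the accepted formats concrete, here is a minimal Python sketch of how binary (`KiB`, `MiB`) and decimal (`KB`, `MB`) suffixes differ once converted to bytes. The `to_bytes` helper and its unit table are hypothetical and written only for illustration; this is not Weaviate's own parser, and which suffixes Weaviate accepts beyond the ones named in the PR is not confirmed here.

```python
# Illustrative only: NOT Weaviate's parser, just a sketch of the difference
# between binary (IEC) and decimal (SI) size suffixes.
import re

_UNITS = {
    "": 1, "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9,     # SI (decimal) units
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,  # IEC (binary) units
}


def to_bytes(value: str) -> int:
    """Convert a size string such as '500MiB' or '400KB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse size: {value!r}")
    number, unit = match.groups()
    try:
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in {value!r}") from None


print(to_bytes("500MiB"))  # 524288000 (the documented default)
print(to_bytes("400KiB"))  # 409600
print(to_bytes("400KB"))   # 400000
```

If there is any doubt about how a suffix will be interpreted, passing the raw integer (for example `524288000`) avoids the ambiguity, since the PR states that raw integers are read as bytes.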

It helps prevent massive, uncompacted HNSW logs, which can slow down startup. It was introduced because many small commit logs could reach the default 500MiB limit without being combined.

This setting becomes especially interesting if you not only ingest but also delete a lot of data.

This is about disk size, not memory, so I believe the size they mention is the disk usage of a node.
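For a rough sense of scale with the numbers from the question, a back-of-the-envelope sketch in Python might look like the following. Everything except the vector count and the size of 512 is an assumption not confirmed in this thread: that 512 means float32 dimensions, a `maxConnections` of 32, up to twice that many links per node on the base layer, and 8 bytes per stored link. Treat the result as an order-of-magnitude guide only and check the actual disk usage on your node.

```python
# Back-of-the-envelope estimate; the constants marked "assumed" are guesses,
# not values taken from Weaviate or from this thread.
NUM_VECTORS = 47_000_000  # from the question
DIMENSIONS = 512          # from the question (assumed to mean dimensions)
MAX_CONNECTIONS = 32      # assumed maxConnections; use your own index setting
BYTES_PER_LINK = 8        # assumed cost of one stored neighbor id
BYTES_PER_FLOAT = 4       # assumed float32 vector components

GIB = 2**30


def graph_bytes(n: int, max_connections: int, bytes_per_link: int) -> int:
    """Upper bound for base-layer links, assuming 2 * maxConnections per node."""
    return n * 2 * max_connections * bytes_per_link


def vector_bytes(n: int, dims: int, bytes_per_float: int) -> int:
    """Size of the raw vectors alone, shown for comparison."""
    return n * dims * bytes_per_float


graph = graph_bytes(NUM_VECTORS, MAX_CONNECTIONS, BYTES_PER_LINK)
vectors = vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_FLOAT)

print(f"graph links: ~{graph / GIB:.1f} GiB")    # ~22.4 GiB
print(f"raw vectors: ~{vectors / GIB:.1f} GiB")  # ~89.6 GiB
```

Under these assumptions the link data is in the tens of GiB, far above the 500MiB default, which is the situation the PR describes for large machines with memory to spare.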

Let me know if this helps!
