Hello Team,
I am getting this error when running the Weaviate backup (.create) method:
The block list may not contain more than 50,000 blocks.
ERROR CODE: BlockListTooLong
This is a big dataset, and I believe the error has something to do with the data block size and how the backup is chunked.
Please let me know how I can fix this issue.
Weaviate Server Version:
Deployment Method: Kubernetes
Number of Running Nodes: One node
Weaviate Version: 1.25.0
Hi @sanjeev1678!
Which backup module are you using?
I did a search, and this error points to Azure. Is it backup-azure?
Let me know if it is indeed Azure.
Thanks!
Hello @DudaNogueira
Thanks for responding. Yes, this is backup-azure.
Please let me know if anything else is needed.
Hi @sanjeev1678 ,
Following up on this topic, I think the pull request "Add environment overrides for azure blocksize and concurrency" by donomii (Pull Request #6468, weaviate/weaviate on GitHub) should have fixed this issue.
It should be included in the latest release. The default block size was also changed to int64(40 * 1024 * 1024).
Hello @Damien_Gasparina /@DudaNogueira ,
Thanks for your response.
We are deploying Weaviate using a Helm chart and would like to request the addition of the AZURE_BLOCK_SIZE and AZURE_CONCURRENCY environment variables in the Helm chart as well.
Please let me know if any further details are needed.
hi @sanjeev1678 !
Thanks for pointing it out. I have just documented those env vars:
main ← add-azure-backup-blocksize-and-concurrency
opened 02:01PM - 26 Mar 25 UTC
### What's being changed:
Documented the new environment variables introduced on this PR: https://github.com/weaviate/weaviate/pull/6468
`AZURE_BLOCK_SIZE`, `AZURE_CONCURRENCY` and a note about using `X-Azure-Block-Size` and `X-Azure-Concurrency` as client header parameters
### Type of change:
- [x] **Documentation** updates (non-breaking change to fix/update documentation)
- [ ] **Website** updates (non-breaking change to update main page, company pages, pricing, etc)
- [ ] **Content** updates - **blog**, **podcast** (non-breaking change to add/update content)
- [ ] **Bug fix** (non-breaking change to fixes an issue with the site)
- [ ] **Feature** or **enhancements** (non-breaking change to add functionality)
### How Has This Been Tested?
- [ ] **GitHub action** - automated build completed without errors
- [x] **Local build** - the site works as expected when running `yarn start`
> note, you can run `yarn verify-links` to test site links locally
Regarding our Helm chart, you can pass those variables in the env section of values.yaml, here:
# - api-key-user-readOnly
query_defaults:
  limit: 100
debug: false
# Insert any custom environment variables or envSecrets by putting the exact name
# and desired value into the settings below. Any env name passed will be automatically
# set for the statefulSet.
env:
  CLUSTER_GOSSIP_BIND_PORT: 7000
  CLUSTER_DATA_BIND_PORT: 7001
  # Set RAFT cluster expected number of voter nodes at bootstrap.
  # By default helm automatically sets this value based on the cluster size.
  # RAFT_BOOTSTRAP_EXPECT: 1
  # Set RAFT cluster bootstrap timeout (in seconds), default is 600 (seconds)
  # which should be sufficient for most of the deployments.
  RAFT_BOOTSTRAP_TIMEOUT: 600
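As a rough sketch of what that could look like, the two Azure variables can be added under the same env key. The values below are illustrative assumptions, not recommendations: the block size is assumed to be given in bytes (41943040 matches the 40 * 1024 * 1024 default mentioned earlier in this thread), and the concurrency value is only a placeholder. They are written as quoted strings on the assumption that this keeps Helm from reformatting large numbers.

```yaml
env:
  # Existing chart defaults, left unchanged.
  CLUSTER_GOSSIP_BIND_PORT: 7000
  CLUSTER_DATA_BIND_PORT: 7001
  # Illustrative values only (assumptions, adjust for your dataset).
  # Block size assumed to be in bytes: 40 * 1024 * 1024 = 41943040.
  AZURE_BLOCK_SIZE: "41943040"
  # Placeholder concurrency value.
  AZURE_CONCURRENCY: "1"
```

Since Azure caps a block list at 50,000 blocks, a larger block size raises the maximum size of a single uploaded backup file, so very large datasets may need a value above the default.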
Let me know if this helps!