Hi,
We have seen a bug that was affecting some 1.21.0-1.21.4
users with BM25 searches. This was fixed in 1.21.5
, so we recommend upgrading to this, especially if you are a 1.21.0-1.21.4
user.
Release notes:
Explanation of the bug & fix:
weaviate:stable/v1.21
← weaviate:fix-bm25-compaction-bug
opened 03:48AM - 30 Sep 23 UTC
## What's being changed:
fixes #3517
### What was #3517 about?
After a … while, BM25 would stop returning correct results. Most notably, for a given term or collection of terms we would see no more results.
### When did this bug first occur?
The bug first appeared in v1.21.0. However, the bug itself was already present for much longer, it was mainly hidden by another aspect which was changed in v1.21.0
### What was the root cause?
The compaction process iterates over two existing segments and merges the contents into a new segment. This removes redundancies and generally leads to better efficiency. To iterate over existing segments, cursors are used. To improve the allocation efficiency of the process itself, cursors reuse memory wherever they can.
After each iteration, some metadata is stored in a list. The metadata is used later to build an index for each segment. The index is a binary tree that stores the offsets of each row in the segment payload.
When this list references the same memory each iteration previous entries to the list get corrupted each time the cursor is advanced.
### How was this fixed
Prior to appending to the list, the buffer that holds the key is now copied.
### Why did this only appear in v1.21 – This sounds like a fairly old bug.
This is correct, the bug itself is not new. However it was hidden in previous versions. The cursor implementation accidentally provided more safety than it claimed in previous versions. It only used a reusable buffer for the values, but the key was a new assignment each iteration. As part of v1.21, the cursor needed to be changed slightly and for the first time it started doing exactly what its description claimed: It reused memory everywhere, including for the keys.
### Why was this not caught in the test suite?
The compaction test verified the correctness of the final segment(s) by iterating over them using a cursor and validating that all keys in the payload are present and that their values match the expected values. However, to iterate over _all_ objects, no index is needed. The index is only used for direct access to a specific key. Therefore, this test actually reproduced the corrupted index, but the test was designed to not need the index to function. A new verification step has now been added that verifies the correctness of each key in the segment using direct access. When removing the fix, this new test immediately fails.
## Review checklist
- [ ] Documentation has been updated, if necessary. Link to changed documentation:
- [ ] Chaos pipeline run or not necessary. Link to pipeline:
- [x] All new code is covered by tests where it is reasonable.
- [ ] Performance tests have been run or not necessary.
If you experienced issues with BM25 with Weaviate 1.21.0
–1.21.4
, upgrade to Weaviate 1.21.5
or newer.