Migration from WCS to on-premise

Description

We migrated a collection from WCS to an on-premise server. The collection holds roughly 5 million items (5kk). The migration was done by copying the items, along with their vector embeddings, from one instance to the other. The settings of the Weaviate instances (and the collection schemas) are identical. The only difference is a minor version gap between the instances (on-premise is 1.24.4, WCS is currently 1.24.1).

The benchmark queries used to test the migration return different results on the two instances. With a top-10 search the results differ: e.g., a specific high-confidence item is found on instance A but not on instance B. If we increase the search to top 100, the item is also found on instance B. As far as I understand, this is due to the approximate KNN search under the hood.
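To put a number on the difference, here is a minimal sketch for comparing the ranked result IDs returned by the two instances for the same query. It assumes the IDs have already been collected into plain Python lists, best match first; the function names `overlap_at_k` and `rank_of` are made up for illustration and are not part of any Weaviate client.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Fraction of the top-k IDs from instance A that also appear in the top-k from instance B."""
    top_a = set(ids_a[:k])
    top_b = set(ids_b[:k])
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


def rank_of(item_id, ids):
    """1-based rank of item_id in a ranked ID list, or None if it is absent."""
    try:
        return ids.index(item_id) + 1
    except ValueError:
        return None
```

Running `rank_of` on the top-100 list from instance B shows how far down the missing item actually sits, which is a useful hint for how much the search parameters would need to change to recover it in the top 10.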

What surprises me a bit is that the process is not deterministic between instances A and B. I would expect identical search results with identical data. Could this be due to the minor version difference between the Weaviate instances? Or are there additional stochastic factors under the hood, e.g., resource-dependent behavior or caching?

When limiting the search via a filter (and thereby forcing a brute-force search), the results are identical. So it really seems the issue is the approximate KNN search.
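The same idea can be used offline on a sample of the data: compute the exact top-k by brute force and measure how much of it each instance's approximate result recovers. The sketch below assumes cosine distance and non-zero vectors (the distance metric of the collection is not stated here, so swap in the one actually configured), and assumes the vectors are available as a dict of ID to list of floats. All names are illustrative.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def exact_top_k(query, vectors, k):
    """Brute-force top-k: rank every ID by its distance to the query."""
    ranked = sorted(vectors, key=lambda item_id: cosine_distance(query, vectors[item_id]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k that the approximate top-k also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)
```

If both instances show similar recall against the exact result but miss different items, that points to two differently shaped approximate indexes rather than a data problem.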

Thank you for the help.

Hi!

Welcome to our community :hugs:

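When you migrate/copy your data over, for example using this migration guide, your index gets rebuilt. The HNSW graph depends on the order in which objects are inserted (and on randomized level assignment), so a rebuilt index is generally not identical to the original.

So this can indeed lead to different results. Can you check if ef and efConstruction are the same in both collections?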
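One way to make that check less error-prone is to diff the index settings programmatically. This is only a sketch: it assumes you have already pulled the `vectorIndexConfig` object out of each collection's schema and loaded it as a Python dict (how you fetch it is up to you), and `diff_index_config` is a hypothetical helper name.

```python
# HNSW settings that influence how the graph is built and searched.
HNSW_KEYS = ("ef", "efConstruction", "maxConnections")


def diff_index_config(config_a, config_b, keys=HNSW_KEYS):
    """Return {key: (value_a, value_b)} for every listed key whose values differ."""
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in keys
        if config_a.get(key) != config_b.get(key)
    }
```

An empty dict means the listed settings match, in which case the remaining difference comes from the rebuilt graph itself.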

Thanks!

ef and efConstruction are the same in both collections.

Thanks for the reply.
