Description
We migrated a collection from WCS to an on-premise server. The collection contains roughly 5 million items. The migration was performed by copying the items, along with their vector embeddings, from one instance to the other. The settings of the two Weaviate instances (including the collection/schema definitions) are identical. The only difference is a minor version of the Weaviate instance (on-premise is 1.24.4, WCS is currently 1.24.1).
The benchmark queries we use to test the migration return different results on the two instances. In a top-10 search, the results differ: for example, a specific high-confidence item is found on instance A but not on instance B. If we increase the search to the top 100, the item is also found on instance B. As far as I understand, this is due to the approximate KNN search under the hood.
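To make the difference measurable, the two result lists can be compared directly. The following is a minimal sketch in plain Python; it assumes the result IDs have already been fetched from each instance as ordered lists, and the helper names `overlap_at_k` and `first_rank` are hypothetical, not part of any Weaviate client.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Fraction of instance A's top-k IDs that also appear in instance B's top-k."""
    top_a = set(ids_a[:k])
    top_b = set(ids_b[:k])
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


def first_rank(ids, target):
    """1-based rank of `target` in the ordered result list, or None if it is absent."""
    for rank, item_id in enumerate(ids, start=1):
        if item_id == target:
            return rank
    return None
```

Running `overlap_at_k` over every benchmark query at k = 10 and k = 100, and `first_rank` for the item that goes missing, would show whether the discrepancy is limited to a few queries or is systematic.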
What surprises me a bit is that the process is not deterministic between instance A and B. I would expect identical search results with identical data. Could this be due to the minor version difference between the Weaviate instances? Or are there additional stochastic effects under the hood, e.g., dependence on resources or caching?
When we limit the search via a filter (which forces a brute-force search), the results are identical. So the issue really does seem to be the approximate KNN search.
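The brute-force behaviour can be reproduced offline to obtain a ground truth for a handful of queries. This is only a sketch: it assumes cosine distance (the metric is not stated above), non-zero vectors, and a hypothetical `items` dictionary mapping item IDs to embeddings exported from either instance. Because an exact search scores every item, its output depends only on the data and not on the order in which items were inserted.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def exact_top_k(query, items, k):
    """Exact nearest neighbours: score every item, sort by distance, break ties by ID."""
    scored = sorted(
        items.items(),
        key=lambda pair: (cosine_distance(query, pair[1]), pair[0]),
    )
    return [item_id for item_id, _ in scored[:k]]
```

Comparing each instance's approximate top 10 against `exact_top_k` for the same query would show which instance, if either, misses true neighbours.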
Thank you for the help.