Migration from WCS to on-premise

tadejkrivec · July 26, 2024, 10:46am

Description

We migrated a collection from WCS to an on-premise server. The collection contains roughly 5 million items. We migrated by copying the items, along with their vector embeddings, from one instance to the other. The settings of the Weaviate instance (and of the collection/schema) are identical. The only difference is a minor version of the Weaviate instance (on-premise is 1.24.4, WCS is currently 1.24.1).

The benchmark queries we use to test the migration behave differently on the two instances: a top-10 search returns different results. E.g., a specific high-confidence item is found on instance A, but not on instance B. If we widen the search to the top 100, the item is also found on instance B. As far as I understand, this is due to the approximate KNN search under the hood.
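To put a number on how far the two instances diverge, one option is to compare the ID lists returned for the same query. The snippet below is only a minimal sketch: `top_k_overlap` and the object IDs are hypothetical names made up for illustration, and the ID lists are assumed to be already fetched from each instance and ordered by ascending distance.

```python
def top_k_overlap(ids_a, ids_b, k=10):
    """Fraction of instance A's top-k IDs that also appear in instance B's top-k."""
    top_a = list(ids_a)[:k]
    top_b = set(list(ids_b)[:k])
    if not top_a:
        return 0.0
    return sum(1 for obj_id in top_a if obj_id in top_b) / len(top_a)


# Hypothetical result IDs for one benchmark query, best match first.
instance_a = ["obj-1", "obj-2", "obj-3", "obj-4"]
instance_b = ["obj-2", "obj-1", "obj-5", "obj-3"]

overlap_at_3 = top_k_overlap(instance_a, instance_b, k=3)  # obj-3 is missing from B's top 3
overlap_at_4 = top_k_overlap(instance_a, instance_b, k=4)  # obj-3 reappears at rank 4 on B
```

An overlap that rises as `k` grows matches the behaviour described above, where the missing item shows up once the search is widened.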

What surprises me a bit is that the process is not deterministic between instances A and B. I would expect identical search results with identical data. Could this be due to the minor version difference between the Weaviate instances? Or are there additional stochastic effects under the hood, e.g., resource-dependent behaviour or caching?

When limiting the search via filter (and forcing a brute force search) the results are identical. So it really seems the issue is approximate KNN.
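The same check can be done offline for a sample of the data: compute the exact nearest neighbours by brute force and measure how many of them each instance's approximate search returns. This is a rough sketch, assuming cosine distance (the thread does not say which metric the collection uses) and a sample small enough to hold in memory; `exact_top_k`, `recall_at_k`, and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def exact_top_k(query, vectors, k=10):
    """Brute-force search: IDs of the k vectors closest to the query.

    `vectors` maps object ID -> embedding.
    """
    ranked = sorted(vectors, key=lambda obj_id: cosine_distance(query, vectors[obj_id]))
    return ranked[:k]


def recall_at_k(ann_ids, exact_ids):
    """Share of the exact neighbours that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact.intersection(ann_ids)) / len(exact)


# Toy data, made up for illustration.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
ground_truth = exact_top_k([1.0, 0.0], vectors, k=2)
recall = recall_at_k(["a", "c"], ground_truth)  # pretend the ANN search returned a and c
```

Running this against the results of both instances would show whether one index has lower recall than the other, or whether both are simply missing different neighbours.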

Thank you for the help.

DudaNogueira · July 26, 2024, 2:03pm

Hi!

Welcome to our community!

When you migrate or copy your data over, for example by following this migration guide, your index gets rebuilt.

So this can indeed lead to different results. Can you check if ef and efConstruction are the same in both collections?
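One way to run that check is to export each collection's index settings and diff them. The sketch below assumes both configurations are available as plain dictionaries shaped like the `vectorIndexConfig` object in the collection schema; `diff_index_config` is a hypothetical helper, `maxConnections` is included as one more HNSW setting worth comparing (it is not mentioned in this thread), and all values are made up.

```python
HNSW_KEYS = ("ef", "efConstruction", "maxConnections")


def diff_index_config(config_a, config_b, keys=HNSW_KEYS):
    """Return {key: (value_a, value_b)} for every listed key whose values differ."""
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in keys
        if config_a.get(key) != config_b.get(key)
    }


# Hypothetical settings exported from each instance.
wcs_config = {"ef": -1, "efConstruction": 128, "maxConnections": 64}
onprem_config = {"ef": -1, "efConstruction": 128, "maxConnections": 32}

differences = diff_index_config(wcs_config, onprem_config)
```

An empty result means the listed settings match, in which case any remaining difference in results comes from the rebuilt index itself rather than from its configuration.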

Thanks!

tadejkrivec · July 29, 2024, 7:16am

ef and efConstruction are the same in both collections.

Thanks for the reply.
