GPT-4o told me that to change the distance metric for a collection, you have to back up all your data and metadata, delete the old schema and data, create the new schema with the different distance metric, and reimport the data. Is this blasphemy true?? GPT-4o seems to be up to speed mainly on the v3 API, so maybe that is the cause of the misinformation? - J
The reasoning behind this is that a lot of computation goes on behind the scenes while ingesting data and building the index, and that computation depends on the distance metric.
Changing the distance metric means that all of those calculations get “lost” and need to be done again.
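To make that concrete, here is a toy sketch in plain Python. It is not Weaviate's actual index code (the real vector index is far more sophisticated); it only assumes that an index stores, for each vector, a link to its nearest neighbour. The function names and sample vectors are made up for illustration. The point is that the links built under one metric are not the links you would get under another, so they cannot simply be reused.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_neighbours(vectors, distance):
    """Toy 'index': map every id to the id of its nearest neighbour."""
    links = {}
    for key, vec in vectors.items():
        others = [k for k in vectors if k != key]
        links[key] = min(others, key=lambda k: distance(vec, vectors[k]))
    return links


# Hypothetical sample data: "b" points the same way as "a" but is far away,
# while "c" is close to "a" but points in a different direction.
vectors = {
    "a": (1.0, 0.0),
    "b": (10.0, 1.0),
    "c": (1.0, 1.0),
}

l2_links = build_neighbours(vectors, l2_distance)
cosine_links = build_neighbours(vectors, cosine_distance)

print("l2:    ", l2_links)       # {'a': 'c', 'b': 'c', 'c': 'a'}
print("cosine:", cosine_links)   # {'a': 'b', 'b': 'a', 'c': 'b'}
```

Every single link differs between the two metrics, even with only three vectors. A real index holds many links per object, which is why switching the metric means rebuilding it from scratch.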
We are working on a way to reindex your data, which will allow some of those options to become mutable. For large datasets, that will mean a huge increase in resource usage, so we are figuring out the best way to implement it.
Migrating your data to a new collection on a different cluster, or even to a second collection on the same cluster, is fairly easy. Here is a guide on how to do that: Migrate data | Weaviate
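As a rough idea of what a same-cluster migration can look like, here is a minimal sketch assuming the v4 Python client, a local instance, and a source collection with a single default (unnamed) vector. The helper names `copy_objects` and `migrate_to_new_metric` are hypothetical, the choice of `VectorDistances.DOT` is just an example, and parameter names may differ between client versions, so please treat the linked guide as the reference. In practice you would also want to copy the property definitions and vectorizer settings of the source collection rather than rely on defaults.

```python
def copy_objects(src, dst):
    """Copy properties, UUIDs and stored vectors from src to dst.

    Reusing the stored vectors avoids re-vectorizing the data; only the
    index of the target collection is built again under its own metric.
    """
    copied = 0
    with dst.batch.dynamic() as batch:
        for obj in src.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                uuid=obj.uuid,
                # Assumption: the source uses one default (unnamed) vector.
                vector=obj.vector["default"],
            )
            copied += 1
    return copied


def migrate_to_new_metric(source_name, target_name):
    """Create target_name with a different distance metric and fill it."""
    # Imported here so copy_objects can be read without the client installed.
    import weaviate
    from weaviate.classes.config import Configure, VectorDistances

    client = weaviate.connect_to_local()  # assumption: local instance
    try:
        client.collections.create(
            target_name,
            # Example metric only; pick the one you actually need.
            vector_index_config=Configure.VectorIndex.hnsw(
                distance_metric=VectorDistances.DOT
            ),
        )
        src = client.collections.get(source_name)
        dst = client.collections.get(target_name)
        return copy_objects(src, dst)
    finally:
        client.close()
```

Once the new collection has been checked, the old one can be deleted and your application pointed at the new name.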
We understand that for someone coming from a “regular” database this is “blasphemy” (hahaha), but you need to understand that a vector database does not just store the data and create an inverted index. There is a lot of other computation going on.