Changing Distance Metric for a Collection

Description

I’ve been told by GPT-4o that to change the distance metric for a collection, you have to back up all your data and metadata, delete the old schema and data, implement the new schema with the different distance metric, and reimport the data. Is this blasphemy true?? GPT-4o seems to be up to speed on v3 API mainly, so maybe that is the cause of the misinformation? - J

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Any additional Information

hi @blue-j !!

Welcome to our community :hugs:

Indeed, this is true. :grimacing:

Some collection settings are mutable, but others, including the distance metric, are not.

Our documentation has a list of which collection settings are mutable.

The reasoning behind this is that a lot of computation happens behind the scenes while ingesting data and building the vector index, and that computation depends on the distance metric.

Changing the distance metric means that all those calculations get “lost” and need to be done again.
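To make that concrete, here is a minimal, library-free sketch (not Weaviate code) showing that the nearest neighbor of the same query can differ between metrics. The vectors and names are made up for illustration. Since an index such as HNSW links each vector to its near neighbors under the configured metric, links built for one metric are not valid for another.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: compares direction only, ignores length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def l2_squared(a, b):
    """Squared Euclidean distance: sensitive to vector length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, vectors, distance):
    """Return the name of the stored vector closest to the query."""
    return min(vectors, key=lambda name: distance(query, vectors[name]))


# Hypothetical toy data, chosen so the two metrics disagree.
vectors = {"a": [1.0, 0.0], "b": [5.0, 5.0]}
query = [2.0, 2.0]

print("cosine:", nearest(query, vectors, cosine_distance))
print("l2-squared:", nearest(query, vectors, l2_squared))
```

Under cosine distance the query's closest match is the vector pointing in the same direction, while under squared L2 it is the one that is closest in space, so the neighbor lists stored in the index would have to be recomputed.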

We are working on implementing a way to reindex your data, which will allow some of those options to become mutable. For large datasets that will mean a huge increase in resource usage, so we are figuring out the best way to implement it.

Migrating your data to a new collection on a different cluster, or even to a second collection on the same cluster, is fairly easy. Here is a guide on how to do that: Migrate data | Weaviate
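As a rough sketch of the second-collection approach with the Python client v4: the collection names `Articles` and `ArticlesDot` are hypothetical, a local instance is assumed, and each object is assumed to have a single unnamed vector (exposed under the `"default"` key). Parameter names for collection configuration vary between client versions, so treat this as a starting point and check it against the guide.

```python
def copy_objects(source, target):
    """Copy properties, vector, and UUID of every object from source to target.

    Returns the number of objects queued for import.
    """
    copied = 0
    with target.batch.dynamic() as batch:
        # include_vector=True returns the stored vectors, so nothing is re-vectorized.
        for obj in source.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector["default"],  # assumes one unnamed vector per object
                uuid=obj.uuid,
            )
            copied += 1
    return copied


def migrate_to_new_metric():
    """Create a collection with a different metric and copy the data into it."""
    # Imported here so copy_objects can be read and tested without the client installed.
    import weaviate
    from weaviate.classes.config import Configure, VectorDistances

    client = weaviate.connect_to_local()  # assumption: local instance, default ports
    try:
        source = client.collections.get("Articles")  # hypothetical existing collection
        target = client.collections.create(
            name="ArticlesDot",  # hypothetical new collection
            vector_index_config=Configure.VectorIndex.hnsw(
                distance_metric=VectorDistances.DOT
            ),
        )
        print(copy_objects(source, target), "objects copied")
    finally:
        client.close()
```

Calling `migrate_to_new_metric()` against a running instance builds the new index from scratch with the new metric as the objects are imported. The sketch leaves property definitions to auto-schema; in practice you would probably copy the property and vectorizer settings from the old collection as well.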

We understand that for someone coming from a “regular” database this sounds like “blasphemy” (hahaha), but a vector database does more than store the data and build an inverted index. There is a lot of other computation going on.

Let me know if this helps!

Thanks for the warm welcome! I totally understand. This is tough stuff! And thank you for the guidance. : )

- J