Is there a way to choose the order of property vectorization using text2vec-* modules?

Based on the documentation here: Vectorizers and Rerankers | Weaviate - vector database

Unless specified otherwise in the schema, the default behavior is to:

  • Only vectorize properties that use the text data type (unless skipped)
  • Sort properties in alphabetical (a-z) order before concatenating values
  • … and so on
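
To make the two quoted defaults concrete, here is a minimal Python sketch of how the text to be vectorized might be assembled. It is not Weaviate's actual implementation: the function name `build_vectorizer_input` and the sample object are hypothetical, and it models only the two bullets quoted above (text-only filtering and alphabetical ordering), ignoring everything else the module may do.

```python
def build_vectorizer_input(properties, skipped=()):
    """Approximate the two quoted defaults (illustration only, not Weaviate code)."""
    # Keep only string-valued properties that are not explicitly skipped.
    text_items = [
        (name, value)
        for name, value in properties.items()
        if isinstance(value, str) and name not in skipped
    ]
    # Sort by property name (a-z) before concatenating the values.
    text_items.sort(key=lambda item: item[0])
    return " ".join(value for _, value in text_items)


# Hypothetical object: "body" sorts before "title", so its value comes first.
obj = {"title": "Vector search", "body": "Long article text", "views": 42}
print(build_vectorizer_input(obj))  # Long article text Vector search
```

With this ordering, a long `body` would use up the model's input budget before `title` is ever reached.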

Is there a way to sort the properties based on our pre-defined order when vectorizing?

The reason is that we are using, for example, the Hugging Face model multi-qa-MiniLM-L6-cos-v1, which comes with the following note:

Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.

So we would like to place some fields first so that they survive truncation in case we hit the limit. Thanks!

Hi!

There is no way to select this order. What you can do is name your properties deliberately so that they sort alphabetically into the order you want.
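
As a rough sketch of that naming workaround, the Python helper below derives property names that sort alphabetically into a chosen priority order. The helper `prefixed_names`, the `p01_` style prefix, and the example property names are all hypothetical; this only illustrates the sorting trick and does not call any Weaviate API.

```python
def prefixed_names(priority_order):
    """Map each property name to a prefixed name that sorts in priority order."""
    # Zero-pad the index so that, for example, p02_ sorts before p10_.
    width = len(str(len(priority_order)))
    return {
        name: f"p{index:0{width}d}_{name}"
        for index, name in enumerate(priority_order, start=1)
    }


# Hypothetical priority: title first, then summary, then body.
mapping = prefixed_names(["title", "summary", "body"])
print(sorted(mapping.values()))  # ['p1_title', 'p2_summary', 'p3_body']
```

The prefixed names would then be used as the property names in the schema, so the alphabetical sort puts the highest-priority text first and any truncation falls on the lower-priority fields.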

Please feel free to open a new feature request on our GitHub 🙂

Thanks!


Thanks for confirming. As suggested, I have filed a feature request here:
