Named Vector migration guide

Hello everyone.

Description

I have a legacy class where I don’t use named vectors, but just use the skip_vectorization property to define what properties to use.

I now want to add new vector spaces to it.

Can I somehow migrate the default / original vector as a named vector and add new ones to it?
Do I have to create a new class and migrate the data from class 1 to class 2?
Can I even add named vectors to an existing class?

Can I query multiple named vectors in one call?

The documentation on “Update a collection definition” just says: Some definitions cannot be modified after you create your collection.

Server Setup Information

  • Weaviate Server Version: 1.24.8
  • Deployment Method: docker and k8 on azure
  • Multi Node? Number of Running Nodes: single node
  • Client Language and Version: python 4.5.4

Hi @Frederic_Abraham !!

I am not sure this is possible.

I will take a look and get back here!

Hi @felizalde,

Yes, this can be done, but you can’t add new vectors to an existing collection.
You need to create a new collection (class), and copy over your old data with the existing vectors.

No, you can only query one vector space at a time.

“FYI. We are working on some ideas around searching across multiple vector spaces, but there are no timelines for this yet.”
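Until then, one workaround is to run one query per named vector and compare or merge the results on the client side. Here is a minimal, untested sketch. It assumes the v4 Python client's near_text accepts a target_vector argument (as far as I know it does in the 4.5 releases); search_each_vector is a hypothetical helper name, not part of the client.

```python
def search_each_vector(collection, query, vector_names, limit=5):
    """Run one near_text query per named vector; return results keyed by vector name."""
    results = {}
    for name in vector_names:
        # One request per vector space: target_vector selects the named vector to search.
        response = collection.query.near_text(
            query=query,
            target_vector=name,
            limit=limit,
        )
        results[name] = response.objects
    return results
```

You would call it with something like search_each_vector(client.collections.get("NewArticles"), "my query", ["old_vector_name", "new_vector_name"]) and then decide how to combine the two result lists yourself.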

How to migrate a collection to a named vector collection

Here is how I would do it.

Step 1 - create a new collection

Create a new collection, which contains two named vectors:

  • The old vector defined as a named vector – it is pretty much the same configuration, but you need to add a name
    • you don’t need source_properties; however, if you can define source_properties so that it selects the same properties as your skip_vectorization setup, that might be the better option.
  • The new vector – with the new configuration
from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "NewArticles",
    properties=[  # Define properties
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT, skip_vectorization=True),
    ],
    vectorizer_config=[
        # Set a named vector
        Configure.NamedVectors.text2vec_cohere(  # Use the "text2vec-cohere" vectorizer
            name="old_vector_name",
            # source_properties=["title", "body"],  # use this if it covers the same set of properties
        ),
        # Set another named vector
        Configure.NamedVectors.text2vec_openai(  # Use the "text2vec-openai" vectorizer
            name="new_vector_name", 
            source_properties=["title", "body"]
        )
    ],
)
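If you go the source_properties route, the list is simply the set of properties that were not skipped before. A small sketch in plain Python: the (name, skip_vectorization) pairs are hypothetical and just mirror the example collection, and it assumes only these text properties matter to the vectorizer.

```python
# (property name, skip_vectorization) pairs, mirroring the example collection
legacy_properties = [
    ("title", False),
    ("body", False),
    ("description", True),
]


def vectorized_properties(properties):
    """Return the names of the properties that were not excluded from vectorization."""
    return [name for name, skip in properties if not skip]


source_properties = vectorized_properties(legacy_properties)
print(source_properties)  # ['title', 'body']
```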

Step 2 – copy data over

Use the iterator to get properties and vectors from the old collection:

old_articles = client.collections.get("Articles")

for item in old_articles.iterator(include_vector=True):  # vectors are only returned if requested
    print(
        item.uuid,
        item.properties,
        item.vector["default"]  # the unnamed vector; this might also be just item.vector
    )

Then iterate over the objects and use a batch insert to copy the data with the existing vectors (Weaviate will generate the new vectors for you).

Please note, I didn’t test this code. I would recommend testing it on a small group of objects (~100) to see if it works for you.

old_articles = client.collections.get("Articles")
new_articles = client.collections.get("NewArticles")

with new_articles.batch.dynamic() as batch:
    for item in old_articles.iterator(include_vector=True):
        batch.add_object(
            uuid=item.uuid,  # keep the same ID in the new collection
            properties=item.properties,
            vector={
                "old_vector_name": item.vector["default"],  # use the existing vector
                # no need to specify the new vector here, as that will get generated
            },
        )
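
For the small test run, the copy loop can be capped with itertools.islice. Again an untested sketch: copy_sample is a hypothetical helper, and it assumes the objects expose .uuid and .vector["default"] the way the 4.5 Python client does.

```python
from itertools import islice


def copy_sample(old_collection, new_collection, vector_name, sample_size=100):
    """Copy the first sample_size objects, reusing each object's existing default vector."""
    copied = 0
    with new_collection.batch.dynamic() as batch:
        # islice stops the iterator after sample_size objects
        for item in islice(old_collection.iterator(include_vector=True), sample_size):
            batch.add_object(
                uuid=item.uuid,
                properties=item.properties,
                vector={vector_name: item.vector["default"]},
            )
            copied += 1
    return copied
```

Something like copy_sample(old_articles, new_articles, "old_vector_name") copies about 100 objects, after which you can inspect a few of them in the new collection before running the full migration.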

Considerations

Note 1 - double vectors → double RAM
Please note that using multiple vector spaces means more RAM is needed to hold the new vectors.
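
For a rough back-of-the-envelope number, here is a sketch that assumes vectors are stored as 32-bit floats (4 bytes per dimension) and ignores index overhead, so real usage will be higher. The object count and dimensions are made-up values; plug in your own.

```python
def raw_vector_bytes(num_objects, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, without any index overhead."""
    return num_objects * dimensions * bytes_per_value


num_objects = 1_000_000  # hypothetical collection size
old_dims = 1024          # hypothetical dimensions of the existing vector
new_dims = 1536          # hypothetical dimensions of the new vector

old_bytes = raw_vector_bytes(num_objects, old_dims)
new_bytes = raw_vector_bytes(num_objects, new_dims)

print(f"old vector only: {old_bytes / 1e9:.1f} GB")                 # 4.1 GB
print(f"old + new vectors: {(old_bytes + new_bytes) / 1e9:.1f} GB")  # 10.2 GB
```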

Note 2 - generated vectors
When you copy over existing objects, the old vectors won’t be regenerated; only the new ones will be generated.
After that, every time you add a new object, a vector will be generated for each named vector.

I hope this helps.
