Hi @felizalde,
Yes, this can be done, but you can’t add new vectors to an existing collection.
You need to create a new collection (class), and copy over your old data with the existing vectors.
No, you can only query one vector space at a time.
“FYI. We are working on some ideas around searching across multiple vector spaces, but there are no timelines for this yet.”
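To illustrate "one vector space at a time": with named vectors, each query names the single vector it should search. Below is a minimal sketch, assuming the v4 Python client and the collection and vector names used in the migration steps further down. The helper name `search_named_vector` is hypothetical, made up for this example, and I haven't run it against a live instance.

```python
def search_named_vector(collection, query_text, target_vector, limit=5):
    """Run a near_text search against exactly one named vector.

    `collection` is assumed to be a v4 collection object, e.g.
    client.collections.get("NewArticles").
    """
    response = collection.query.near_text(
        query=query_text,
        target_vector=target_vector,  # the single vector space to search
        limit=limit,
    )
    return response.objects


# Usage (needs a connected `client`, so it is left commented out):
# new_articles = client.collections.get("NewArticles")
# hits = search_named_vector(new_articles, "space travel", "new_vector_name")
# for obj in hits:
#     print(obj.uuid, obj.properties)
```

To compare results from both vector spaces, you would call this once per vector name and merge the results yourself.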
How to migrate a collection to a named vector collection
Here is how I would do it.
Step 1 - create a new collection
Create a new collection, which contains two named vectors:
- The old vector defined as a named vector. The configuration is pretty much the same, but you need to add a name.
  - You don’t need source_properties. However, if you can define source_properties so that you end up with the same result as with exclude, it might be better to do that.
- The new vector – with the new configuration
from weaviate.classes.config import Configure, Property, DataType

# `client` is assumed to be an already connected Weaviate client
client.collections.create(
    "NewArticles",
    properties=[  # Define properties
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT, skip_vectorization=True),
    ],
    vectorizer_config=[
        # Set a named vector
        Configure.NamedVectors.text2vec_cohere(  # Use the "text2vec-cohere" vectorizer
            name="old_vector_name",
            # source_properties=["title", "body"]  # use this if that covers the same number of properties
        ),
        # Set another named vector
        Configure.NamedVectors.text2vec_openai(  # Use the "text2vec-openai" vectorizer
            name="new_vector_name",
            source_properties=["title", "body"],
        ),
    ],
)
Step 2 – copy data over
Use the iterator to get properties and vectors from the old collection:
old_articles = client.collections.get("Articles")
# include_vector=True is needed, otherwise the vectors are not returned
for item in old_articles.iterator(include_vector=True):
    print(
        item.uuid,
        item.properties,
        item.vector["default"],  # this might also be just item.vector
    )
Then iterate over the objects and use a batch to insert the data with the existing vectors (and let Weaviate generate the new vectors for you).
Please note, I didn’t test this code. I would recommend testing it on a small group of objects (~100) first, to see if it works for you.
old_articles = client.collections.get("Articles")
new_articles = client.collections.get("NewArticles")
with new_articles.batch.dynamic() as batch:
    for item in old_articles.iterator(include_vector=True):
        batch.add_object(
            uuid=item.uuid,  # keep the same ID as in the old collection
            properties=item.properties,
            vector={
                "old_vector_name": item.vector["default"],  # use the existing vector
                # no need to specify the new vector here, as that will get generated
            },
        )
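As the comment in the iterator snippet says, the old vector might come back as a dict keyed by "default" or as a plain list, depending on your client version. Here is a small sketch of a helper that accepts both shapes and returns the dict expected for the named vector. The name `as_named_vector` and its parameters are hypothetical, made up for this example; it is plain Python and does not call Weaviate.

```python
from typing import Dict, List, Mapping, Sequence, Union

VectorLike = Union[Sequence[float], Mapping[str, Sequence[float]]]


def as_named_vector(
    vector: VectorLike,
    target_name: str,
    source_key: str = "default",
) -> Dict[str, List[float]]:
    """Wrap an old vector under the name used in the new collection."""
    if isinstance(vector, Mapping):
        # Dict form: pick the old vector out by its key
        values = vector[source_key]
    else:
        # Plain list form: the object itself is the vector
        values = vector
    return {target_name: list(values)}
```

In the batch loop you could then pass `vector=as_named_vector(item.vector, "old_vector_name")` instead of building the dict by hand.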
Considerations
Note 1 - double vectors → double RAM
Please note that using multiple vector spaces means more RAM is needed for the new vectors.
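To get a rough feel for the extra memory, here is a back-of-the-envelope sketch. It only counts the raw vector values, assuming float32 storage (4 bytes per dimension), and ignores index overhead, so treat it as a lower bound. The object count and dimensions below are made-up example values; use the numbers for your own data and models.

```python
def raw_vector_bytes(num_objects: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the raw vector values only (no index overhead)."""
    return num_objects * dimensions * bytes_per_value


# Example values, not taken from any real setup
num_objects = 1_000_000
old_dims = 1024  # assumed dimensions of the old vector
new_dims = 1536  # assumed dimensions of the new vector

old_bytes = raw_vector_bytes(num_objects, old_dims)
new_bytes = raw_vector_bytes(num_objects, new_dims)

gib = 1024 ** 3
print(f"old vectors:  {old_bytes / gib:.2f} GiB")
print(f"new vectors:  {new_bytes / gib:.2f} GiB")
print(f"both vectors: {(old_bytes + new_bytes) / gib:.2f} GiB")
```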
Note 2 - generated vectors
When you copy over existing objects, the old vectors won’t be regenerated; only the new ones will be generated.
After that, every time you add a new object, a vector will be generated for each named vector.
I hope this helps.