Description
I have records with a field ‘content’, which is a large text body. I want to divide the body into chunks, and vectorize each chunk. However, each chunk should link to the same record.
Basically, I want multiple vectors for the same record on one search index.
I saw support for multiple vector embeddings, but it seems to use a pre-configured ColBERT indexer.
How can I do this if I’m providing the embeddings myself? I saw multiple named vectors, but I want it all on the same index. Plus, I’ve already created the collection, so I’d rather not do a whole migration / transfer.
One way I can think of is to create multiple records that each store a reference id to the original record, but this seems clunky.
Server Setup Information
- Weaviate Server Version:
- Deployment Method:
- Multi Node? Number of Running Nodes:
- Client Language and Version: Python, latest
- Multitenancy?: Yes
Any additional Information
Morning @Tejas_Sharma
Weaviate supports multiple vector embeddings per object through “named vectors”, which you have already come across. With named vectors, you can store multiple vector embeddings for a single object and search using any of these vector spaces.
However, there are two important limitations to note:
- Named vectors must be defined at collection creation time - you cannot add them to an existing collection
- Each named vector is independent and has its own index
As you mentioned, you’ve already created your collection, so this would require a migration to a new collection with the proper configuration.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)
Hey Mohamed,
Thanks for responding. Those are exactly the problems I have with named vectors, both of them.
I need to be able to search on the same index. Is there no solution for this?
To be honest, aside from what you’ve already shared (which I agree is a bit clunky), I’m not entirely sure of a better alternative at the moment.
If I come across an easier solution, I’ll definitely let you know. It might also be that someone from the community has tackled this differently, so let’s see.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)
Hi @Tejas_Sharma !
One possibility to explore is Ref2Vec.
That way you can have your content chunks vectorized independently, with their vectors contributing to the record’s centroid vector.
You can then search per record (against the record’s chunk centroid) and per individual chunk directly.
And each chunk will have a cross-reference to the record.
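To make the centroid idea concrete, here is a minimal sketch in plain Python. It assumes the record vector is the element-wise mean of its chunk vectors, which is how a centroid is usually computed; the function name `centroid` is hypothetical, and this is not Weaviate’s actual implementation.

```python
def centroid(vectors: list[list[float]]) -> list[float]:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # Average each dimension across all chunk vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Two toy chunk embeddings for one record (illustrative values only).
chunk_vectors = [[1.0, 0.0], [0.0, 1.0]]
record_vector = centroid(chunk_vectors)
```

Note that averaging blurs the individual chunks: a record with many unrelated chunks can end up with a centroid that is not close to any of them, which is why searching the chunks directly remains useful.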
You could also have named vectors on the record, and weight the vector search between the ref2vec vector and those from other modules.
That would also require some migration/remodeling and a lot of experimenting.
Happy coding!
Thanks @Mohamed_Shahin @DudaNogueira !
Oh wow, Ref2Vec seems interesting, but it would still require a migration, right? I think chunking into separate records, each with a metadata property foreignId pointing to the original record, seems the easiest.
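A rough sketch of the chunk-per-record idea, in plain Python with no Weaviate calls. The names `ChunkObject`, `chunk_text`, and `build_chunk_objects` are hypothetical, and fixed-size character windows are just one assumed chunking strategy; each resulting object would then be embedded and inserted as its own record.

```python
from dataclasses import dataclass


@dataclass
class ChunkObject:
    foreign_id: str   # id of the original record (the foreignId property)
    chunk_index: int  # position of the chunk within the original content
    text: str         # chunk text to embed and store


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with some overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def build_chunk_objects(record_id: str, content: str,
                        size: int = 200, overlap: int = 50) -> list[ChunkObject]:
    """Create one chunk object per window, all linked to the same record."""
    return [ChunkObject(record_id, i, chunk)
            for i, chunk in enumerate(chunk_text(content, size, overlap))]
```

Storing `chunk_index` alongside `foreignId` makes it possible to restore chunk order or to delete and rebuild all chunks when the original record changes.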
Do you guys think there are any performance problems with this approach? The main one I see is that I need to do a separate get_by_id call to fetch the latest original record using the foreignId.
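One way to keep the extra lookup cheap is to deduplicate chunk hits by foreignId and fetch all parent records in a single batched call instead of one call per hit. The sketch below is plain Python under stated assumptions: `ChunkHit` stands in for whatever the chunk search returns, and `fetch_many` is a hypothetical callable that takes a list of ids and returns a dict of id to record (for example, a single filtered fetch by id, if the client offers one).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChunkHit:
    foreign_id: str  # id of the original record
    distance: float  # smaller means closer to the query vector
    text: str        # matched chunk text


def best_hit_per_record(hits: list[ChunkHit]) -> list[ChunkHit]:
    """Keep only the closest chunk per original record, sorted by distance."""
    best: dict[str, ChunkHit] = {}
    for hit in hits:
        current = best.get(hit.foreign_id)
        if current is None or hit.distance < current.distance:
            best[hit.foreign_id] = hit
    return sorted(best.values(), key=lambda h: h.distance)


def resolve_records(hits: list[ChunkHit],
                    fetch_many: Callable[[list[str]], dict]) -> list[tuple]:
    """Pair each distinct record with its best chunk using one batched fetch."""
    ranked = best_hit_per_record(hits)
    ids = [h.foreign_id for h in ranked]
    records = fetch_many(ids)  # hypothetical: one round trip for all parents
    # Skip ids whose parent record no longer exists.
    return [(records[i], h) for i, h in zip(ids, ranked) if i in records]
```

Because several chunks of the same record can match a query, it usually helps to request more chunk hits than the number of records needed, then trim after deduplication.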