Have multiple vectors for a single object in the same index?

Description

I have records with a field ‘content’, which is a large text body. I want to divide the body into chunks, and vectorize each chunk. However, each chunk should link to the same record.

Basically, I want multiple vectors for the same record on one search index.

I saw support for multiple vector embeddings, but it seems to use a pre-configured ColBERT indexer.

How can I do this if I’m providing the embeddings myself? I saw named vectors, but I want everything on the same index. Plus, I already created the collection, so I’d rather avoid a whole migration / transfer.

One way I can think of is to create multiple chunk records that each store a reference ID to the original record, but this seems clunky.
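To make that approach concrete, here is a minimal sketch in plain Python, not Weaviate-specific. The property names `foreignId`, `chunkIndex` and `text`, the character-window chunker, and the `embed` callable (standing in for whatever model produces the embeddings) are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (a deliberately naive chunker)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_chunk_objects(
    record_id: str,
    content: str,
    embed: Callable[[str], List[float]],  # hypothetical: your own embedding function
    size: int = 200,
    overlap: int = 50,
) -> List[Dict]:
    """Build one object per chunk, each pointing back to the original record."""
    return [
        {
            "properties": {
                "foreignId": record_id,  # assumed property name linking to the parent
                "chunkIndex": index,
                "text": chunk,
            },
            "vector": embed(chunk),  # user-supplied vector, one per chunk
        }
        for index, chunk in enumerate(chunk_text(content, size, overlap))
    ]
```

Each resulting dict would then be inserted as its own object with its own vector into a single chunk collection, so all chunk vectors live in the same index and every hit carries the ID of the record it came from.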

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version: Python, latest
  • Multitenancy?: Yes

Any additional Information

Morning @Tejas_Sharma

Weaviate supports multiple vector embeddings per object through “named vectors”, which you have already come across. With named vectors, you can store multiple vector embeddings for a single object and search using any of these vector spaces.

However, there are two important limitations to note:

  1. Named vectors must be defined at collection creation time - you cannot add them to an existing collection

  2. Each named vector is independent and has its own index

As you mentioned, you’ve already created your collection, so this would require a migration to a new collection with the proper configuration.

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hey Mohamed,

Thanks for responding. Those are exactly the problems I have with named vectors, both of them.

I need to be able to search on the same index. Is there no solution for this?

To be honest, aside from what you’ve already shared—which I agree is a bit clunky—I’m not entirely sure of a better alternative at the moment.

If I come across an easier solution, I’ll definitely let you know. It might also be that someone from the community has tackled this differently, so let’s see.

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hi @Tejas_Sharma !

One possibility to explore is Ref2Vec.

That way, your content chunks are vectorized independently, and their vectors feed into the record’s centroid vector.

Now you can search per record (against the record’s chunk centroid) and per individual chunk directly.

And each chunk will have a cross reference to the record.
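As a rough illustration of what "centroid" means here: the record's vector is the element-wise mean of the vectors of the chunks it references. The sketch below is plain Python showing only that arithmetic; it is not the module's actual implementation, and the function name is made up for the example.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


# Two chunk vectors for one record; the record vector sits halfway between them.
record_vector = centroid([[1.0, 0.0], [0.0, 1.0]])
```

One consequence worth keeping in mind: averaging many dissimilar chunks can blur the record vector, so searching the chunks directly may still be the more precise option for long documents.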

You could maybe also use named vectors on the record, and weight the vector search between Ref2Vec and other modules. :thinking:

That would also require some migration/remodeling and a lot of experimenting :slight_smile:

Happy coding!

Thanks @Mohamed_Shahin @DudaNogueira !

Oh wow, Ref2Vec seems interesting, but it would still require a migration, right? I think chunking into separate records, each with a metadata property foreignId pointing to the original record, seems the easiest.

Do you guys think there are any performance problems with this approach? The main one I see is that I need a separate get_by_id call to fetch the latest original record using the foreignId.
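One way to keep that extra lookup cheap, sketched in plain Python: several chunks of the same record often land in one result set, so collapse the hits to the best chunk per foreignId first, then fetch each parent once (ideally all of them in a single batched request rather than one call per hit). The hit shape used here (dicts with `foreignId` and `distance`) is an assumption, not the client's actual response type.

```python
from typing import Dict, List


def best_hit_per_record(hits: List[Dict]) -> List[Dict]:
    """Keep only the closest chunk hit for each parent record, closest first."""
    best: Dict[str, Dict] = {}
    for hit in hits:
        parent_id = hit["foreignId"]
        if parent_id not in best or hit["distance"] < best[parent_id]["distance"]:
            best[parent_id] = hit
    return sorted(best.values(), key=lambda h: h["distance"])


# Three chunk hits, two of which belong to record "a".
hits = [
    {"foreignId": "a", "distance": 0.30},
    {"foreignId": "b", "distance": 0.10},
    {"foreignId": "a", "distance": 0.05},
]
ranked = best_hit_per_record(hits)
parent_ids = [h["foreignId"] for h in ranked]  # unique IDs to fetch in one batch
```

With this, the number of parent lookups is bounded by the number of distinct records in the result set, not by the number of chunk hits.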