Creating RAG using own data vectorized in Azure

Hi All,
I’ve vectorized my data using Azure OpenAI Embeddings, and I’d like to confirm that I’m creating the collection correctly. Each record has four fields: extension, filename, content, and vector. So far, only the content itself is vectorized; the other fields are not. Does that sound right, or should the other fields be handled differently? I also want to do generative RAG with Azure OpenAI.

Thanks in advance for any guidance!

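import weaviate.classes.config as wc

# Assumes `client` is an already-connected Weaviate client.
evaluators = client.collections.create(
    name="CodeSnippet",
    # No vectorizer: vectors are supplied by the caller at import time.
    vectorizer_config=None,
    properties=[
        wc.Property(name="extension", data_type=wc.DataType.TEXT),
        wc.Property(name="filename", data_type=wc.DataType.TEXT),
        wc.Property(name="content", data_type=wc.DataType.TEXT),
    ],
)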

Server Setup Information

  • Weaviate Server Version: 1.27.0-alpha
  • Deployment Method: `semitechnologies/weaviate:latest`
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version:
  • Multitenancy?:


Hi @Nagendra_Dattatreya!

Are you vectorizing your data yourself, or do you want to let Weaviate vectorize it for you?

Because you are passing vectorizer_config=None, Weaviate will not vectorize your data. This is what we call "bring your own vectors".

Now, regarding the properties to be vectorized, it depends on whether that information adds meaning to your object and is relevant to your use case.

So leaving unnecessary properties out of the vectorization can help you get better vectors.

I have already vectorized the data (the content column) using Azure OpenAI Embeddings. If I understand correctly, you are saying I should just add the property that is vectorized. Is that correct?

Another question: if I let Weaviate vectorize the data, I assume I have to say which columns need to be vectorized. I also want to query the data based on user input.

Hi!

That’s right.

If you have already vectorized the content value, you can just pass that value to the content property, along with the vector.

In that case, Weaviate will not vectorize it for you. It will still build the vector index and the inverted index, which let you search using BM25 and hybrid search.
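
Here is a minimal sketch of that flow, assuming the v4 Python client and a collection handle obtained with something like `client.collections.get("CodeSnippet")`. The row layout and the helper names `import_rows`, `hybrid_search`, and `generate_answer` are hypothetical. The query vector is assumed to come from the same Azure OpenAI embedding deployment that produced the stored vectors, and `generate_answer` only works if the collection was created with a generative module configured.

```python
from typing import Any, Iterable, List, Sequence


def import_rows(collection: Any, rows: Iterable[dict]) -> int:
    """Batch-import rows that already carry their own embedding.

    Each row is assumed to look like:
    {"extension": ..., "filename": ..., "content": ..., "vector": [...]}
    """
    count = 0
    with collection.batch.dynamic() as batch:
        for row in rows:
            batch.add_object(
                properties={
                    "extension": row["extension"],
                    "filename": row["filename"],
                    "content": row["content"],
                },
                # The precomputed embedding is passed separately from the
                # properties, so no vectorizer is needed for this object.
                vector=row["vector"],
            )
            count += 1
    return count


def hybrid_search(collection: Any, query_text: str,
                  query_vector: Sequence[float], limit: int = 5) -> List[dict]:
    """Combine keyword (BM25) and vector search for a user query."""
    response = collection.query.hybrid(
        query=query_text,     # keyword side, served by the inverted index
        vector=query_vector,  # vector side, embedded by the caller
        alpha=0.5,            # equal weight for keyword and vector scores
        limit=limit,
    )
    return [obj.properties for obj in response.objects]


def generate_answer(collection: Any, query_vector: Sequence[float],
                    prompt: str, limit: int = 3) -> Any:
    """Retrieve the nearest objects and run one prompt over all of them."""
    response = collection.generate.near_vector(
        near_vector=query_vector,
        limit=limit,
        grouped_task=prompt,
    )
    return response.generated
```

Since there is no vectorizer on the collection, the caller has to embed the user's question before calling `hybrid_search` or `generate_answer`; passing only the text would leave the vector side of the search without an input.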

Alternatively, you can create the collection, specify the vectorizer, and move your data in. When you pass the vector, the vectorization step is not triggered, but when you don’t pass one, Weaviate will do it for you.

And yes, you need to specify which properties will be part of the vectorization.

If you are using named vectors, you can specify them in the vectorizer config.

If you only have one vectorizer (not named), you can define it at the property level.
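
Both options might look like the sketch below, assuming the v4 Python client and a Weaviate instance with the `text2vec-openai` and `generative-openai` modules enabled. The function name `create_collections`, its parameters, the `CodeSnippetNamed` collection, and the `content_vector` name are hypothetical placeholders, not values from this thread.

```python
from typing import Any


def create_collections(client: Any, resource_name: str,
                       embedding_deployment: str, chat_deployment: str) -> None:
    """Create two example collections that let Weaviate vectorize `content`."""
    # Imported lazily so this sketch can be loaded without weaviate-client
    # installed; a top-level import is equally fine in real code.
    import weaviate.classes.config as wc

    # Generative module settings, needed for `collection.generate` queries.
    generative = wc.Configure.Generative.azure_openai(
        resource_name=resource_name,
        deployment_id=chat_deployment,
    )

    # Option 1: one unnamed vectorizer; opt properties out individually.
    client.collections.create(
        name="CodeSnippet",
        vectorizer_config=wc.Configure.Vectorizer.text2vec_azure_openai(
            resource_name=resource_name,
            deployment_id=embedding_deployment,
        ),
        generative_config=generative,
        properties=[
            wc.Property(name="extension", data_type=wc.DataType.TEXT,
                        skip_vectorization=True),
            wc.Property(name="filename", data_type=wc.DataType.TEXT,
                        skip_vectorization=True),
            wc.Property(name="content", data_type=wc.DataType.TEXT),
        ],
    )

    # Option 2: a named vector built only from the listed source properties.
    client.collections.create(
        name="CodeSnippetNamed",
        vectorizer_config=[
            wc.Configure.NamedVectors.text2vec_azure_openai(
                name="content_vector",
                resource_name=resource_name,
                deployment_id=embedding_deployment,
                source_properties=["content"],
            )
        ],
        generative_config=generative,
        properties=[
            wc.Property(name="extension", data_type=wc.DataType.TEXT),
            wc.Property(name="filename", data_type=wc.DataType.TEXT),
            wc.Property(name="content", data_type=wc.DataType.TEXT),
        ],
    )
```

You would call this with a connected client, for example one created by `weaviate.connect_to_local(headers={"X-Azure-Api-Key": ...})` so the Azure OpenAI modules can authenticate. Objects imported with an explicit vector still skip vectorization under either option.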

Let me know if this helps 🙂