Custom embeddings vs. embedding function

In Chroma, I can create an embedding function and pass it as:

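    self.collection = self.client.create_collection(
        name="my_collection",  # hypothetical collection name
        embedding_function=my_embedding_function,  # your custom embedding function
    )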

Is this possible in Weaviate as well? I understand that Weaviate supports numerous built-in vectorizers, but I haven’t seen any examples of a client-created vectorizer.

Hi @Adam_Hughes :wave:.

You can pass in the raw vector when you import (or update) an object, as shown in the Weaviate Quickstart Tutorial.

So you could have your vectorizer function pre-generate the vectors and use them at import time, or you could run the vectorizer as part of the import process.
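A minimal sketch of both options, assuming a hypothetical `my_vectorizer` function that stands in for your own model (it only counts vowels, so the example runs without any ML dependencies). Either way, you end up with an object plus its vector at import time.

```python
from typing import Dict, Iterator, List, Tuple


def my_vectorizer(text: str) -> List[float]:
    """Hypothetical stand-in for your own embedding model.

    It just counts vowels so the example is runnable; swap in your real model.
    """
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiou"]


# Option 1: pre-generate all vectors up front, then import them later.
def pregenerate(texts: List[str]) -> List[Dict]:
    return [{"text": t, "vector": my_vectorizer(t)} for t in texts]


# Option 2: vectorize lazily, one object at a time, inside the import loop.
def vectorize_during_import(texts: List[str]) -> Iterator[Tuple[Dict, List[float]]]:
    for t in texts:
        yield {"text": t}, my_vectorizer(t)


texts = ["Red Baron", "Sopwith Camel"]
stored = pregenerate(texts)
streamed = list(vectorize_during_import(texts))
print(stored[0]["vector"])  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

Pre-generating suits an offline GPU job; vectorizing inside the loop avoids holding every vector in memory at once.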


Here is a quick example of how to use Weaviate without a vectorizer in Python:

Note #1: the example uses Python client v3. We are about to release v4, which will change the syntax :wink:

Note #2: I haven’t tested the code, and some parts, such as grab_your_data(), are placeholders.

Connect to Weaviate (Docker example)

import weaviate

client = weaviate.Client(
    url="http://localhost:8080",  # Replace with your endpoint
)

Create collection

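class_obj = {
    "class": "Foo",
    "vectorizer": "none",  # no built-in vectorizer: you supply the vectors yourself
    "vectorIndexConfig": {
        "distance": "cosine",  # make sure to provide the distance metric to search through your vectors
    },
}

client.schema.create_class(class_obj)  # create the collection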
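The distance metric should match what your embedding model was designed for. Weaviate documents cosine distance as 1 - cosine similarity, so 0 means "same direction" and 2 means "opposite". A plain-Python sketch of that definition, useful for sanity-checking your own vectors locally (it is not Weaviate's internal implementation):

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance defined as 1 - cosine similarity (range 0 to 2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0
```

Because only the angle matters, scaling a vector does not change its cosine distance to others.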

Insert object

data = grab_your_data()  # Load data

client.batch.configure(batch_size=10)  # Configure batch

with client.batch as batch:  # Configure a batch process
    for item in data:  # Loop through your data objects

        # construct the properties for your object
        properties = {
            "a": item["A_Field"],  # this should correspond to your data structure
            "b": item["B_Field"],
        }
        vector_value = item["vector"]  # grab the vector from your object

        # add the object together with its pre-computed vector
        batch.add_data_object(
            data_object=properties,
            class_name="Foo",
            vector=vector_value,  # Your vector goes here
        )
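grab_your_data() is a placeholder. Here is a hypothetical stub showing the shape of data the batch loop expects (the property fields plus a pre-computed vector), with made-up values, and the same split into properties and vector that the loop performs:

```python
from typing import Dict, List, Tuple


def grab_your_data() -> List[Dict]:
    """Hypothetical loader: each item holds the property fields and a vector.

    The values are made up; load your real objects and their embeddings here.
    """
    return [
        {"A_Field": "first object", "B_Field": "alpha", "vector": [0.1, 0.2, 0.3]},
        {"A_Field": "second object", "B_Field": "beta", "vector": [0.3, 0.1, 0.0]},
    ]


def split_item(item: Dict) -> Tuple[Dict, List[float]]:
    """Mirror of the loop body: properties for the object, plus its vector."""
    properties = {"a": item["A_Field"], "b": item["B_Field"]}
    return properties, item["vector"]


data = grab_your_data()
for item in data:
    print(split_item(item))
```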

Vector query

response = (
    client.query
    .get("Foo", ["a", "b"])
    .with_near_vector({
        "vector": [-0.0125526935, -0.021168863, -0.01076519, -0.02589537, -0.0070362035, 0.019870078, -0.010001986, -0.019120263, 0.00090044655, -0.017393013, 0.021302758]
    })  # your query vector goes here ^^^
    .do()
)

print(response)
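With the v3 client, the query result is a plain dict that mirrors the GraphQL response. A small helper for pulling out the matched objects, assuming the `{"data": {"Get": {<class name>: [...]}}}` shape and treating an `"errors"` entry as a failure (the sample dict below is illustrative, not real output):

```python
from typing import Any, Dict, List


def extract_objects(response: Dict[str, Any], class_name: str) -> List[Dict[str, Any]]:
    """Return the result list for class_name from a v3-style GraphQL response.

    Assumes the shape {"data": {"Get": {class_name: [...]}}}; raises if the
    response carries an "errors" entry instead.
    """
    if response.get("errors"):
        raise RuntimeError(f"query failed: {response['errors']}")
    return response["data"]["Get"][class_name]


# Illustrative response shape only; the values are made up.
sample = {"data": {"Get": {"Foo": [{"a": "first object", "b": "alpha"}]}}}
for obj in extract_objects(sample, "Foo"):
    print(obj["a"], obj["b"])  # first object alpha
```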

Here is a tested recipe (a Jupyter Notebook) that shows how to:

  1. Create a new collection without a vectorizer
  2. Insert data with vectors
  3. Perform vector search
  4. Perform near object search
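Conceptually, a near vector search ranks stored objects by distance to a vector you supply, while a near object search uses the vector of an object already in the collection as the query. A brute-force sketch of both over an in-memory dict (hypothetical ids and toy vectors; Weaviate itself uses an approximate vector index, HNSW by default, rather than an exhaustive scan):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_vector(store: Dict[str, List[float]], query: Sequence[float],
                limit: int = 2) -> List[Tuple[str, float]]:
    """Exhaustively rank every stored object by distance to the query vector."""
    scored = [(obj_id, cosine_distance(vec, query)) for obj_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


def near_object(store: Dict[str, List[float]], obj_id: str,
                limit: int = 2) -> List[Tuple[str, float]]:
    """Reuse a stored object's vector as the query (in this sketch the object
    itself is also returned, at distance 0)."""
    return near_vector(store, store[obj_id], limit)


store = {
    "id-1": [1.0, 0.0, 0.0],
    "id-2": [0.9, 0.1, 0.0],
    "id-3": [0.0, 1.0, 0.0],
}
print([obj_id for obj_id, _ in near_vector(store, [1.0, 0.05, 0.0], limit=1)])  # ['id-1']
print([obj_id for obj_id, _ in near_object(store, "id-1")])  # ['id-1', 'id-2']
```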

Thank you for these great examples. I really can’t overstate how helpful they are!

(tangent) My last point of confusion, conceptually, is vectorizing a search query. Imagine I had a custom vectorizer and used it to vectorize 50 Wikipedia articles:

    myvectorizer.vectorize([article1, article2, ...])

Then I wanted to pass a user query like “Tell me about the Red Baron”.

I’d simply pass the user query directly into the same vectorizer, right? Then use nearVector? (pseudocode)

query_vec = myvectorizer.vectorize("Tell me about the Red Baron")

Is this the approach you’d use in my situation, or is there more to it than just vectorizing the search query?

Hi @Adam_Hughes, sounds like you’ve got it!

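Weaviate’s near_text search does exactly what you describe, except that Weaviate vectorizes the query for you with the vectorizer module configured on the collection. With "vectorizer": "none" there is no module to do that, so you vectorize the query yourself and pass the result to near_vector. So your pseudocode is absolutely correct. :slight_smile: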
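A minimal sketch of that idea: a "near text" search is just "vectorize the query with the same model used at import time, then run a vector search". The vectorizer here is hypothetical (counts over a tiny fixed vocabulary) and the articles are toy strings; the only property that matters is that one function embeds both the articles and the query.

```python
import math
import re
from typing import List, Sequence

VOCAB = ["red", "baron", "pilot", "war", "ocean", "whale"]


def my_vectorize(text: str) -> List[float]:
    """Hypothetical vectorizer: word counts over a tiny fixed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; rejects zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


articles = {
    "article1": "The Red Baron was a German fighter pilot in the First World War.",
    "article2": "The blue whale is the largest animal in the ocean.",
}
# Import time: embed every article once.
vectors = {name: my_vectorize(text) for name, text in articles.items()}


def near_text_sketch(query: str) -> str:
    """Embed the query with the SAME vectorizer, then return the closest article."""
    query_vec = my_vectorize(query)
    return min(vectors, key=lambda name: cosine_distance(vectors[name], query_vec))


print(near_text_sketch("Tell me about the Red Baron"))  # article1
```

Using a different model for the query than for the articles would place the query in an unrelated vector space, and the distances would be meaningless.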


Is there a way to pass your own vector values for a CLIP-like model (not using a vectorizer)? I need to pass, on insert, both the image and the text vector embeddings that I calculate offline on my GPU box…


Hi @Francesco_Gianferrar, yes, you can use near_vector queries to find objects most similar to a given vector.

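Even if you did use a Weaviate vectorizer, you can still use near_vector, as long as the vectors are compatible (produced by the same model, or one sharing its embedding space, and with the same number of dimensions), of course. :slight_smile: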
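A small guard for the "compatible" condition, as a hypothetical helper to run before sending a query vector. Only the length and element types can be checked mechanically; whether the vectors come from the same embedding space (for example, CLIP’s image and text encoders, which are trained to share one) has to be ensured by how you generate them. The dimension 512 below is just an example value.

```python
from numbers import Real
from typing import Sequence


def check_query_vector(query: Sequence[float], expected_dim: int) -> None:
    """Hypothetical pre-flight check: reject vectors that cannot match the collection.

    A length check cannot prove the vector came from a compatible model;
    it only catches the most common mistake (wrong model or dimension).
    """
    if len(query) != expected_dim:
        raise ValueError(
            f"query has {len(query)} dimensions, collection expects {expected_dim}"
        )
    if not all(isinstance(x, Real) for x in query):
        raise TypeError("query vector must contain only numbers")


check_query_vector([0.0] * 512, 512)  # passes silently
```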