Custom embeddings vs. embedding function

In Chroma, I can create an embedding function and pass it as:

    self.collection = self.client.create_collection(
        name="Foo",
        embedding_function=myembedder
    )

Is this possible in Weaviate as well? I understand that Weaviate supports numerous built-in vectorizers, but I haven't seen any examples of a client-created vectorizer.

Hi @Adam_Hughes :wave:.

You can pass in your own raw vector when you import an object, as shown here: Quickstart Tutorial | Weaviate - vector database

So you could have your vectorizer function pre-generate the vectors and use them at import time, or you could call the vectorizer as part of the import process itself.
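A minimal sketch of those two options in plain Python, assuming a hypothetical `embed()` function standing in for your own model (here a toy letter-count embedder, so the snippet runs without any dependencies). Option 1 pre-computes every vector up front; option 2 computes each vector inside the import loop. Either way, each object ends up paired with the vector you would hand to Weaviate at import time.

```python
from typing import Iterator


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for your own model: counts the letters a-e."""
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "abcde"]


texts = ["A cat naps", "Dogs bark loudly", "Bees make honey"]

# Option 1: pre-generate all vectors, then import them later.
precomputed = [(t, embed(t)) for t in texts]


# Option 2: vectorize lazily as part of the import loop.
def import_stream(items: list[str]) -> Iterator[tuple[str, list[float]]]:
    for t in items:
        yield t, embed(t)  # vector is computed just before the object would be sent


streamed = list(import_stream(texts))
print(precomputed[0])
```

Both produce the same pairs; option 2 just avoids holding every vector in memory before the import starts.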


Here is a quick example of how to use Weaviate without a vectorizer in Python:

Note #1: the example uses Python client v3 – we are about to release v4, which will change the syntax :wink:

Note #2: I haven't tested the code, and some parts, like "grab_your_data()", are placeholders for your own logic.

Connect to Weaviate (Docker example)

import weaviate

client = weaviate.Client(
    url = "http://localhost:8080",  # Replace with your endpoint
)

Create collection

client.schema.create_class({
    "class": "Foo",
    "vectorizer": "none",
    "vectorIndexConfig": {
        "distance": "cosine" # make sure to provide the distance metric to search through your vectors
    },
})
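For reference, Weaviate's `cosine` distance is 1 minus the cosine similarity, so 0 means the vectors point in the same direction and 2 means they point in opposite directions. A small pure-Python sketch of that formula (not Weaviate code), handy for sanity-checking your own vectors before import:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity the "cosine" metric reports."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite)
```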

Insert object

data = grab_your_data()  # Placeholder: load your own data here (e.g. a list of dicts)

client.batch.configure(batch_size=10)  # Configure batch

with client.batch as batch:  # Configure a batch process
    for item in data:  # Loop through your data objects

        # construct the properties for your object
        properties = {
            "a": item["A_Field"], # this should correspond to your data structure
            "b": item["B_Field"], 
        }
        vector_value = item["vector"]  # grab the precomputed vector from your object

        batch.add_data_object(
            class_name="Foo",
            data_object=properties,
            vector=vector_value  # Your vector goes here
        )
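Since `grab_your_data()` is a placeholder, here is a minimal sketch of the data shape the loop above assumes: a list of dicts with `A_Field`, `B_Field` and a precomputed `vector` (the loader body and its values are hypothetical). It also checks that every vector has the same length, because a single vector index in Weaviate expects one fixed dimensionality, and it illustrates roughly how `batch_size=10` groups objects (illustration only; the client's real batching logic may differ).

```python
def grab_your_data() -> list[dict]:
    """Hypothetical loader; replace with your own data source."""
    return [
        {"A_Field": f"title {i}", "B_Field": i, "vector": [0.1 * i, 0.2, 0.3]}
        for i in range(25)
    ]


def check_dimensions(items: list[dict]) -> int:
    """Return the shared vector length, or raise if the lengths differ."""
    lengths = {len(item["vector"]) for item in items}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(lengths)}")
    return lengths.pop()


def chunk(items: list[dict], batch_size: int = 10) -> list[list[dict]]:
    """Split items into consecutive groups of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


data = grab_your_data()
dim = check_dimensions(data)
batches = chunk(data)
print(dim, [len(b) for b in batches])  # 3 [10, 10, 5]
```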

Vector query

response = (
    client.query
    .get("Foo", ["a", "b"])
    .with_near_vector({
        "vector": [-0.0125526935, -0.021168863, -0.01076519, -0.02589537, -0.0070362035, 0.019870078, -0.010001986, -0.019120263, 0.00090044655, -0.017393013, 0.021302758]  # your query vector goes here (same length as the vectors you imported)
    })
    .with_limit(5)
    .do()
)
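Conceptually, `with_near_vector` plus `with_limit(5)` returns the objects whose stored vectors are closest to the query vector under the collection's distance metric. Weaviate answers this with an approximate HNSW index; the exact brute-force version below is only a sketch of the semantics, using made-up object IDs and 3-dimensional vectors.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_vector(objects: dict[str, list[float]], query: list[float], limit: int = 5) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbour search: score every object, keep the closest `limit`."""
    scored = [(obj_id, cosine_distance(vec, query)) for obj_id, vec in objects.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:limit]


objects = {
    "obj-1": [1.0, 0.0, 0.0],
    "obj-2": [0.9, 0.1, 0.0],
    "obj-3": [0.0, 1.0, 0.0],
    "obj-4": [0.0, 0.0, 1.0],
}
print(near_vector(objects, [1.0, 0.0, 0.0], limit=2))
```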

Here is a tested recipe (a Jupyter Notebook) that shows how to:

  1. Create a new collection without a vectorizer
  2. Insert data with vectors
  3. Perform vector search
  4. Perform near object search
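Near-object search reuses a stored object's own vector as the query, so it returns that object's nearest neighbours. A self-contained brute-force sketch of the idea, with hypothetical IDs and vectors (in Weaviate you would reference the stored object by its ID rather than pass its vector yourself):

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_object(objects: dict[str, list[float]], source_id: str, limit: int = 5) -> list[str]:
    """Use the stored vector of `source_id` as the query vector."""
    query = objects[source_id]
    ranked = sorted(objects, key=lambda obj_id: cosine_distance(objects[obj_id], query))
    return ranked[:limit]  # in this sketch the source object ranks first (distance ~0)


objects = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}
print(near_object(objects, "cat", limit=2))
```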

Thank you for these great examples. I really can't overstate how helpful they are!

(tangent) My last point of confusion, conceptually, is vectorizing a search query. Imagine I had a custom vectorizer and used it to vectorize 50 Wikipedia articles.

    myvectorizer.vectorize([article1, article2, ...])

Then I wanted to pass a user query like “Tell me about the Red Baron”.

I'd simply pass the user query directly into the same vectorizer, right? Then use nearVector? (pseudocode)

    query_vec = myvectorizer.vectorize("Tell me about the Red Baron")
    client.nearVector(query_vec)

Is this the approach you'd use in my situation, or is there more to it than just vectorizing the search query?

Hi @Adam_Hughes - sounds like you’ve got it!

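For collections configured with a vectorizer module, Weaviate's near_text function does exactly what you describe (it vectorizes the query text with the collection's vectorizer, then runs a vector search), just in an integrated way. So your pseudocode is absolutely correct. :slight_smile: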
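The approach in the pseudocode can be sketched end to end in plain Python, assuming a hypothetical toy `vectorize()` (a word count over a tiny fixed vocabulary) in place of a real model. The key point is that the articles and the query pass through the same function, so their vectors live in the same space; with Weaviate, the final ranking step would be a near_vector query instead of the local `min()`.

```python
import math

VOCAB = ["red", "baron", "pilot", "pasta", "sauce"]  # hypothetical toy vocabulary


def vectorize(text: str) -> list[float]:
    """Toy embedder: counts vocabulary words. Stand-in for a real model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms if norms else 1.0  # treat all-zero vectors as unrelated


articles = {
    "Red Baron": "the red baron was a famous pilot",
    "Carbonara": "pasta with an egg sauce",
}
article_vectors = {title: vectorize(body) for title, body in articles.items()}

# Same vectorizer for the query as for the documents.
query_vec = vectorize("Tell me about the Red Baron")
best = min(article_vectors, key=lambda title: cosine_distance(article_vectors[title], query_vec))
print(best)
```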


Is there a way to pass your own vector values for a CLIP-like model (without using a vectorizer)? On insert, I need to pass both the image and the text vector embeddings, which I calculate offline on my GPU box…

Thanks!

Hi @Francesco_Gianferrar - yes, you can use near_vector queries to find objects most similar to a given vector.

Even if you do use a Weaviate vectorizer, you can still use this function, as long as the vectors are compatible (produced by the same model, with the same dimensionality), of course. :slight_smile:
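As a sketch of what "compatible" means in the CLIP-style case: a text vector and an image vector from the same multimodal model share one embedding space and dimensionality, so a text query vector can be compared directly against stored image vectors. The vectors below are made-up 4-dimensional stand-ins for the offline GPU output (a real CLIP model produces much longer vectors), and the ranking is a local brute-force stand-in for a near_vector query.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError(f"incompatible vectors: {len(a)} vs {len(b)} dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical image embeddings computed offline with a CLIP-like image encoder.
image_vectors = {
    "photo_of_dog.jpg": [0.8, 0.1, 0.1, 0.0],
    "photo_of_beach.jpg": [0.0, 0.2, 0.1, 0.9],
}
# Hypothetical embedding of the text "a dog" from the same model's text encoder.
text_query = [0.7, 0.2, 0.0, 0.1]

ranking = sorted(image_vectors, key=lambda name: cosine_distance(image_vectors[name], text_query))
print(ranking[0])
```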