Is this possible in Weaviate as well? I understand that Weaviate supports numerous built-in vectorizers, but I hadn’t seen any examples of a client-created vectorizer.
So, you could have your vectorizer function pre-generate vectors and use them at import time, or you could have the vectorizer work as a part of the import process.
Here is a quick example of how to use Weaviate without a vectorizer in Python:
Note #1: the example uses the Python client v3. We are about to release v4, which will change the syntax.
Note #2: I haven’t tested the code, and some parts are made up, like "grab_your_data()".
Connect to Weaviate (docker example)
import weaviate
client = weaviate.Client(
    url="http://localhost:8080",  # Replace with your endpoint
)
Create collection
client.schema.create_class({
    "class": "Foo",
    "vectorizer": "none",
    "vectorIndexConfig": {
        "distance": "cosine"  # make sure to provide the distance metric to search through your vectors
    },
})
Insert object
data = grab_your_data()  # Load data
client.batch.configure(batch_size=10)  # Configure batch
with client.batch as batch:  # Configure a batch process
    for item in data:  # Loop through your data objects
        # construct the properties for your object
        properties = {
            "a": item["A_Field"],  # this should correspond to your data structure
            "b": item["B_Field"],
        }
        vector_value = item["vector"]  # grab the vector from your object
        batch.add_data_object(
            class_name="Foo",
            data_object=properties,
            vector=vector_value  # Your vector goes here
        )
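The example above covers the first option, where each item already carries a pre-generated vector. For the second option, running your vectorizer as part of the import, a rough sketch could look like the following. It is untested against a live instance and uses the same Python client v3 batch calls; `import_with_vectorizer` and the `vectorize` callable are hypothetical names standing in for your own code, and the choice to embed `A_Field` is just an assumption about your data.

```python
from typing import Callable, Iterable, List


def import_with_vectorizer(
    client,
    items: Iterable[dict],
    vectorize: Callable[[str], List[float]],
    class_name: str = "Foo",
) -> int:
    """Vectorize each item during import and add it to a Weaviate batch.

    `client` is expected to be a weaviate.Client (Python client v3).
    `vectorize` is your own function (hypothetical here) that turns text
    into a list of floats. Returns the number of objects added.
    """
    count = 0
    client.batch.configure(batch_size=10)
    with client.batch as batch:
        for item in items:
            # Adjust the field names to match your data structure.
            properties = {"a": item["A_Field"], "b": item["B_Field"]}
            batch.add_data_object(
                class_name=class_name,
                data_object=properties,
                # Assumption: the text to embed lives in "A_Field".
                vector=vectorize(item["A_Field"]),
            )
            count += 1
    return count
```

You would call it as `import_with_vectorizer(client, grab_your_data(), myvectorizer.vectorize)`, assuming your vectorizer accepts a single string and returns one vector.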
Thank you for these great examples. I really can’t overstate how helpful they are!
(tangent) My last point of confusion, conceptually, is vectorizing a search query. Imagine I had a custom vectorizer and used it to vectorize 50 Wikipedia articles.
myvectorizer.vectorize([article1, article2, ...])
Then I wanted to pass a user query like “Tell me about the Red Baron”.
I’d simply pass the user query directly into the same vectorizer, right? Then use nearVector? (pseudocode)
query_vec = myvectorizer.vectorize("Tell me about the Red Baron")
client.nearVector(query_vec)
Is this the approach you’d use in my situation? Or is there more to it than just vectorizing the search query?
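For reference, the flow described in the pseudocode (vectorize the query with the same vectorizer, then run a nearVector search) might be sketched like this with the Python client v3. It is untested against a live instance; `search_by_text` is a hypothetical helper, `vectorize` stands in for your own vectorizer, and the class name "Foo" with properties "a" and "b" is carried over from the import example earlier in the thread.

```python
from typing import Callable, List, Sequence


def search_by_text(
    client,
    vectorize: Callable[[str], List[float]],
    query: str,
    class_name: str = "Foo",
    properties: Sequence[str] = ("a", "b"),
    limit: int = 5,
) -> list:
    """Embed `query` with your own vectorizer and run a nearVector search.

    `client` is expected to be a weaviate.Client (Python client v3).
    The query must be embedded by the same vectorizer that produced the
    stored vectors, otherwise the distances are not meaningful.
    """
    query_vec = vectorize(query)
    result = (
        client.query
        .get(class_name, list(properties))
        .with_near_vector({"vector": query_vec})
        .with_limit(limit)
        .do()
    )
    # A failed query returns an "errors" key instead; this sketch does not handle it.
    return result["data"]["Get"][class_name]
```

A call would look like `search_by_text(client, myvectorizer.vectorize, "Tell me about the Red Baron")`, again assuming the vectorizer takes one string and returns one vector.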
Is there a way to pass your own vector values for a CLIP-like model (not using a vectorizer)? I need to pass, on insert, both the image and the text vector embeddings that I calculate offline on my GPU box…