Hi, I have a Python script that creates a collection (on a sandbox cluster) using the Voyage AI voyage-multimodal-3 model. The collection is created successfully on the sandbox and objects are inserted, generating the vectorized embeddings for the text content. The relevant code:
# Config helpers from the Python client v4
from weaviate.classes.config import (
    Configure,
    DataType,
    Multi2VecField,
    Property,
    VectorDistances,
    VectorFilterStrategy,
)

# Define properties for multimodal data
properties = [
    Property(name="title", data_type=DataType.TEXT, description="File generated title"),
    Property(name="classification", data_type=DataType.TEXT, description="File generated classification"),
    Property(name="tags", data_type=DataType.TEXT_ARRAY, description="File generated tags"),
    Property(name="file_name", data_type=DataType.TEXT, description="Original file name"),
    Property(name="file_type", data_type=DataType.TEXT, description="Original file MIME type"),
    Property(name="content_type", data_type=DataType.TEXT, description="'text' or 'image'"),
    Property(name="chunk_index", data_type=DataType.INT, description="Index of the chunk within the file"),
    # Store text content directly
    Property(name="text_content", data_type=DataType.TEXT, description="Text chunk content", skip_vectorization=False),  # Make sure this is vectorized
    # Store images as base64-encoded blobs
    Property(name="image_content", data_type=DataType.BLOB, description="Base64 encoded image content", skip_vectorization=False),  # Ensure this is vectorized too
]

client.collections.create(
    name=collection_name,
    # Configure the multimodal vectorizer using Voyage AI multimodal
    vectorizer_config=[
        Configure.NamedVectors.multi2vec_voyageai(
            model="voyage-multimodal-3",
            name="namedvector-name",
            # Define the fields to be used for the vectorization - image_fields and text_fields
            image_fields=[
                Multi2VecField(name="image_content", weight=0.25),
            ],
            text_fields=[
                Multi2VecField(name="text_content", weight=0.75),
            ],
            # Configure the vector index (HNSW is the default and usually a good choice)
            vector_index_config=Configure.VectorIndex.hnsw(
                quantizer=Configure.VectorIndex.Quantizer.bq(),
                distance_metric=VectorDistances.COSINE,  # Cosine is common for embeddings
                filter_strategy=VectorFilterStrategy.SWEEPING,  # or ACORN (available from Weaviate v1.27.0)
            ),
        )
    ],
    multi_tenancy_config=Configure.multi_tenancy(enabled=False),  # Assuming single tenancy for now
    properties=properties,
)
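The objects are then inserted roughly like this (a simplified sketch rather than the exact script; `chunks` and its fields are illustrative, and only the text path is shown):

collection = client.collections.get(collection_name)

# Batch-insert the chunks produced for one file (`chunks` is a list of dicts from the chunking step)
with collection.batch.dynamic() as batch:
    for i, chunk in enumerate(chunks):
        batch.add_object(
            properties={
                "title": chunk["title"],
                "classification": chunk["classification"],
                "tags": chunk["tags"],
                "file_name": chunk["file_name"],
                "file_type": chunk["file_type"],
                "content_type": "text",
                "chunk_index": i,
                "text_content": chunk["text"],
                # Image chunks use content_type="image" and put a base64-encoded
                # string into image_content instead of text_content.
            }
        )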
When fetching a list of objects from the collection on the Next.js frontend, just for testing, everything works fine and I get the chunked objects back, using the following code:
const collection = weaviateClient.collections.get('collection-name');
const result = await collection.query.fetchObjects({
  limit: 1,
  returnProperties: ['title', 'tags', 'text_content'],
});
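For reference, the weaviateClient above is created roughly like this (simplified; the environment variable names are placeholders):

import weaviate, { WeaviateClient } from 'weaviate-client';

// Simplified client setup on the Next.js server side
const weaviateClient: WeaviateClient = await weaviate.connectToWeaviateCloud(
  process.env.WEAVIATE_URL!,
  {
    authCredentials: new weaviate.ApiKey(process.env.WEAVIATE_API_KEY!),
    // Not sure if the Voyage AI key also has to be forwarded here for nearText,
    // e.g. headers: { 'X-VoyageAI-Api-Key': process.env.VOYAGEAI_API_KEY! }
  }
);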
But when I try to query with nearText (also providing the named vector name), it doesn’t work! It just throws an error that I catch in a try/catch block. I tried the following:
const result = await collection.query.nearText('any search text', {
  limit: 2,
});
for (const object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
  // console.log(JSON.stringify(object.metadata?.distance, null, 2));
}
And the same query with the target vector specified explicitly:
const result = await collection.query.nearText('any search text', {
  targetVector: 'namedvector-name',
  limit: 2,
});
for (const object of result.objects) {
  console.log(JSON.stringify(object.properties, null, 2));
  // console.log(JSON.stringify(object.metadata?.distance, null, 2));
}
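Both attempts are wrapped in a try/catch like the sketch below, and only the catch branch ever runs (returnMetadata is there so I could at least inspect distances if it succeeded):

try {
  const result = await collection.query.nearText('any search text', {
    targetVector: 'namedvector-name',
    limit: 2,
    returnMetadata: ['distance'],
  });
  for (const object of result.objects) {
    console.log(JSON.stringify(object.properties, null, 2));
    console.log(object.metadata?.distance);
  }
} catch (err) {
  // This is the only branch that ever executes
  console.error('nearText failed:', err instanceof Error ? err.message : err);
}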
What am I doing wrong? Why can I fetch all objects directly but can’t use nearText? Am I configuring the collection incorrectly for nearText or nearVector search? Thank you!
Server Setup Information
- Weaviate Server Version: 1.30.0
- Deployment Method: localhost connecting to Weaviate Cloud Sandbox
- Multi Node?: No
- Client Language and Version: Python & Next.js (JS/TS Client v3)
- Multitenancy?: False