Description
I’m currently trying to store images in Weaviate using the multi2vec-clip module without actually storing the image blobs. I want to keep them in S3 and use Weaviate only for indexing and searching.
I’ve successfully achieved this by calling the multi2vec-clip container to manually vectorize each image and then storing it in Weaviate using the withProperties + withVector operators.
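For reference, the import flow described above can be sketched roughly as follows. This is a minimal sketch, not my exact code: it uses the REST body shape (class, properties, vector) for POST /v1/objects instead of the client's withProperties + withVector builder, and it assumes the CLIP container exposes a /vectorize endpoint that accepts { texts, images } and returns { textVectors, imageVectors }. The PostJson helper and the function names are hypothetical, and the ports come from the docker-compose file below.

```typescript
// Assumed request/response shapes of the multi2vec-clip inference container.
interface ClipVectorizeRequest { texts: string[]; images: string[]; }
interface ClipVectorizeResponse { textVectors: number[][]; imageVectors: number[][]; }

// Properties of the StockImage class from the schema in this post.
interface StockImageProps { filename: string; url: string; description: string; }

// Body for POST /v1/objects when the vector is supplied by the caller.
interface WeaviateObjectPayload { class: string; properties: StockImageProps; vector: number[]; }

// Hypothetical HTTP helper supplied by the caller (fetch, axios, ...).
type PostJson = (url: string, body: unknown) => Promise<unknown>;

// Build the vectorize request for a single base64-encoded image.
function buildClipRequest(imageBase64: string): ClipVectorizeRequest {
  return { texts: [], images: [imageBase64] };
}

// Build the object payload; the image blob itself is never included.
function buildObjectPayload(props: StockImageProps, vector: number[]): WeaviateObjectPayload {
  if (vector.length === 0) {
    throw new Error("vector must not be empty");
  }
  return { class: "StockImage", properties: props, vector };
}

// Vectorize one image via the CLIP container, then store metadata + vector.
async function importImage(
  post: PostJson,
  imageBase64: string,
  props: StockImageProps,
  clipUrl: string = "http://localhost:8081",
  weaviateUrl: string = "http://localhost:8080"
): Promise<WeaviateObjectPayload> {
  const clipRes = (await post(`${clipUrl}/vectorize`, buildClipRequest(imageBase64))) as ClipVectorizeResponse;
  const vector = clipRes.imageVectors[0];
  if (!vector) {
    throw new Error("CLIP container returned no image vector");
  }
  const payload = buildObjectPayload(props, vector);
  await post(`${weaviateUrl}/v1/objects`, payload);
  return payload;
}
```

The image bytes only travel to the CLIP container; Weaviate receives the S3 URL, the other properties, and the vector.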
However, I had to define at least one textField in my schema, otherwise I wouldn’t be able to use this module. I still want to use the module because I don’t want to manually vectorize my prompts, and without it I can’t use the nearText or nearImage query operators.
I configured the description property as a textField; it contains a detailed generated description of each image.
My question is, how is this field being used, if it is at all?
When I run a nearText query, does it vectorize the prompt and compare it against the stored vector, or does it somehow also use the description field, as a combination of vector + description?
Is there a better way to achieve this, that is, manually generating the image vector at import time but using the module vectorizer for the query prompt?
Should I be using a different module instead?
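To make the query side concrete, here is a rough sketch of the kind of nearText query I mean, written as a raw GraphQL string that would be POSTed to /v1/graphql as { "query": ... }. The helper names, the default limit, and the selected fields are only illustrative, and requesting _additional { certainty } assumes the class uses the default cosine distance. It does not by itself show how the description field is used; it only shows what the module is asked to vectorize at query time.

```typescript
// Build a GraphQL nearText query for the StockImage class.
// JSON.stringify is used to quote and escape the prompt; its escaping
// is compatible with GraphQL string literals for ordinary text.
function buildNearTextQuery(prompt: string, limit: number = 5): string {
  if (limit <= 0 || Math.floor(limit) !== limit) {
    throw new Error("limit must be a positive integer");
  }
  const concept = JSON.stringify(prompt);
  return (
    "{ Get { StockImage(" +
    `nearText: {concepts: [${concept}]}, limit: ${limit}` +
    ") { filename url description _additional { certainty } } } }"
  );
}

// Wrap the query in the JSON body expected by POST /v1/graphql.
function buildGraphQLBody(prompt: string, limit: number = 5): { query: string } {
  return { query: buildNearTextQuery(prompt, limit) };
}
```

For example, buildGraphQLBody("a dog on a beach", 3) produces a body whose only free-text input is the prompt itself.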
Server Setup Information
- Weaviate Server Version: 1.14.1
- Deployment Method: local docker
- Multi Node? Number of Running Nodes: 1
- Client Language and Version: Javascript/Typescript
Any additional Information
docker-compose.yml
version: '3.4'
services:
  weaviate:
    image: docker.io/semitechnologies/weaviate:1.14.1
    restart: on-failure:0
    ports:
      - "8080:8080"
    environment:
      LOG_LEVEL: "debug"
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: multi2vec-clip
      CLIP_INFERENCE_API: "http://multi2vec-clip:8080"
      ENABLE_MODULES: "multi2vec-clip"
  multi2vec-clip:
    image: semitechnologies/multi2vec-clip:sentence-transformers-clip-ViT-B-32-multilingual-v1-1.2.7
    ports:
      - 8081:8080
collection schema
{
  "class": "StockImage",
  "moduleConfig": {
    "multi2vec-clip": {
      "textFields": [
        "description"
      ]
    }
  },
  "vectorIndexType": "hnsw",
  "properties": [
    {
      "dataType": [
        "string"
      ],
      "name": "filename"
    },
    {
      "dataType": [
        "string"
      ],
      "name": "url"
    },
    {
      "dataType": [
        "string"
      ],
      "name": "description"
    }
  ]
}