Using sentence_transformers together with Weaviate

Hi!

I am trying to use a local setup of Weaviate without a specified vectorizer as I would prefer to generate my own embeddings. I am using the v4 Python client and running into an issue. Here is my reproducible example:

import weaviate
from sentence_transformers import SentenceTransformer


client_weaviate = weaviate.connect_to_local()
print(client_weaviate.is_ready()) # True
print(client_weaviate.is_live()) # True
print(client_weaviate.is_connected()) # True

model = SentenceTransformer('all-MiniLM-L6-v2')
test_string = "test_string"
emb = model.encode([test_string])

In this code chunk I am unable to retrieve the model (I get stuck on the line with model = …). However, if I move the model chunk above the weaviate client chunk, it works as intended. I am not sure what could be the issue here and I would appreciate some help.
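
For anyone hitting the same hang, the ordering reported as working above can be sketched roughly as follows. This is only a restatement of the workaround, not an explanation of the root cause. The function name `load_model_then_connect` is made up for illustration, and the imports are placed inside the function so the snippet can be loaded without either library installed.

```python
def load_model_then_connect(model_name="all-MiniLM-L6-v2"):
    """Load the embedding model BEFORE opening the Weaviate connection.

    Hypothetical helper: it mirrors the ordering the post reports as working.
    """
    # Local imports keep the sketch importable without the libraries present.
    from sentence_transformers import SentenceTransformer
    import weaviate

    # 1. Load the model first.
    model = SentenceTransformer(model_name)

    # 2. Only then connect to the local Weaviate instance (v4 client).
    client = weaviate.connect_to_local()
    try:
        emb = model.encode(["test_string"])  # array of shape (1, embedding_dim)
        return client.is_ready(), emb.shape
    finally:
        # The v4 client keeps connections open, so close it explicitly.
        client.close()
```

Calling `load_model_then_connect()` needs a running local Weaviate instance and the model available locally or downloadable.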

Below you can also see my Docker Compose file:

version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.5
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:

Thanks for your help!

hi @ggapac ! Welcome to our community :hugs:

That’s interesting. I will try to reproduce this later today.

Meanwhile, are you aware that you can run this model in its own container and integrate it with Weaviate?

Check here:

Thanks!

Hi @DudaNogueira, thanks for the info. I am aware of this, but for our use case we are using some of our own fine-tuned models (that also use the sentence_transformers library) and would like to keep the two separate.

Ah! Nice!

As long as you can produce a container that responds on those endpoints, you can use your own transformers.

:slight_smile:

Hi @DudaNogueira ,

I have a similar requirement and I understood the resolution you shared. However, I was wondering if it would be easier to just change the model path to the fine-tuned model (given that I am fine-tuning a sentence transformer model) in the Docker Compose YAML file. This is probably not implemented as far as I can tell from looking at the codebase, but I was wondering if it would be feasible and what the pros and cons would be.

hi @Bevani !!

Welcome to our community :hugs:

There is a new approach I discovered just the other day.

You can use tools like https://lmstudio.ai/, which run different models (both LLMs and embedding models) and emulate the OpenAI interface.

With that, you can point text2vec_openai to that endpoint and it should use your custom models.
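
To make "emulate the OpenAI interface" a bit more concrete, here is a minimal sketch of the request and response shape of an OpenAI-style embeddings endpoint, using only the standard library. The base URL `http://localhost:1234/v1` and the model name `my-finetuned-model` are assumptions for illustration; use whatever your local server actually exposes. The helper names are hypothetical as well.

```python
import json
import urllib.request

# Assumption: address of the local OpenAI-compatible server; adjust as needed.
BASE_URL = "http://localhost:1234/v1"


def build_embeddings_request(texts, model, base_url=BASE_URL):
    """Build a POST request for an OpenAI-style /embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract the vectors from an OpenAI-style embeddings response body."""
    data = json.loads(response_body)["data"]
    # Sort by "index" so the vectors line up with the order of the inputs.
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]
```

With a server running, the request could be sent with `urllib.request.urlopen(build_embeddings_request(["hello"], "my-finetuned-model"))` and the response body passed to `parse_embeddings`. If that round trip works, the same endpoint is a reasonable candidate to point the vectorizer at.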

Check this thread: