Generative tasks using Together AI endpoint (and via proxy)

Hello everyone,

I’m working on a simple explanatory Jupyter Notebook for RAG that demonstrates using Weaviate as a vector database. It uses a straightforward Embedded Weaviate connection. Here’s the current setup:

import weaviate

# Embedded Weaviate, with the transformers module pointed at a local inference API
client = weaviate.connect_to_embedded(
    persistence_data_path="some_path",
    environment_variables={
        "ENABLE_API_BASED_MODULES": "true",
        "ENABLE_MODULES": "text2vec-transformers",
        "TRANSFORMERS_INFERENCE_API": "http://127.0.0.1:5000/"
    }
)

I have set up my own transformers inference API by creating a Flask app with a /vectors endpoint, and it’s working well for embedding models.
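
For reference, that app looks roughly like the sketch below. The embedding model is just a placeholder, and the request/response shape ({"text": ...} in, {"text", "vector", "dim"} out) is my understanding of what the text2vec-transformers module expects, so treat the details as illustrative:

from flask import Flask, jsonify, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

@app.route("/vectors", methods=["POST"])
def vectors():
    # Weaviate's text2vec-transformers module POSTs {"text": "..."} here
    text = request.get_json()["text"]
    vector = model.encode(text).tolist()
    return jsonify({"text": text, "vector": vector, "dim": len(vector)})

@app.route("/.well-known/live", methods=["GET"])
@app.route("/.well-known/ready", methods=["GET"])
def health():
    # liveness/readiness probes the module may call on startup
    return "", 204

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)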

Now I’m exploring Weaviate’s generative (RAG) features, starting with a simple example. However, our LLM API provider is together.ai, which exposes OpenAI-compatible endpoints and is accessed through a proxy so the API key stays hidden from the users running the notebook. Unfortunately, the OpenAI integration in Weaviate requires specific model names such as ‘gpt-4o’; if I try another model name (meta-llama/Llama-3.2-3B-Instruct-Turbo), the query fails with an error (screenshot attached).
Does anyone know if it’s possible to perform these generative tasks with our setup? I was considering an approach similar to the transformers embedding one: add a route to my Flask app that mimics the OpenAI API but, behind the scenes, calls the together.ai endpoints through our proxy. However, I haven’t found anything in the Weaviate documentation that supports such a custom configuration, and running these generative models locally is not possible due to resource constraints. A rough sketch of the idea is below.
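
To make the idea concrete, here is a minimal, untested sketch of such a route. The proxy URL and model name are placeholders, and it assumes Weaviate can be pointed at a custom base URL serving an OpenAI-style /v1/chat/completions endpoint:

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# placeholders: our internal proxy in front of together.ai's OpenAI-compatible API
TOGETHER_PROXY_URL = "http://our-internal-proxy/v1/chat/completions"
MODEL_NAME = "meta-llama/Llama-3.2-3B-Instruct-Turbo"

@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    payload = request.get_json()
    # swap whatever model name Weaviate sends for the together.ai model we actually want
    payload["model"] = MODEL_NAME
    # the proxy injects the real API key, so nothing secret lives in the notebook
    resp = requests.post(TOGETHER_PROXY_URL, json=payload, timeout=60)
    # together.ai responses are OpenAI-compatible, so pass them straight through
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5001)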

Thank you all in advance.
Lucas

hi @lucas.c !!

Welcome to our community :hugs:

Here is how you can accomplish this at the query level:

# runtime generative config: pass the provider and model at query time
response = collection.generate.near_text(
    query="What is this?",
    generative_provider=weaviate.classes.generate.GenerativeConfig.ollama(
        api_endpoint="http://your-custom-endpoint",  # your custom inference endpoint / proxy
        model="my-custom-model"
    ),
    single_prompt="Translate {text} to French",
    grouped_task="Translate the following texts to Portuguese"
)
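
With a recent v4 Python client you can then read the generated output from the response; attribute names have shifted slightly across client versions, so double-check against yours:

print(response.generated)      # output of the grouped_task
for o in response.objects:
    print(o.generated)         # output of the single_prompt for this object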

You can also do it at the collection level, by changing the generative config of your collection:

collection.config.update(
    generative_config=weaviate.classes.config.Configure.Generative.ollama(
        api_endpoint="http://your-custom-endpoint"
    )
)

And on the other side, you can inspect the payloads Weaviate sends to your endpoint, which tells you exactly what you need to forward to together.ai.
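
A quick (hypothetical) way to see them is a catch-all Flask route that just logs whatever Weaviate sends, before you implement the real forwarding:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/<path:path>", methods=["GET", "POST"])
def log_everything(path):
    # print the path and body so you can mirror them in your real endpoint
    print("PATH:", path)
    print("BODY:", request.get_json(silent=True))
    # empty response: the generative call itself will fail, but the payload gets logged
    return jsonify({}), 200

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5002)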

Check our docs for more on this.

Let me know if this helps!

Ah, @lucas.c ! By the way:

If you want something to better manage your LLM proxy, check out https://www.litellm.ai/

It can help you here.